Learning to find naming issues with big code and small supervision

Abstract

We introduce a new approach for finding and fixing naming issues in source code. The method is based on a careful combination of unsupervised and supervised procedures: (i) unsupervised mining of patterns from Big Code that express common naming idioms, where program fragments that violate these idioms indicate likely naming issues; and (ii) supervised learning of a classifier on a small labeled dataset, which filters potential false positives from the violations. We implemented our method in a system called Namer and evaluated it on a large number of Python and Java programs. We demonstrate that Namer is effective at finding naming mistakes in real-world repositories with high precision (∼70%). Perhaps surprisingly, we also show that existing deep learning methods are not practically effective and achieve low precision in finding naming issues (up to ∼16%).
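The two-stage idea in the abstract (mine idioms without labels, then filter their violations with a classifier trained on a few labels) can be illustrated with a toy Python sketch. This is not Namer's implementation: the string "contexts", the `min_support`/`min_confidence` thresholds, the `Site`/`Violation` records, and the one-feature decision-stump filter are all hypothetical stand-ins chosen for brevity, not taken from the paper.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    """A naming site: a hypothetical context key plus the identifier used there."""
    context: str
    name: str

@dataclass(frozen=True)
class Violation:
    site: Site
    expected: str      # name the mined pattern predicts for this context
    confidence: float  # share of sites in this context that use `expected`
    support: int       # number of sites that share this context

def mine_patterns(sites, min_support=3, min_confidence=0.8):
    """Unsupervised step: keep contexts dominated by one name (context -> pattern)."""
    by_context = defaultdict(Counter)
    for s in sites:
        by_context[s.context][s.name] += 1
    patterns = {}
    for ctx, counts in by_context.items():
        support = sum(counts.values())
        name, freq = counts.most_common(1)[0]
        if support >= min_support and freq / support >= min_confidence:
            patterns[ctx] = (name, freq / support, support)
    return patterns

def find_violations(sites, patterns):
    """Flag every site whose name disagrees with the pattern mined for its context."""
    violations = []
    for s in sites:
        if s.context in patterns:
            expected, confidence, support = patterns[s.context]
            if s.name != expected:
                violations.append(Violation(s, expected, confidence, support))
    return violations

def train_stump(labeled):
    """Supervised step: pick the (feature, threshold) with the best training accuracy.

    `labeled` is a small list of (Violation, is_real_issue) pairs.
    """
    best = None
    for feat in ("confidence", "support"):
        for t in sorted({getattr(v, feat) for v, _ in labeled}):
            acc = sum((getattr(v, feat) >= t) == y for v, y in labeled)
            if best is None or acc > best[0]:
                best = (acc, feat, t)
    _, feat, t = best
    return lambda v: getattr(v, feat) >= t

# Hypothetical toy corpus of naming sites.
LOOP, INIT = "for _ in range(len(xs))", "def __init__(_, ...)"
EXC, OPEN = "except ... as _", "with open(...) as _"
corpus = (
    [Site(LOOP, "i")] * 9 + [Site(LOOP, "j")]
    + [Site(INIT, "self")] * 24 + [Site(INIT, "slef")]
    + [Site(EXC, "e")] * 3 + [Site(EXC, "err")]  # 75% "e": below min_confidence
    + [Site(OPEN, "f")] * 2                       # only 2 sites: below min_support
)
# Hypothetical small labeled set: True means the violation is a real naming issue.
labeled = [
    (Violation(Site("ctx_a", "slef"), "self", 0.95, 20), True),
    (Violation(Site("ctx_b", "lenght"), "length", 0.97, 30), True),
    (Violation(Site("ctx_c", "j"), "i", 0.85, 6), False),
    (Violation(Site("ctx_d", "k"), "i", 0.80, 5), False),
]

patterns = mine_patterns(corpus)
violations = find_violations(corpus, patterns)
is_issue = train_stump(labeled)
reported = [v for v in violations if is_issue(v)]
for v in reported:
    print(f"{v.site.context}: '{v.site.name}' (expected '{v.expected}')")
```

Both `j` and `slef` violate a mined idiom, but the stump learned from four labels (here, a confidence threshold of 0.95) keeps only the typo `slef` and discards the nested-loop `j` as a likely false positive. Because the classifier only sees a couple of summary numbers per violation, a handful of labels is enough to train it; a real system would use richer patterns and features.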

Citation

He, J., Lee, C. C., Raychev, V., & Vechev, M. (2021). Learning to find naming issues with big code and small supervision. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (pp. 296–311). Association for Computing Machinery. https://doi.org/10.1145/3453483.3454045