We introduce a new approach for finding and fixing naming issues in source code. The method is based on a careful combination of unsupervised and supervised procedures: (i) unsupervised mining of patterns from Big Code that express common naming idioms. Program fragments violating such idioms indicates likely naming issues, and (ii) supervised learning of a classifier on a small labeled dataset which filters potential false positives from the violations. We implemented our method in a system called Namer and evaluated it on a large number of Python and Java programs. We demonstrate that Namer is effective in finding naming mistakes in real world repositories with high precision (∼70%). Perhaps surprisingly, we also show that existing deep learning methods are not practically effective and achieve low precision in finding naming issues (up to ∼16%).
CITATION STYLE
He, J., Lee, C. C., Raychev, V., & Vechev, M. (2021). Learning to find naming issues with big code and small supervision. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (pp. 296–311). Association for Computing Machinery. https://doi.org/10.1145/3453483.3454045
Mendeley helps you to discover research relevant for your work.