Learning to find naming issues with big code and small supervision

Jingxuan He; Cheng Chun Lee; Veselin Raychev; Martin Vechev

Conference Proceedings, Open Access

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (2021) 296-311

DOI: 10.1145/3453483.3454045

Abstract

We introduce a new approach for finding and fixing naming issues in source code. The method is based on a careful combination of unsupervised and supervised procedures: (i) unsupervised mining of patterns from Big Code that express common naming idioms, where program fragments violating such idioms indicate likely naming issues, and (ii) supervised learning of a classifier on a small labeled dataset, which filters potential false positives from the violations. We implemented our method in a system called Namer and evaluated it on a large number of Python and Java programs. We demonstrate that Namer is effective in finding naming mistakes in real-world repositories with high precision (∼70%). Perhaps surprisingly, we also show that existing deep learning methods are not practically effective and achieve low precision in finding naming issues (up to ∼16%).
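
The two-stage idea in the abstract can be illustrated with a toy Python sketch. This is not Namer's implementation: the `(context, name)` representation, the function names, the support and confidence thresholds, and the one-feature threshold standing in for the learned classifier are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def mine_idioms(corpus, min_support=3, min_confidence=0.8):
    """Unsupervised stage (toy version): for each context, keep the
    dominant name if it is frequent and consistent enough.

    corpus: iterable of (context, name) pairs (hypothetical representation).
    Returns {context: (dominant_name, confidence)}.
    """
    counts = defaultdict(Counter)
    for context, name in corpus:
        counts[context][name] += 1

    idioms = {}
    for context, names in counts.items():
        name, freq = names.most_common(1)[0]
        confidence = freq / sum(names.values())
        if freq >= min_support and confidence >= min_confidence:
            idioms[context] = (name, confidence)
    return idioms


def find_violations(fragments, idioms):
    """Report fragments whose name differs from the mined idiom."""
    violations = []
    for context, name in fragments:
        if context in idioms and idioms[context][0] != name:
            expected, confidence = idioms[context]
            violations.append((context, name, expected, confidence))
    return violations


def learn_threshold(labeled):
    """Supervised stage (toy stand-in for a classifier): pick the
    confidence threshold that maximizes accuracy on a small labeled set.

    labeled: list of (confidence, is_true_issue) pairs.
    """
    candidates = sorted({conf for conf, _ in labeled})
    return max(
        candidates,
        key=lambda t: sum((conf >= t) == label for conf, label in labeled),
    )


# Hypothetical "big code" corpus of (context, name) observations.
corpus = (
    [("method_first_param", "self")] * 9
    + [("method_first_param", "this")]
    + [("for_range_index", "i")] * 6
    + [("for_range_index", "idx")] * 4
)
idioms = mine_idioms(corpus)

# Candidate issues in new code: only the idiom-violating name is reported.
fragments = [
    ("method_first_param", "this"),
    ("method_first_param", "self"),
    ("for_range_index", "idx"),
]
violations = find_violations(fragments, idioms)

# Small hand-labeled set used to filter likely false positives.
labeled = [(0.95, True), (0.90, True), (0.85, False), (0.82, False)]
threshold = learn_threshold(labeled)
reported = [v for v in violations if v[3] >= threshold]
```

In this sketch the mining step needs no labels at all, and the labeled set is only used to decide which mined violations are worth reporting, mirroring the division of labor the abstract describes.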

Cite

He, J., Lee, C. C., Raychev, V., & Vechev, M. (2021). Learning to find naming issues with big code and small supervision. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (pp. 296–311). Association for Computing Machinery. https://doi.org/10.1145/3453483.3454045
