Improve language modelling for code completion through learning general token repetition of source code

Citations: 6
Readers (Mendeley): 6

Abstract

In the last few years, the state-of-the-art approach to code completion has been to learn code token sequences with a language model such as an LSTM. However, tokens in source code are more repetitive than words in natural languages. For example, once a variable is declared in a program, it may be used many times. Other elements, such as generic types in templates, also occur repeatedly. Capturing the token repetition of code is important: if variable usage patterns are not captured, a model trained on one project has little chance of correctly predicting the name of an unseen variable in another project. Capturing token repetition in source code is challenging because both the repeated token and the place where the repetition should happen must be decided at the same time. Hence, we propose a novel deep neural model named REP to capture the general token repetition of source code. Repetitions of code tokens are modeled as edges connecting repeated tokens on a graph, so the REP model is essentially a deep neural graph generation model. The experiments indicate that the proposed model outperforms state-of-the-art methods in code completion.
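To make the graph view of repetition concrete, here is a minimal sketch of how repetition edges could be derived from a token sequence, assuming each occurrence of a token is linked to its most recent previous occurrence. The function `repetition_edges` and this particular edge-construction rule are illustrative assumptions, not the paper's exact formulation.

```python
def repetition_edges(tokens):
    """Build repetition edges for a token sequence.

    Each edge (i, j) with i < j connects two positions holding the same
    token, linking every occurrence to its most recent prior occurrence.
    This construction is an assumption for illustration; REP's actual
    graph generation may define edges differently.
    """
    last_seen = {}  # token -> index of its most recent occurrence
    edges = []
    for j, tok in enumerate(tokens):
        if tok in last_seen:
            edges.append((last_seen[tok], j))
        last_seen[tok] = j
    return edges

# Example: a repeated variable name ("count") yields a repetition edge.
tokens = ["int", "count", "=", "0", ";", "count", "+=", "1", ";"]
print(repetition_edges(tokens))  # [(1, 5)]
```

Under this reading, predicting a repeated token amounts to predicting such an edge (which earlier position the next token repeats), rather than predicting the token's surface form from a closed vocabulary.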

Cite

CITATION STYLE

APA

Yang, Y., Chen, X., & Sun, J. (2019). Improve language modelling for code completion through learning general token repetition of source code. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (Vol. 2019-July, pp. 667–674). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2019-056
