Improve language modelling for code completion through learning general token repetition of source code

Citations: 6
Readers (Mendeley): 6

Abstract

In the last few years, the state-of-the-art approach to code completion has been to learn code token sequences with a language model such as an LSTM. However, tokens in source code are more repetitive than words in natural languages. For example, once a variable is declared in a program, it may be used many times. Other elements, such as generic types in templates, also occur repeatedly. Capturing the token repetition of code is important: if variable usage patterns are not captured, a model trained on one project has little chance of correctly predicting the name of an unseen variable in another project. Capturing token repetition in source code is challenging because both the repeated token and the place where the repetition should happen must be decided at the same time. Hence, we propose a novel deep neural model named REP to capture the general token repetition of source code. Repetitions of code tokens are modeled as edges connecting repeated tokens on a graph, so the REP model is essentially a deep neural graph generation model. The experiments indicate that the proposed model outperforms state-of-the-art methods in code completion.
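To make the graph view of repetition concrete, here is a minimal sketch of how repetition edges could be derived from a token sequence, assuming each occurrence of a token is linked to its most recent previous occurrence. The function `repetition_edges` and this particular edge-construction rule are illustrative assumptions, not the paper's exact formulation.

```python
def repetition_edges(tokens):
    """Build repetition edges for a token sequence.

    Each edge (i, j) with i < j connects two positions holding the same
    token, linking every occurrence to its most recent prior occurrence.
    This construction is an assumption for illustration; REP's actual
    graph generation may define edges differently.
    """
    last_seen = {}  # token -> index of its most recent occurrence
    edges = []
    for j, tok in enumerate(tokens):
        if tok in last_seen:
            edges.append((last_seen[tok], j))
        last_seen[tok] = j
    return edges

# Example: a repeated variable name ("count") yields a repetition edge.
tokens = ["int", "count", "=", "0", ";", "count", "+=", "1", ";"]
print(repetition_edges(tokens))  # [(1, 5)]
```

Under this reading, predicting a repeated token amounts to predicting such an edge (which earlier position the next token repeats), rather than predicting the token's surface form from a closed vocabulary.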

Cite

CITATION STYLE

APA

Yang, Y., Chen, X., & Sun, J. (2019). Improve language modelling for code completion through learning general token repetition of source code. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (Vol. 2019-July, pp. 667–674). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2019-056
