Using LSTMs to model the Java programming language

Abstract

Recurrent neural networks (RNNs), specifically long short-term memory networks (LSTMs), can model natural language effectively. This research investigates the ability of these same LSTMs to perform next-"word" prediction on the Java programming language. Java source code from four different repositories undergoes a transformation that preserves the logical structure of the source code while removing specifics such as variable names and literal values. These datasets, together with an additional English-language corpus, are used to train and test standard LSTMs' ability to predict the next element in a sequence. Results suggest that LSTMs can model Java code effectively, achieving perplexities under 22 and accuracies above 0.47, an improvement over the same LSTMs' performance on English, which yielded a perplexity of 85 and an accuracy of 0.27. This research has applicability in other areas such as syntactic template suggestion and automated bug patching.
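To make the described pipeline concrete, the sketch below illustrates the two stages the abstract outlines: abstracting Java source into structure-preserving tokens (keywords and punctuation kept; identifiers and literals collapsed to placeholders) and a standard LSTM next-token model. This is a minimal illustration, not the paper's implementation; the tokenizer regex, the <ID>/<LIT> placeholder names, the choice of PyTorch, and all hyperparameters are assumptions made for the example.

```python
# A minimal sketch, NOT the paper's implementation: the regex lexer, the
# <ID>/<LIT> placeholders, and the PyTorch model are illustrative
# assumptions based only on the pipeline the abstract describes.
import re
import torch
import torch.nn as nn

JAVA_KEYWORDS = {
    "abstract", "class", "else", "for", "if", "import", "int", "new",
    "package", "private", "public", "return", "static", "void", "while",
}  # truncated keyword set, for brevity

TOKEN_RE = re.compile(r'"[^"]*"|\d+(?:\.\d+)?|[A-Za-z_]\w*|\S')

def abstract_tokens(source: str) -> list[str]:
    """Keep structural tokens; collapse identifiers and literals."""
    out = []
    for tok in TOKEN_RE.findall(source):
        if tok in JAVA_KEYWORDS:
            out.append(tok)        # keywords preserve logical structure
        elif tok[0] == '"' or tok[0].isdigit():
            out.append("<LIT>")    # literal values are removed
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("<ID>")     # variable/method names are removed
        else:
            out.append(tok)        # punctuation and operators are kept
    return out

class NextTokenLSTM(nn.Module):
    """Standard LSTM language model: embed -> LSTM -> logits per position."""
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(self.embed(x))  # h: (batch, seq_len, hidden_dim)
        return self.head(h)              # logits over the next token
```

Training such a model against targets shifted one position with nn.CrossEntropyLoss recovers perplexity as the exponential of the mean loss, which is the metric quoted in the results above.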

Citation (APA)

Boldt, B. (2017). Using LSTMs to model the Java programming language. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10614 LNCS, pp. 268–275). Springer Verlag. https://doi.org/10.1007/978-3-319-68612-7_31
