A source code similarity based on Siamese neural network

Chunli Xie; Xia Wang; Cheng Qian; Mengqi Wang

Journal Article, Open Access

Applied Sciences (Switzerland) (2020) 10(21) 1-12

DOI: 10.3390/app10217519

15 Citations

23 Readers

Abstract

Finding similar code snippets is a fundamental task in software engineering. Several approaches to this task use statistical language models, which focus on the syntax and structure of code rather than the deeper semantic information underlying it. In this paper, a Siamese neural network is proposed that maps code snippets into continuous vector space and tries to capture their semantic meaning. First, an unsupervised pre-training method models each code snippet as a weighted series of word vectors, with the weights fitted by Term Frequency-Inverse Document Frequency (TF-IDF). Then, a Siamese neural network is trained to learn semantic vector representations of code snippets. Finally, cosine similarity is used to measure the similarity score between pairs of code snippets. We have implemented our approach on a dataset of functionally similar code. The experimental results show that our method yields some performance improvement over a single word embedding method.
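
The first and last steps of the pipeline described in the abstract (TF-IDF weighted word vectors, then cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the smoothed IDF formula, the random stand-in word vectors, and all function names are assumptions made for illustration, and the trained Siamese encoder stage is omitted entirely.

```python
import math
import random
from collections import Counter


def tokenize(code):
    # Hypothetical tokenizer: pad common punctuation, then split on whitespace.
    for ch in "()[]{}:,;=+-*/<>":
        code = code.replace(ch, f" {ch} ")
    return code.split()


def idf_table(corpus_tokens):
    # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1, always positive.
    n = len(corpus_tokens)
    df = Counter(tok for doc in corpus_tokens for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def toy_word_vectors(vocab, dim=16, seed=0):
    # Stand-in for pre-trained word vectors: random, but fixed by the seed.
    rng = random.Random(seed)
    return {tok: [rng.gauss(0.0, 1.0) for _ in range(dim)] for tok in sorted(vocab)}


def snippet_vector(tokens, vectors, idf):
    # TF-IDF weighted average of the word vectors of one snippet.
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    total = 0.0
    for tok, count in Counter(tokens).items():
        if tok not in vectors:
            continue  # skip out-of-vocabulary tokens
        weight = (count / len(tokens)) * idf.get(tok, 1.0)
        for i, value in enumerate(vectors[tok]):
            acc[i] += weight * value
        total += weight
    return [a / total for a in acc] if total else acc


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


snippets = [
    "def add(a, b): return a + b",
    "def sum_two(x, y): return x + y",
    "for i in range(10): print(i)",
]
tokens = [tokenize(s) for s in snippets]
idf = idf_table(tokens)
vectors = toy_word_vectors({tok for doc in tokens for tok in doc})
vecs = [snippet_vector(doc, vectors, idf) for doc in tokens]

print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Because the word vectors here are random, the printed scores reflect only token overlap weighted by TF-IDF. In the approach the abstract describes, a trained Siamese network would sit between the weighted vectors and the cosine score to learn a semantic representation.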

Cite

Xie, C., Wang, X., Qian, C., & Wang, M. (2020). A source code similarity based on Siamese neural network. Applied Sciences (Switzerland), 10(21), 1–12. https://doi.org/10.3390/app10217519
