Source Code Retrieval Using Sequence Based Similarity

Yoshihisa Udagawa

Journal Article, Open Access

International Journal of Data Mining & Knowledge Management Process (2013) 3(4) 57-74

DOI: 10.5121/ijdkp.2013.3404

Abstract

Duplicate code adversely affects the quality of software systems and should therefore be detected. We discuss an approach that improves source code retrieval by using the structural information of source code. A lexical parser is developed to extract control statements and method identifiers from Java programs. We propose a similarity measure defined as the ratio of the number of sequential fully matching statements to the number of sequential partially matching statements. This measure is an extension of the set-based Sørensen-Dice similarity index. The primary contribution of this research is a similarity retrieval algorithm that derives meaningful search conditions from a given sequence and then performs retrieval using all derived conditions. Experiments show that our retrieval model improves on other retrieval models by up to 90.9% relative to the number of retrieved methods.
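
For orientation, the set-based Sørensen-Dice index that the abstract names as the starting point can be sketched as follows. This is only the standard baseline, 2|A ∩ B| / (|A| + |B|), and not the sequence-based measure proposed in the paper, whose exact definition is not given in the abstract. The function name `dice_index`, the token lists, and the convention for two empty inputs are assumptions made for illustration.

```python
def dice_index(a, b):
    """Standard set-based Sorensen-Dice index: 2|A & B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Assumption: two empty inputs are treated as identical.
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Hypothetical token lists standing in for the control statements and
# method identifiers that a parser might extract from Java methods.
method_1 = ["if", "read", "for", "write"]
method_2 = ["for", "write", "if", "close"]
reordered = ["write", "for", "read", "if"]

print(dice_index(method_1, method_2))   # three shared tokens out of 4 + 4
print(dice_index(method_1, reordered))  # same tokens in a different order
```

Because sets discard ordering, `method_1` and `reordered` score as identical here, which illustrates why a measure built on sequences of statements can discriminate cases that the set-based index cannot.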

Cite

Udagawa, Y. (2013). Source Code Retrieval Using Sequence Based Similarity. International Journal of Data Mining & Knowledge Management Process, 3(4), 57–74. https://doi.org/10.5121/ijdkp.2013.3404
