Duplicate code adversely affects the quality of software systems and should therefore be detected. We present an approach that improves source code retrieval by using the structural information of source code. A lexical parser is developed to extract control statements and method identifiers from Java programs. We propose a similarity measure defined as the ratio of the number of sequential fully matching statements to the number of sequential partially matching statements. This measure is an extension of the set-based Sørensen-Dice similarity index. The primary contribution of this research is a similarity retrieval algorithm that derives meaningful search conditions from a given sequence and then performs retrieval using all derived conditions. Experiments indicate that our retrieval model achieves an improvement of up to 90.9% over other retrieval models in terms of the number of retrieved methods.
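For orientation, the set-based Sørensen-Dice index that the proposed measure extends can be sketched as follows. This is only the standard set-based baseline, 2|A ∩ B| / (|A| + |B|), not the sequence-based measure of the paper. The function name `dice_index` and the token lists are hypothetical, chosen to stand in for control statements and method identifiers extracted from two Java methods.

```python
def dice_index(a, b):
    """Standard set-based Sørensen-Dice index: 2|A ∩ B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Assumed convention for this sketch: two empty sets count as identical.
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Hypothetical tokens (not taken from the paper's data set).
method_x = ["if", "for", "read", "close"]
method_y = ["for", "read", "while", "close"]

# Three shared tokens out of 4 + 4 in total.
score = dice_index(method_x, method_y)

# The set-based index ignores order: a reversed sequence scores as identical.
reversed_score = dice_index(method_x, list(reversed(method_x)))
```

Because the set-based index treats a sequence and its reversal as identical, it cannot distinguish methods that use the same statements in a different order, which is the kind of structural information a sequence-based measure is meant to capture.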
Udagawa, Y. (2013). Source Code Retrieval Using Sequence Based Similarity. International Journal of Data Mining & Knowledge Management Process, 3(4), 57–74. https://doi.org/10.5121/ijdkp.2013.3404