Source Code Retrieval Using Sequence Based Similarity

  • Udagawa Y

Abstract

Duplicate code adversely affects the quality of software systems and hence should be detected. We discuss an approach that improves source code retrieval by using the structural information of source code. We develop a lexical parser that extracts control statements and method identifiers from Java programs. We propose a similarity measure defined as the ratio of the number of sequential fully matching statements to the number of sequential partially matching statements. This measure extends the set-based Sørensen-Dice similarity index. The primary contribution of this research is a similarity retrieval algorithm that derives meaningful search conditions from a given sequence and then performs retrieval using all derived conditions. In experiments, our retrieval model improves on other retrieval models by up to 90.9% in terms of the number of retrieved methods.
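
For orientation, the set-based Sørensen-Dice index that the abstract names as its starting point can be sketched as follows. This is only the standard set-based index, 2|A ∩ B| / (|A| + |B|), not the paper's sequence-based measure, whose exact definition is not given in the abstract. The function name `dice_index` and the two sample statement lists are hypothetical and chosen for illustration.

```python
def dice_index(a, b):
    """Standard set-based Sørensen-Dice index: 2|A ∩ B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Hypothetical statement/identifier lists extracted from two methods.
method_x = ["if", "for", "read", "close"]
method_y = ["for", "if", "read", "write"]

# Three shared elements out of 4 + 4, so the score is 2 * 3 / 8.
score = dice_index(method_x, method_y)
```

Because both inputs are converted to sets, the order of `if` and `for` has no effect on the score. That insensitivity to statement order is what a sequence-based measure is meant to address.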

Cite

Udagawa, Y. (2013). Source Code Retrieval Using Sequence Based Similarity. International Journal of Data Mining & Knowledge Management Process, 3(4), 57–74. https://doi.org/10.5121/ijdkp.2013.3404
