CCFinder: A multilinguistic token-based code clone detection system for large scale source code

Toshihiro Kamiya; Shinji Kusumoto; Katsuro Inoue

Journal Article

IEEE Transactions on Software Engineering (2002) 28(7) 654-670

DOI: 10.1109/TSE.2002.1019480

1.3k Citations

382 Readers

Abstract

A code clone is a code portion in source files that is identical or similar to another. Since code clones are believed to reduce the maintainability of software, several code clone detection techniques and tools have been proposed. This paper proposes a new clone detection technique, which consists of the transformation of input source text and a token-by-token comparison. For its implementation with several useful optimization techniques, we have developed a tool, named CCFinder, which extracts code clones in C, C++, Java, COBOL, and other source files. Metrics for the code clones have also been developed. In order to evaluate the usefulness of CCFinder and the metrics, we conducted several case studies where we applied the new tool to the source code of JDK, FreeBSD, NetBSD, Linux, and many other systems. As a result, CCFinder has effectively found clones and the metrics have been able to effectively identify the characteristics of the systems. In addition, we have compared the proposed technique with other clone detection techniques.
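The two-stage idea named in the abstract (transform the source text, then compare token by token) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not list the tool's transformation rules or its optimizations, so the keyword set, the `$id` and `$num` placeholders, the function names `normalize` and `find_clone_pairs`, and the naive quadratic comparison are hypothetical stand-ins rather than CCFinder's implementation.

```python
import re

# Hypothetical, deliberately tiny lexer: identifiers/keywords, integers,
# or any single non-space character. Real lexers are language specific.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"if", "else", "for", "while", "return", "int"}  # assumed subset


def normalize(source):
    """Tokenize source text and replace identifiers and numbers with placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)          # keywords are kept as they are
        elif re.match(r"[A-Za-z_]", tok):
            tokens.append("$id")        # any identifier
        elif tok.isdigit():
            tokens.append("$num")       # any integer literal
        else:
            tokens.append(tok)          # punctuation and operators
    return tokens


def find_clone_pairs(tokens_a, tokens_b, min_len=5):
    """Return (start_a, start_b, length) for each maximal common token run
    of at least min_len tokens, using a naive O(len_a * len_b) comparison."""
    pairs = []
    len_a, len_b = len(tokens_a), len(tokens_b)
    prev = [0] * (len_b + 1)  # prev[j]: run length ending at (i - 1, j - 1)
    for i in range(1, len_a + 1):
        cur = [0] * (len_b + 1)
        for j in range(1, len_b + 1):
            if tokens_a[i - 1] != tokens_b[j - 1]:
                continue
            cur[j] = prev[j - 1] + 1
            # The run cannot be extended if either sequence ends here
            # or the next tokens differ.
            at_end = i == len_a or j == len_b
            if (at_end or tokens_a[i] != tokens_b[j]) and cur[j] >= min_len:
                pairs.append((i - cur[j], j - cur[j], cur[j]))
        prev = cur
    return pairs


if __name__ == "__main__":
    a = "int total = 0; for (i = 0; i < n; i++) total = total + a[i];"
    b = "int sum = 0; for (k = 0; k < m; k++) sum = sum + b[k];"
    for start_a, start_b, length in find_clone_pairs(normalize(a), normalize(b)):
        print(start_a, start_b, length)
```

Because identifiers and literals are replaced before comparison, the two loops in the usage example are reported as a clone pair even though their variable names differ. The quadratic comparison is chosen here only for brevity and would not scale to the system sizes mentioned in the abstract.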

Cite

Kamiya, T., Kusumoto, S., & Inoue, K. (2002). CCFinder: A multilinguistic token-based code clone detection system for large scale source code. IEEE Transactions on Software Engineering, 28(7), 654–670. https://doi.org/10.1109/TSE.2002.1019480
