CodeMatcher: a tool for large-scale code search based on query semantics matching

Abstract

With the emergence of large-scale codebases such as GitHub and Gitee, searching for and reusing existing code can help developers substantially improve software development productivity. Over the years, many code search tools have been developed. Early tools leveraged information retrieval (IR) techniques to perform efficient code search over frequently changing, large-scale codebases. However, their search accuracy was low due to the semantic mismatch between query and code. In recent years, many tools have leveraged deep learning (DL) techniques to address this issue, but DL-based tools are slow and their search accuracy is unstable. In this paper, we present CodeMatcher, an IR-based tool that inherits the advantages of DL-based tools in query semantics matching. CodeMatcher first builds an index for a large-scale codebase to shorten the search response time. For a given search query, it addresses irrelevant and noisy words in the query, then retrieves candidate code from the indexed codebase via iterative fuzzy search, and finally reranks the candidates based on two designed measures of semantic matching between the query and the candidates. We implemented CodeMatcher as a search engine website. To verify the effectiveness of our tool, we evaluated CodeMatcher on 41k+ open-source Java repositories. Experimental results showed that CodeMatcher can achieve an industrial-level response time (0.3s) on a common server with an Intel i7 CPU. In terms of search accuracy, CodeMatcher significantly outperforms three state-of-the-art tools (DeepCS, UNIF, and CodeHow) and two online search engines (GitHub search and Google search).
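The pipeline described in the abstract (index, query cleanup, iterative fuzzy search, reranking) can be illustrated with a toy sketch. This is not CodeMatcher's implementation: the noise-word list, the rule of dropping the last query word on each retry, and the two scoring measures (share of query words found in the method name and in the method body) are all assumptions made for illustration, and the function names are hypothetical.

```python
import re

# Hypothetical noise-word list; the abstract does not specify which words are handled.
NOISE_WORDS = {"a", "an", "the", "how", "to", "in", "of", "do", "i", "with"}

def build_index(methods):
    """Precompute lowercase name and body for each (name, body) pair."""
    return [{"name": n, "name_lc": n.lower(), "body_lc": b.lower()} for n, b in methods]

def clean_query(query):
    """Lowercase the query, split it into words, and drop the noise words."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [w for w in words if w not in NOISE_WORDS]

def fuzzy_search(index, words):
    """Require all words in order in the method name; on no hits, drop the
    last word and retry (assumed relaxation rule)."""
    remaining = list(words)
    while remaining:
        pattern = re.compile(".*".join(re.escape(w) for w in remaining))
        hits = [m for m in index if pattern.search(m["name_lc"])]
        if hits:
            return hits
        remaining.pop()
    return []

def rerank(candidates, words):
    """Sort by two assumed measures: query-word coverage of the name, then of the body."""
    def scores(m):
        name_score = sum(w in m["name_lc"] for w in words) / len(words)
        body_score = sum(w in m["body_lc"] for w in words) / len(words)
        return (name_score, body_score)
    return sorted(candidates, key=scores, reverse=True)

def search(index, query):
    """Run the full toy pipeline and return method names, best match first."""
    words = clean_query(query)
    if not words:
        return []
    return [m["name"] for m in rerank(fuzzy_search(index, words), words)]

# Tiny illustrative index of Java-like methods (made-up data).
SAMPLE_INDEX = build_index([
    ("readTextFile", "BufferedReader reader = new BufferedReader(new FileReader(path));"),
    ("readFile", "for (String line : Files.readAllLines(path)) { use(line); }"),
    ("writeFile", "Files.write(path, data);"),
    ("parseJson", "return new JsonParser().parse(text);"),
])

print(search(SAMPLE_INDEX, "read file line"))
```

For the query "read file line", no method name contains all three words in order, so the sketch retries with "read file" and then ranks `readFile` ahead of `readTextFile`, because only its body also mentions "line". A production system would typically replace the linear regex scan with a real inverted index.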

Cite

Liu, C., Bao, X., Xia, X., Yan, M., Lo, D., & Zhang, T. (2022). CodeMatcher: a tool for large-scale code search based on query semantics matching. In ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 1642–1646). Association for Computing Machinery, Inc. https://doi.org/10.1145/3540250.3558935
