Cross-Domain Deep Code Search with Meta Learning

Abstract

Recently, pre-trained programming language models such as CodeBERT have demonstrated substantial gains in code search. Despite their success, they rely on the availability of large amounts of parallel data to fine-tune the semantic mapping between queries and code, which restricts their practicality in domain-specific languages where data are relatively scarce and expensive to obtain. In this paper, we propose CDCS, a novel approach for domain-specific code search. CDCS employs a transfer-learning framework in which an initial program representation model is pre-trained on a large corpus of common programming languages (such as Java and Python) and is further adapted to domain-specific languages such as Solidity and SQL. Unlike cross-language CodeBERT, which is directly fine-tuned in the target language, CDCS adapts a few-shot meta-learning algorithm, MAML, to learn a good initialization of model parameters that can be effectively reused in a domain-specific language. We evaluate the proposed approach on two domain-specific languages, Solidity and SQL, with models transferred from two widely used languages (Python and Java). Experimental results show that CDCS significantly outperforms conventional pre-trained code models that are directly fine-tuned in domain-specific languages, and that it is particularly effective when data are scarce.
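To make the MAML-style adaptation concrete, below is a minimal PyTorch sketch of one meta-update for code search. The ToyBiEncoder, the in-batch contrastive matching loss, and all hyperparameters are illustrative assumptions, not the paper's actual CodeBERT-based setup; the sketch only shows the inner-loop adaptation on a task's support set and the outer-loop update of the shared initialization.

import torch
import torch.nn.functional as F
from torch.func import functional_call

class ToyBiEncoder(torch.nn.Module):
    # Hypothetical stand-in for a CodeBERT-style encoder: it maps
    # token-id sequences for queries and code into one vector space.
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = torch.nn.EmbeddingBag(vocab_size, dim)

    def forward(self, token_ids):
        return self.embed(token_ids)

def matching_loss(model, params, queries, code):
    # In-batch contrastive loss: each query should rank its paired
    # code snippet above every other snippet in the batch.
    q = functional_call(model, params, (queries,))
    c = functional_call(model, params, (code,))
    logits = q @ c.t()
    return F.cross_entropy(logits, torch.arange(q.size(0)))

def maml_step(model, tasks, inner_lr=1e-2, meta_lr=1e-3):
    # One meta-update. For each task: adapt the shared initialization
    # on the support set (inner loop), then back-propagate the adapted
    # parameters' query-set loss into the initialization (outer loop).
    meta_opt = torch.optim.Adam(model.parameters(), lr=meta_lr)
    meta_opt.zero_grad()
    params = dict(model.named_parameters())
    for support, query in tasks:
        loss_s = matching_loss(model, params, *support)
        grads = torch.autograd.grad(loss_s, list(params.values()),
                                    create_graph=True)
        fast = {n: p - inner_lr * g
                for (n, p), g in zip(params.items(), grads)}
        matching_loss(model, fast, *query).backward()
    meta_opt.step()

def random_batch():
    # 8 query/code pairs, each a sequence of 16 random token ids.
    return (torch.randint(0, 1000, (8, 16)),
            torch.randint(0, 1000, (8, 16)))

# Toy usage: each task is a (support, query) pair of batches.
model = ToyBiEncoder()
tasks = [(random_batch(), random_batch()) for _ in range(4)]
maml_step(model, tasks)

Because the fast weights are built with create_graph=True, the query-set loss back-propagates through the inner gradient step itself, which is what distinguishes MAML's learned initialization from ordinary fine-tuning.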

Citation (APA)

Chai, Y., Zhang, H., Shen, B., & Gu, X. (2022). Cross-Domain Deep Code Search with Meta Learning. In Proceedings - International Conference on Software Engineering (Vol. 2022-May, pp. 487–498). IEEE Computer Society. https://doi.org/10.1145/3510003.3510125
