Disentangled Code Representation Learning for Multiple Programming Languages

Jingfeng Zhang; Haiwen Hong; Yin Zhang; Yao Wan; Ye Liu; Yulei Sui

Conference Proceedings, Open Access

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (2021) 4454-4466

DOI: 10.18653/v1/2021.findings-acl.391

Abstract

Developing effective distributed representations of source code is fundamental yet challenging for many software engineering tasks such as code clone detection, code search, code translation and transformation. However, current code embedding approaches that represent the semantics and syntax of code in a mixed way are less interpretable, and the resulting embeddings cannot be easily generalized across programming languages. In this paper, we propose a disentangled code representation learning approach that separates the semantics from the syntax of source code in a multi-programming-language setting, yielding better interpretability and generalizability. Specifically, we design three losses tailored to the characteristics of source code to enforce the disentanglement effectively. We conduct comprehensive experiments on a real-world dataset composed of programming exercises, each implemented by multiple solutions that are semantically identical but syntactically different. The experimental results validate the superiority of our proposed disentangled code representation over several baselines across three types of downstream tasks: code clone detection, code translation, and code-to-code search.

Citation (APA)

Zhang, J., Hong, H., Zhang, Y., Wan, Y., Liu, Y., & Sui, Y. (2021). Disentangled Code Representation Learning for Multiple Programming Languages. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 4454–4466). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.391
