Incorporating external knowledge through pre-training for natural language to code generation

50 citations · 197 readers (Mendeley)

Abstract

Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing code, we explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation. Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa. The code and resources are available at https://github.com/neulab/external-knowledge-codegen.
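The abstract mentions retrieval-based data re-sampling: up-weighting mined external NL-code pairs whose natural-language intents resemble the target distribution. The paper does not spell out the scoring function here, so the sketch below is a minimal, hypothetical illustration using bag-of-words cosine similarity; the function names and the choice of similarity measure are assumptions, not the authors' actual implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would use a proper tokenizer.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def resample_weights(external_intents, target_intents):
    """Score each external NL intent by its maximum similarity to any
    target-domain intent, then normalize the scores into sampling weights.
    External pairs closer to the target distribution are sampled more often."""
    target_bows = [Counter(tokenize(t)) for t in target_intents]
    scores = []
    for intent in external_intents:
        bow = Counter(tokenize(intent))
        scores.append(max((cosine(bow, t) for t in target_bows), default=0.0))
    total = sum(scores) or 1.0
    return [s / total for s in scores]
```

For example, given mined intents `["sort a list in python", "open a file"]` and a target intent `"how to sort list"`, the first mined pair receives a higher sampling weight than the second, since its intent shares tokens with the target.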

Citation (APA)

Xu, F. F., Jiang, Z., Yin, P., Vasilescu, B., & Neubig, G. (2020). Incorporating external knowledge through pre-training for natural language to code generation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 6045–6052). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.538
