Statistical machine translation outperforms neural machine translation in software engineering: Why and how

Abstract

Neural Machine Translation (NMT) is the currently dominant approach in Natural Language Processing (NLP) to the problem of automatically inferring content in a target language from a source language. The strength of NMT is its ability to learn deep knowledge about languages through deep learning. However, prior work shows that NMT has drawbacks both in NLP and in some research problems of Software Engineering (SE). In this work, we hypothesize that SE corpora have inherent characteristics that pose challenges for NMT compared to the state-of-the-art translation engine based on Statistical Machine Translation (SMT). We introduce a problem, called Prefix Mapping, that is significant in SE and has characteristics that challenge the ability of NMT to learn correct sequences. We implement and optimize the original SMT and NMT to mitigate those challenges. Our evaluation shows that SMT outperforms NMT on this research problem, which suggests potential directions for optimizing current NMT engines for specific classes of parallel corpora. By achieving accuracy from 65% to 90% for code token generation on a 1000 GitHub code corpus, we show the potential of using MT for code completion at the token level.

Cite

Phan, H., & Jannesari, A. (2020). Statistical machine translation outperforms neural machine translation in software engineering: Why and how. In RL+SE and PL 2020 - Proceedings of the 1st ACM SIGSOFT International Workshop on Representation Learning for Software Engineering and Program Languages, Co-located with ESEC/FSE 2020 (pp. 3–12). Association for Computing Machinery, Inc. https://doi.org/10.1145/3416506.3423576