Selecting machine-translated data for quick bootstrapping of a natural language understanding system

Abstract

This paper investigates the use of Machine Translation (MT) to bootstrap a Natural Language Understanding (NLU) system in a new language, for the use case of a large-scale voice-controlled device. The goal is to reduce the cost and time needed to obtain an annotated corpus for the new language while still covering a large enough range of user requests. The paper investigates two kinds of methods: different ways of filtering MT data so that only utterances that improve NLU performance are kept, and language-specific post-processing of the translated data. These methods are tested on a large-scale NLU task in which around 10 million training utterances are translated from English to German. The results show that using MT data yields a large improvement over both a grammar-based baseline and an in-house data-collection baseline, while greatly reducing manual effort. Both the filtering and the post-processing approaches improve results further.
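The abstract does not spell out the filtering criteria. As one illustration of what filtering MT data can look like, the minimal sketch below keeps a translated utterance only if its back-translation into English stays close to the original request. Back-translation overlap, the token-level Jaccard measure, the 0.5 threshold, and the names `TranslatedUtterance`, `token_jaccard`, and `filter_by_back_translation` are hypothetical assumptions chosen for illustration; they are not the filtering methods evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslatedUtterance:
    source: str            # original English utterance
    translation: str       # machine-translated German utterance
    back_translation: str  # hypothetical input: the translation mapped back to English


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercased whitespace tokens; 1.0 if both are empty."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def filter_by_back_translation(utts, threshold=0.5):
    """Hypothetical filter: keep utterances whose back-translation overlaps
    the English source by at least `threshold` (token Jaccard)."""
    return [u for u in utts if token_jaccard(u.source, u.back_translation) >= threshold]


# Usage example with made-up utterances.
utts = [
    TranslatedUtterance("play music by queen", "spiele musik von queen", "play music by queen"),
    TranslatedUtterance("turn off the lights", "schalte das licht aus", "switch the light out"),
]
kept = filter_by_back_translation(utts, threshold=0.5)
print([u.translation for u in kept])  # ['spiele musik von queen']
```

Here the second utterance is dropped because its back-translation shares only one token ("the") with the source (overlap 1/7), even though the German translation is acceptable. This shows why surface-overlap filters are only a rough proxy for whether an utterance helps NLU performance.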

Citation (APA)

Gaspers, J., Karanasou, P., & Chatterjee, R. (2018). Selecting machine-translated data for quick bootstrapping of a natural language understanding system. In NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference (Vol. 3, pp. 137–144). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/n18-3017