Data selection with fewer words

Abstract

We present a method that improves data selection by combining a hybrid word/part-of-speech representation for corpora with the idea of distinguishing between rare and frequent events. We validate our approach using data selection for machine translation, and show that it maintains or improves BLEU and TER translation scores while substantially improving vocabulary coverage and reducing the size of the data selection model. Paradoxically, the coverage improvement is achieved by abstracting away over 97% of the total training corpus vocabulary, replacing those words with simple part-of-speech tags during the data selection process.
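The hybrid representation described in the abstract can be illustrated with a minimal sketch. It assumes the corpus is already part-of-speech tagged and that "rare" means a word occurs fewer than `min_count` times; the abstract does not state the actual criterion, so the threshold, the function names `build_frequent_vocab` and `to_hybrid`, and the toy corpus are all hypothetical. Frequent words are kept as they are, and every rare word is replaced by its tag.

```python
from collections import Counter


def build_frequent_vocab(tagged_corpus, min_count):
    """Return the set of words seen at least `min_count` times.

    `min_count` is an illustrative assumption; the abstract does not
    say how rare and frequent events are separated.
    """
    counts = Counter(
        word for sentence in tagged_corpus for word, _tag in sentence
    )
    return {word for word, count in counts.items() if count >= min_count}


def to_hybrid(tagged_sentence, frequent_vocab):
    """Keep frequent words; replace each rare word with its POS tag."""
    return [
        word if word in frequent_vocab else tag
        for word, tag in tagged_sentence
    ]


# Toy pre-tagged corpus (hypothetical data, Penn Treebank style tags).
corpus = [
    [("the", "DT"), ("cat", "NN"), ("sat", "VBD")],
    [("the", "DT"), ("dog", "NN"), ("sat", "VBD")],
    [("the", "DT"), ("aardvark", "NN"), ("slept", "VBD")],
]

vocab = build_frequent_vocab(corpus, min_count=2)
hybrid = [to_hybrid(sentence, vocab) for sentence in corpus]

original_types = {word for sentence in corpus for word, _tag in sentence}
hybrid_types = {token for sentence in hybrid for token in sentence}

for sentence in hybrid:
    print(" ".join(sentence))
print(len(original_types), "word types ->", len(hybrid_types), "hybrid types")
```

In this sketch a selection model trained on the hybrid text would see fewer distinct token types than one trained on the raw words, which is one way to read the reduction in model size that the abstract reports. How the selection scores themselves are computed is outside what the abstract states and is not shown here.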

Cite

Axelrod, A., He, X., Resnik, P., & Ostendorf, M. (2015). Data selection with fewer words. In 10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Proceedings (pp. 58–65). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-3003
