Data selection with fewer words

Amittai Axelrod; Xiaodong He; Philip Resnik; Mari Ostendorf

Conference Proceedings (Open Access)

10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Proceedings (2015) 58-65

DOI: 10.18653/v1/w15-3003

7 Citations

10 Readers

Abstract

We present a method that improves data selection by combining a hybrid word/part-of-speech representation for corpora with the idea of distinguishing between rare and frequent events. We validate our approach using data selection for machine translation, and show that it maintains or improves BLEU and TER translation scores while substantially improving vocabulary coverage and reducing data selection model size. Paradoxically, the coverage improvement is achieved by abstracting away over 97% of the total training corpus vocabulary using simple part-of-speech tags during the data selection process.
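
As a rough illustration of the hybrid word/part-of-speech idea in the abstract, the sketch below keeps frequent words as they are and replaces infrequent words with their part-of-speech tag. The abstract does not give the procedure in detail, so everything here is an assumption made for illustration: the function name `hybridize`, the `min_count` threshold, the toy corpus, and the pre-tagged input are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def hybridize(tagged_sentences, min_count=2):
    """Replace infrequent words with their POS tag (hypothetical sketch).

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    min_count: assumed frequency threshold; words seen fewer times than
               this are abstracted to their tag.
    """
    # Count how often each word occurs across the whole corpus.
    counts = Counter(word for sent in tagged_sentences for word, _ in sent)
    # Keep frequent words; replace the rest with their POS tag.
    return [
        [word if counts[word] >= min_count else tag for word, tag in sent]
        for sent in tagged_sentences
    ]

# Toy corpus with tags already assigned (a real setup would use a POS tagger).
corpus = [
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("the", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]

hybrid = hybridize(corpus, min_count=2)
vocab_before = {word for sent in corpus for word, _ in sent}
vocab_after = {token for sent in hybrid for token in sent}
print(hybrid)
print(len(vocab_before), len(vocab_after))
```

In this toy case the rare words "cat" and "dog" collapse into the single tag "NN", so the vocabulary seen by a selection model shrinks while both sentences still share the same frequent-word skeleton.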

Cite

Axelrod, A., He, X., Resnik, P., & Ostendorf, M. (2015). Data selection with fewer words. In 10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Proceedings (pp. 58–65). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-3003
