Using related languages to enhance statistical language models

Abstract

The success of many language modeling methods and applications relies heavily on the amount of data available. This dependence is even more acute in statistical machine translation, which requires parallel data in the source and target languages. However, large amounts of data are available for only a small number of languages; as a result, many language modeling techniques are inadequate for the vast majority of languages. In this paper, we attempt to mitigate the lack of training data for low-resource languages by adding data from related high-resource languages in three experiments. First, we interpolate language models trained on the target language and on the related language. Second, we select the related-language sentences most similar to the target language and add them to our training corpus. Finally, we integrate data from the related language into the translation model of a statistical machine translation system. Although we see few significant improvements over baselines trained on a small amount of target-language data, we discuss further experiments that could augment language models and translation models with data from related languages.
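The first two experiments are straightforward to sketch. Below is a minimal Python illustration, not the authors' implementation: it assumes simple add-alpha-smoothed unigram models, a Moore-Lewis-style cross-entropy difference score for sentence selection, and toy data; all function names and corpora are ours.

import math
from collections import Counter

def train_unigram(tokens, alpha=1.0):
    # Add-alpha smoothed unigram model; returns a probability function.
    counts = Counter(tokens)
    total = sum(counts.values()) + alpha * (len(counts) + 1)
    unk = alpha / total  # mass reserved for unseen words
    return lambda w: (counts[w] + alpha) / total if w in counts else unk

def interpolate(p_target, p_related, lam):
    # Experiment 1: p(w) = lam * p_target(w) + (1 - lam) * p_related(w).
    return lambda w: lam * p_target(w) + (1 - lam) * p_related(w)

def cross_entropy(model, tokens):
    # Per-token cross-entropy of a sentence under a model, in bits.
    return -sum(math.log2(model(w)) for w in tokens) / len(tokens)

# Toy stand-ins for a low-resource target corpus and a larger
# related-language corpus (illustrative only).
target_tokens = "the small target corpus".split()
related_sents = [s.split() for s in [
    "the related corpus is much larger",
    "completely unrelated material appears here",
]]

p_tgt = train_unigram(target_tokens)
p_rel = train_unigram([w for s in related_sents for w in s])

# Experiment 1: interpolated language model.
p_mix = interpolate(p_tgt, p_rel, lam=0.7)

# Experiment 2: rank related-language sentences by similarity to the
# target language (lower score = more target-like).
scored = sorted(related_sents,
                key=lambda s: cross_entropy(p_tgt, s) - cross_entropy(p_rel, s))

In practice one would work with n-gram rather than unigram models and tune the interpolation weight lam on held-out target-language data; the cross-entropy difference criterion here is the standard Moore-Lewis data selection score, used only as a plausible stand-in for the paper's similarity measure.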

Citation (APA)

Currey, A., & Karakanta, A. (2016). Using related languages to enhance statistical language models. In Proceedings of the NAACL 2016 Student Research Workshop (pp. 116–123). Association for Computational Linguistics. https://doi.org/10.18653/v1/n16-2017
