Reducing the impact of data sparsity in statistical machine translation

Citations: 3
Readers: 72 (Mendeley users who have this article in their library)

Abstract

Morphologically rich languages generally require large amounts of parallel data to adequately estimate the parameters of a statistical Machine Translation (SMT) system. However, creating large collections of parallel data is time consuming and expensive. In this paper, we explore two strategies for circumventing the sparsity caused by a lack of large parallel corpora. First, we use distributed representations in a Recurrent Neural Network based language model with different morphological features; second, we use lexical resources such as WordNet to overcome the sparsity of content words.

Citation (APA)

Singla, K., Sachdeva, K., Yadav, D., Bangalore, S., & Sharma, D. M. (2014). Reducing the impact of data sparsity in statistical machine translation. In Proceedings of SSST 2014 - 8th Workshop on Syntax, Semantics and Structure in Statistical Translation (pp. 51–56). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-4006
