Abstract
In this paper, we describe our English-Hindi and Hindi-English statistical systems submitted to the WMT14 shared task. The core components of our translation systems are phrase-based (Hindi-English) and factored (English-Hindi) SMT systems. We show that using number, case and Tree Adjoining Grammar information as factors helps to improve English-Hindi translation, primarily by generating morphological inflections correctly. We also show improvements to the translation systems from preprocessing and post-processing components. To overcome the structural divergence between English and Hindi, we preorder the source-side sentence to conform to the target-language word order. Because the parallel corpus is limited, many words are left untranslated. We translate out-of-vocabulary words and transliterate named entities in a post-processing stage. We also investigate ranking translations from multiple systems to select the best translation.
Cite
Dungarwal, P., Chatterjee, R., Mishra, A., Kunchukuttan, A., Shah, R., & Bhattacharyya, P. (2014). The IIT Bombay Hindi⇔English translation system at WMT 2014. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 90–96). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-3308