Abstract
Lexical sparsity is a major challenge for machine translation into morphologically rich languages. We address this problem by modeling sequences of fine-grained morphological tags in a bilingual context. To overcome the issue of ambiguous word analyses, we introduce soft tags, which are under-specified representations retaining all possible morphological attributes of a word. In order to learn distributed representations for the soft tags and their interactions, we adopt a neural network approach. This approach allows for the combination of source and target side information to model a wide range of inflection phenomena. Our re-inflection experiments show a substantial increase in accuracy compared to a model trained on morphologically disambiguated data. Integrated into an SMT decoder and evaluated for English-Italian and English-Russian translation, our model yields improvements of up to 1.0 BLEU over a competitive baseline.
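The soft-tag idea can be illustrated with a minimal sketch. It assumes, based only on the description above, that a soft tag keeps the union of the attributes from every candidate analysis of a word rather than committing to one analysis. The attribute names, the `soft_tag` and `to_indicator` helpers, and the example analyses are hypothetical and are not taken from the paper; the neural model itself is not shown.

```python
from typing import FrozenSet, Iterable, List, Sequence


def soft_tag(analyses: Iterable[Iterable[str]]) -> FrozenSet[str]:
    """Return the union of attributes over all candidate analyses of a word.

    Assumption: "retaining all possible morphological attributes" is read
    here as a plain set union; the paper's exact encoding may differ.
    """
    tag = set()
    for analysis in analyses:
        tag.update(analysis)
    return frozenset(tag)


def to_indicator(tag: FrozenSet[str], inventory: Sequence[str]) -> List[int]:
    """Encode a soft tag as a 0/1 vector over a fixed attribute inventory."""
    return [1 if attribute in tag else 0 for attribute in inventory]


# Hypothetical ambiguous word with two competing analyses.
ANALYSES = [
    ("noun", "fem", "sing", "nom"),
    ("noun", "fem", "plur", "gen"),
]

# Hypothetical attribute inventory; a real tagset would be much larger.
INVENTORY = ["noun", "verb", "fem", "masc", "sing", "plur", "nom", "gen"]

tag = soft_tag(ANALYSES)
vector = to_indicator(tag, INVENTORY)
```

Under this reading, no disambiguation step is needed before training: the ambiguity is kept in the representation, and a downstream model can learn which attribute combinations matter in context.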
Cite
Tran, K., Bisazza, A., & Monz, C. (2015). A Distributed Inflection Model for Translating into Morphologically Rich Languages. MT-Summit-2015, 1, 145–159.