Abstract
Learning internal word structure has recently been recognized as an important step in various multilingual processing tasks and in theoretical language comparison. In this paper, we present a neural encoder-decoder model for learning canonical morphological segmentation. Our model combines character-level sequence-to-sequence transformation with a language model over canonical segments. We obtain up to a 4% improvement over a strong character-level encoder-decoder baseline for three languages. Our model outperforms the previous state of the art for two languages, while eliminating the need for external resources such as large dictionaries. Finally, by comparing the performance of encoder-decoder and classical statistical machine translation systems trained with and without corpus counts, we show that including corpus counts benefits both approaches.
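As a rough illustration of the combination described in the abstract, the sketch below shows one way a character-level decoder's hypotheses could be re-scored with a language model over canonical segments: each candidate segmentation receives an interpolated score from the two components. The scoring functions, the '+' segment separator, and the interpolation weight `lm_weight` are hypothetical stand-ins for illustration, not the authors' actual implementation.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical stand-ins: in the paper these would be a trained character-level
# encoder-decoder and a language model over canonical segments.
CharDecoderScore = Callable[[str, str], float]   # log P(segmented output | input word)
SegmentLMScore = Callable[[List[str]], float]    # log P(segment sequence)

def rescore_hypotheses(
    word: str,
    hypotheses: List[str],                 # candidate segmentations, e.g. "un+touch+able"
    decoder_score: CharDecoderScore,
    segment_lm_score: SegmentLMScore,
    lm_weight: float = 0.5,                # assumed interpolation weight (hyperparameter)
) -> List[Tuple[str, float]]:
    """Combine a character-level decoder score with a segment-level LM score."""
    rescored = []
    for hyp in hypotheses:
        segments = hyp.split("+")          # canonical segments separated by '+'
        score = decoder_score(word, hyp) + lm_weight * segment_lm_score(segments)
        rescored.append((hyp, score))
    # Best (highest combined log-probability) hypothesis first.
    return sorted(rescored, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    # Toy scoring functions, purely for illustration.
    toy_decoder = lambda word, hyp: -0.1 * abs(len(hyp) - len(word))
    toy_lm = lambda segs: sum(math.log(1.0 / (1 + len(s))) for s in segs)
    print(rescore_hypotheses("untouchable",
                             ["un+touch+able", "untouch+able", "untouchable"],
                             toy_decoder, toy_lm))
```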
Citation
Ruzsics, T., & Samardžić, T. (2017). Neural sequence-to-sequence learning of internal word structure. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017) (pp. 184–194). Association for Computational Linguistics. https://doi.org/10.18653/v1/k17-1020