Abstract
Learning internal word structure has recently been recognized as an important step in various multilingual processing tasks and in theoretical language comparison. In this paper, we present a neural encoder-decoder model for learning canonical morphological segmentation. Our model combines character-level sequence-to-sequence transformation with a language model over canonical segments. We obtain up to a 4% improvement over a strong character-level encoder-decoder baseline for three languages. Our model outperforms the previous state of the art for two languages, while eliminating the need for external resources such as large dictionaries. Finally, by comparing the performance of encoder-decoder and classical statistical machine translation systems trained with and without corpus counts, we show that including corpus counts benefits both approaches.
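As a rough illustration of the combination described in the abstract, the sketch below shows one way a character-level decoder's hypotheses could be re-scored with a language model over canonical segments: each candidate segmentation receives an interpolated score from the two components. The scoring functions, the '+' segment separator, and the interpolation weight `lm_weight` are hypothetical stand-ins for illustration, not the authors' actual implementation.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical stand-ins: in the paper these would be a trained character-level
# encoder-decoder and a language model over canonical segments.
CharDecoderScore = Callable[[str, str], float]   # log P(segmented output | input word)
SegmentLMScore = Callable[[List[str]], float]    # log P(segment sequence)

def rescore_hypotheses(
    word: str,
    hypotheses: List[str],                 # candidate segmentations, e.g. "un+touch+able"
    decoder_score: CharDecoderScore,
    segment_lm_score: SegmentLMScore,
    lm_weight: float = 0.5,                # assumed interpolation weight (hyperparameter)
) -> List[Tuple[str, float]]:
    """Combine a character-level decoder score with a segment-level LM score."""
    rescored = []
    for hyp in hypotheses:
        segments = hyp.split("+")          # canonical segments separated by '+'
        score = decoder_score(word, hyp) + lm_weight * segment_lm_score(segments)
        rescored.append((hyp, score))
    # Best (highest combined log-probability) hypothesis first.
    return sorted(rescored, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    # Toy scoring functions, purely for illustration.
    toy_decoder = lambda word, hyp: -0.1 * abs(len(hyp) - len(word))
    toy_lm = lambda segs: sum(math.log(1.0 / (1 + len(s))) for s in segs)
    print(rescore_hypotheses("untouchable",
                             ["un+touch+able", "untouch+able", "untouchable"],
                             toy_decoder, toy_lm))
```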
Citation
Ruzsics, T., & Samardžić, T. (2017). Neural sequence-to-sequence learning of internal word structure. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017) (pp. 184–194). Association for Computational Linguistics. https://doi.org/10.18653/v1/k17-1020