Neural sequence-to-sequence learning of internal word structure


Abstract

Learning internal word structure has recently been recognized as an important step in various multilingual processing tasks and in theoretical language comparison. In this paper, we present a neural encoder-decoder model for learning canonical morphological segmentation. Our model combines character-level sequence-to-sequence transformation with a language model over canonical segments. We obtain up to 4% improvement over a strong character-level encoder-decoder baseline for three languages. Our model outperforms the previous state-of-the-art for two languages, while eliminating the need for external resources such as large dictionaries. Finally, by comparing the performance of encoder-decoder and classical statistical machine translation systems trained with and without corpus counts, we show that including corpus counts is beneficial to both approaches.
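The abstract describes combining a character-level sequence-to-sequence score with a language model over canonical segments. Below is a minimal sketch, not the authors' implementation, of how such scores might be interpolated to rank candidate segmentations. The candidate list, the toy probabilities, and the weight `lm_weight` are all hypothetical placeholders for illustration.

```python
# Hypothetical sketch: rank candidate canonical segmentations by combining a
# character-level seq2seq score with a segment-level language model score.

import math


def seq2seq_log_prob(surface: str, candidate: str) -> float:
    """Stand-in for the character-level encoder-decoder score
    log P(candidate characters | surface characters). Toy values only."""
    toy_scores = {
        "unhappiness": {"un happy ness": -2.1, "unhappi ness": -2.4},
    }
    return toy_scores.get(surface, {}).get(candidate, -10.0)


def segment_lm_log_prob(candidate: str) -> float:
    """Stand-in for a language model over canonical segments,
    scoring the segment sequence itself. Toy unigram values only."""
    toy_segment_log_probs = {"un": -1.0, "happy": -1.5, "ness": -1.2, "unhappi": -6.0}
    return sum(toy_segment_log_probs.get(seg, -8.0) for seg in candidate.split())


def rank_candidates(surface: str, candidates: list, lm_weight: float = 0.5):
    """Rank candidates by interpolating the two log-probability scores."""
    scored = [
        (seq2seq_log_prob(surface, c) + lm_weight * segment_lm_log_prob(c), c)
        for c in candidates
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    surface = "unhappiness"
    candidates = ["un happy ness", "unhappi ness"]
    for score, cand in rank_candidates(surface, candidates):
        print(f"{score:7.2f}  {cand}")
```

In this sketch the segment-level language model rewards candidates whose segments form a plausible sequence, which is the intuition behind combining it with the character-level transduction score.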

Citation (APA)

Ruzsics, T., & Samardžić, T. (2017). Neural sequence-to-sequence learning of internal word structure. In CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings (pp. 184–194). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/k17-1020
