Characters or Morphemes: How to Represent Words?

Abstract

In this paper, we investigate the effects of using subword information in representation learning. We argue that using syntactic subword units improves the quality of word representations. We introduce a morpheme-based model and compare it against word-based, character-based, and character n-gram level models. Our model takes a list of candidate segmentations of a word and learns the representation of the word from these segmentations, which are weighted by an attention mechanism. We performed experiments on Turkish, a morphologically rich language, and on English, which has a comparatively poorer morphology. The results show that morpheme-based models learn better word representations for morphologically complex languages than character-based and character n-gram level models do. Morphemes help to incorporate more syntactic knowledge into learning, which makes morpheme-based models better at syntactic tasks.
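
The idea of weighting candidate segmentations with attention can be illustrated with a minimal sketch. The abstract does not specify how morphemes are composed or how attention scores are computed, so everything here is an assumption made for illustration: each segmentation is represented by summing its morpheme vectors, scores are dot products with a hypothetical `attention_vector`, and the toy vectors are invented rather than taken from the paper. The example word is Turkish "evler" (houses), with the candidate segmentations "ev + ler" and the unsegmented "evler".

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose_segmentation(morphemes, morpheme_vectors):
    """Represent one segmentation as the sum of its morpheme vectors.

    Assumption: plain summation is a stand-in for whatever composition
    function the actual model uses.
    """
    dim = len(next(iter(morpheme_vectors.values())))
    vec = [0.0] * dim
    for morpheme in morphemes:
        for i, value in enumerate(morpheme_vectors[morpheme]):
            vec[i] += value
    return vec


def attend_over_segmentations(segmentations, morpheme_vectors, attention_vector):
    """Build a word vector as an attention-weighted sum of segmentation vectors.

    Assumption: dot-product scoring against a single attention vector.
    """
    seg_vecs = [compose_segmentation(s, morpheme_vectors) for s in segmentations]
    scores = [sum(a * b for a, b in zip(v, attention_vector)) for v in seg_vecs]
    weights = softmax(scores)
    dim = len(attention_vector)
    word_vec = [sum(w * v[i] for w, v in zip(weights, seg_vecs)) for i in range(dim)]
    return word_vec, weights


# Hypothetical toy embeddings; in a trained model these would be learned.
morpheme_vectors = {
    "ev": [1.0, 0.0],
    "ler": [0.0, 1.0],
    "evler": [0.5, 0.5],
}
segmentations = [["ev", "ler"], ["evler"]]
attention_vector = [1.0, 1.0]

word_vec, weights = attend_over_segmentations(
    segmentations, morpheme_vectors, attention_vector
)
print(weights)
print(word_vec)
```

In this sketch the weights sum to one, so the word vector is a convex combination of the candidate segmentation vectors; in a trained model the attention parameters would be learned jointly with the morpheme embeddings.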

Üstün, A., Kurfalı, M., & Can, B. (2018). Characters or Morphemes: How to Represent Words? In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 144–153). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-3019
