Morphology-Aware Meta-Embeddings for Tamil

Abstract

In this work, we explore generating morphologically enhanced word embeddings for Tamil, a highly agglutinative South Indian language with rich morphology that remains low-resource with regard to NLP tasks. We present the first word analogy dataset for Tamil, consisting of 4499 hand-curated word tetrads across 10 semantic and 13 morphological relation types. Using a rule-based morphological segmenter and meta-embedding techniques, we train meta-embeddings that outperform existing baselines by 16% on our analogy task and appear to mitigate a previously observed trade-off between semantic and morphological accuracy.
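
To make the evaluation setup concrete, here is a minimal sketch of how a word tetrad (a : b :: c : d) can be scored against a meta-embedding. It is an illustration only, not the authors' implementation: the abstract does not say which meta-embedding technique or analogy solver was used, so this sketch assumes concatenation of L2-normalized source vectors and the common vector-offset (3CosAdd) method. The function names, the romanized toy vocabulary (aan "man", pen "woman", arasan "king", arasi "queen", maram "tree"), and all vector values are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def concat_meta_embedding(sources):
    """Build a meta-embedding by concatenating normalized source vectors.

    Assumption: concatenation is one common meta-embedding technique;
    the paper may use a different one. Only words present in every
    source are kept.
    """
    shared = set(sources[0]).intersection(*sources[1:])
    return {
        word: [x for src in sources for x in normalize(src[word])]
        for word in shared
    }


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def solve_analogy(emb, a, b, c):
    """Vector-offset (3CosAdd) solver: the word closest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, tetrads):
    """Fraction of tetrads (a, b, c, d) for which the solver returns d."""
    hits = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in tetrads)
    return hits / len(tetrads)


# Hypothetical toy source embeddings (values invented for illustration).
source_one = {
    "aan": [1.0, 0.0], "pen": [-1.0, 0.0],
    "arasan": [1.0, 1.0], "arasi": [-1.0, 1.0], "maram": [0.0, -1.0],
}
source_two = {
    "aan": [0.9, 0.1], "pen": [-0.8, 0.2],
    "arasan": [1.0, 0.9], "arasi": [-0.9, 1.1], "maram": [0.1, -1.0],
}

meta = concat_meta_embedding([source_one, source_two])
tetrads = [("aan", "pen", "arasan", "arasi")]
print(solve_analogy(meta, "aan", "pen", "arasan"))
print(analogy_accuracy(meta, tetrads))
```

On a real dataset, accuracy would typically be reported separately for the semantic and morphological relation types, which is what makes a trade-off between the two visible.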

Cite

Krishnan, A., & Ragavan, S. (2021). Morphology-Aware Meta-Embeddings for Tamil. In NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop (pp. 94–111). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.naacl-srw.13
