Morphology-Aware Meta-Embeddings for Tamil

Arjun Krishnan; Seyoon Ragavan



NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop (2021) 94-111

DOI: 10.18653/v1/2021.naacl-srw.13

Abstract

In this work, we explore generating morphologically enhanced word embeddings for Tamil, a highly agglutinative South Indian language with rich morphology that remains low-resource with regard to NLP tasks. We present the first word analogy dataset for Tamil, consisting of 4499 hand-curated word tetrads across 10 semantic and 13 morphological relation types. Using a rules-based morphological segmenter and meta-embedding techniques, we train meta-embeddings that outperform existing baselines by 16% on our analogy task and appear to mitigate a previously observed trade-off between semantic and morphological accuracy.
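The abstract does not state how the meta-embeddings are built or how analogies are scored. The sketch below is illustrative only and assumes two common choices that may differ from the authors' method: concatenating L2-normalised source embeddings to form a meta-embedding, and scoring a tetrad a:b :: c:d with the 3CosAdd rule (predict the vocabulary word closest in cosine to b - a + c, excluding a, b and c). The function names and the two-source toy vocabulary (English glosses standing in for Tamil word forms) are hypothetical and not taken from the paper or its dataset.

```python
import math

# Illustrative sketch only: concatenation meta-embeddings and 3CosAdd scoring
# are assumed choices, not confirmed as the paper's method.

def normalize(v):
    """Return v scaled to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def concat_meta_embedding(sources):
    """Hypothetical helper: concatenate L2-normalised vectors from each source
    embedding (dicts mapping word -> list of floats) over the shared vocabulary."""
    shared = set.intersection(*(set(s) for s in sources))
    return {w: [x for s in sources for x in normalize(s[w])] for w in shared}

def solve_analogy(emb, a, b, c):
    """3CosAdd: return the word d maximising cos(d, b - a + c), excluding a, b, c."""
    target = normalize([vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])])
    best, best_score = None, -math.inf
    for w, v in emb.items():
        if w in (a, b, c):
            continue
        score = dot(normalize(v), target)
        if score > best_score:
            best, best_score = w, score
    return best

def analogy_accuracy(emb, tetrads):
    """Share of tetrads (a, b, c, d) for which 3CosAdd recovers d.
    Tetrads containing out-of-vocabulary words are skipped."""
    usable = [t for t in tetrads if all(w in emb for w in t)]
    if not usable:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in usable)
    return correct / len(usable)

# Toy data (hypothetical): source 1 encodes a plural feature in its third
# dimension; source 2 ignores number (singular and plural share a vector).
src1 = {
    "tree": [1.0, 0.0, 0.0], "trees": [1.0, 0.0, 1.0],
    "book": [0.0, 1.0, 0.0], "books": [0.0, 1.0, 1.0],
    "house": [0.6, 0.8, 0.0],
}
src2 = {
    "tree": [1.0, 0.0], "trees": [1.0, 0.0],
    "book": [0.0, 1.0], "books": [0.0, 1.0],
    "house": [1.0, 1.0],
}
meta = concat_meta_embedding([src1, src2])
tetrads = [("tree", "trees", "book", "books"), ("book", "books", "tree", "trees")]
print(solve_analogy(meta, "tree", "trees", "book"))  # books
print(analogy_accuracy(meta, tetrads))  # 1.0
```

Normalising each source before concatenation keeps any single embedding from dominating the combined space by sheer vector magnitude; averaging or learned projections are other common meta-embedding options, and which one the paper uses should be checked against the full text.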

Citation (APA)

Krishnan, A., & Ragavan, S. (2021). Morphology-Aware Meta-Embeddings for Tamil. In NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop (pp. 94–111). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.naacl-srw.13