Abstract
Recent advances in pretrained multilingual models such as Multilingual T5 (mT5) have facilitated cross-lingual transfer by learning shared representations across languages. Leveraging pretrained multilingual models to scale morphology analyzers to low-resource languages is a unique opportunity that has so far been under-explored. We investigate this line of research in the context of Indian languages, focusing on two important morphological subtasks: root word extraction and tagging of morphosyntactic descriptions (MSD), viz., gender, number, and person (GNP). We experiment with six Indian languages from two language families (Dravidian and Indo-Aryan) to train multilingual morphology analyzers for Indian languages for the first time. Through controlled experiments, we demonstrate the usefulness of multilingual models for few-shot cross-lingual transfer: GNP tagging improves by 7% on average in the cross-lingual setting compared with the monolingual setting. We provide an overview of the available datasets relevant to our tasks and point out a few modeling limitations that stem from these datasets. Lastly, we analyze the cross-lingual transfer of morphological tags for verbs and nouns, which serves as a proxy for the quality of the word-marking representations learned by the model.
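One common way to cast root word extraction and GNP tagging for a text-to-text model like mT5 is to serialize the analysis of a word into a single target string and score each field separately. The sketch below is purely illustrative: the `root=... | gender=...` target format, the tag values (`m`, `f`, `sg`, `pl`, `3`), and the helper names `Analysis`, `to_target`, `parse_target`, and `field_accuracy` are hypothetical and are not taken from the paper. It shows only how per-feature accuracy (root, gender, number, person) could be computed from raw model outputs.

```python
from dataclasses import dataclass

# Hypothetical feature set and serialization; not the paper's actual format.
FEATURES = ("gender", "number", "person")
FIELDS = ("root",) + FEATURES


@dataclass(frozen=True)
class Analysis:
    root: str
    gender: str
    number: str
    person: str


def to_target(a: Analysis) -> str:
    """Serialize a gold analysis into one target string for a seq2seq model."""
    return " | ".join(f"{k}={getattr(a, k)}" for k in FIELDS)


def parse_target(text: str) -> dict:
    """Parse a (possibly malformed) model output back into a field dict.

    Segments without an '=' are ignored, so garbage output yields {}.
    """
    fields = {}
    for part in text.split("|"):
        key, sep, value = part.strip().partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def field_accuracy(gold: list, predicted: list) -> dict:
    """Per-field accuracy (root and each GNP feature) over a test set."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("empty evaluation set")
    correct = {k: 0 for k in FIELDS}
    for g, p in zip(gold, predicted):
        fields = parse_target(p)
        for k in FIELDS:
            if fields.get(k) == getattr(g, k):
                correct[k] += 1
    return {k: correct[k] / len(gold) for k in FIELDS}


if __name__ == "__main__":
    # Hindi nouns: लड़कियाँ "girls" (root लड़की), लड़का "boy".
    gold = [Analysis("लड़की", "f", "pl", "3"), Analysis("लड़का", "m", "sg", "3")]
    preds = [to_target(gold[0]), "root=लड़का | gender=m | number=pl | person=3"]
    print(field_accuracy(gold, preds))
```

Scoring each feature separately, rather than requiring an exact match of the whole string, makes it possible to see which of gender, number, or person benefits from cross-lingual transfer.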
Cite
Pawar, S., Talukdar, P., & Bhattacharyya, P. (2023). Evaluating Cross Lingual Transfer for Morphological Analysis: A Case Study of Indian Languages. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 14–26). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.sigmorphon-1.3