Abstract
While Indic NLP has advanced rapidly in the availability of corpora and pre-trained models, benchmark datasets for standard NLU tasks remain limited. To this end, we introduce INDICXNLI, an NLI dataset for 11 Indic languages. It was created by high-quality machine translation of the original English XNLI dataset, and our analysis attests to the quality of INDICXNLI. By finetuning different pre-trained LMs on INDICXNLI, we analyze various cross-lingual transfer techniques with respect to the impact of the choice of language model, languages, multilinguality, mixed-language input, etc. These experiments provide useful insights into the behaviour of pre-trained models for a diverse set of languages.
Cite
Aggarwal, D., Gupta, V., & Kunchukuttan, A. (2022). INDICXNLI: Evaluating Multilingual Inference for Indian Languages. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 10994–11006). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.755