Early Guessing for Dialect Identification

Vani Kanjirangat; Tanja Samardzic; Fabio Rinaldi; Ljiljana Dolamic

Conference Proceedings


Findings of the Association for Computational Linguistics: EMNLP 2022 (2022), pp. 6446-6455

DOI: 10.18653/v1/2022.findings-emnlp.276


Abstract

This paper deals with the problem of incremental dialect identification. Our goal is to reliably determine the dialect before the full utterance has been given as input. Most previous research on dialect identification has been model-centric, focusing on performance. We address a new question: how much input is needed to identify a dialect? Our approach is a data-centric analysis that yields general criteria for finding the shortest input needed to make a plausible guess. Working with three sets of dialects (Swiss German, Indo-Aryan, and Arabic), we show that it is possible to generalize across dialects and datasets with two input-shortening criteria: model confidence and minimal input length (adjusted for the input type). The source code for the experimental analysis is available on GitHub.
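A minimal sketch of how the two input-shortening criteria could be combined at inference time, assuming a classifier that returns a probability per dialect label for any input prefix. The names `early_guess` and `toy_classifier`, the default threshold values, and measuring length in whitespace tokens are hypothetical choices for illustration, not the authors' released code; the toy classifier only stands in for a trained model.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interface: maps a (partial) input string to a probability per dialect label.
Classifier = Callable[[str], Dict[str, float]]


def early_guess(
    tokens: List[str],
    classify: Classifier,
    confidence_threshold: float = 0.9,  # hypothetical value for the model-confidence criterion
    min_length: int = 3,                # hypothetical value for the minimal-length criterion (in tokens)
) -> Tuple[Optional[str], int]:
    """Return (label, prefix_length) for the shortest prefix meeting both criteria.

    If no prefix qualifies, fall back to the prediction on the full input.
    """
    if not tokens:
        return None, 0
    label: Optional[str] = None
    # Prefixes shorter than min_length can never qualify, so start there
    # (or at the full input if it is shorter than min_length).
    for n in range(min(min_length, len(tokens)), len(tokens) + 1):
        probs = classify(" ".join(tokens[:n]))
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if n >= min_length and confidence >= confidence_threshold:
            return label, n  # stop early: both criteria are satisfied
    return label, len(tokens)


def toy_classifier(text: str) -> Dict[str, float]:
    # Toy stand-in for a trained model: confidence in label "A" grows
    # with the number of "marker" tokens seen so far.
    hits = text.split().count("marker")
    p_a = min(0.99, 0.5 + 0.15 * hits)
    return {"A": p_a, "B": 1.0 - p_a}


print(early_guess(["x", "marker", "y", "marker", "marker", "z"], toy_classifier))
# ('A', 5)
```

Requiring both conditions, rather than confidence alone, guards against a model that is spuriously confident on a very short prefix.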

Citation (APA)

Kanjirangat, V., Samardzic, T., Rinaldi, F., & Dolamic, L. (2022). Early Guessing for Dialect Identification. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 6446–6455). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-emnlp.276
