Early Guessing for Dialect Identification

Abstract

This paper deals with the problem of incremental dialect identification. Our goal is to reliably determine the dialect before the full utterance is given as input. Most previous research on dialect identification has been model-centric, focusing on performance. We address a new question: how much input is needed to identify a dialect? Our approach is a data-centric analysis that yields general criteria for finding the shortest input needed to make a plausible guess. Working with three sets of dialects (Swiss German, Indo-Aryan and Arabic languages), we show that it is possible to generalize across dialects and datasets with two input-shortening criteria: model confidence and minimal input length (adjusted for the input type). The source code for the experimental analysis is available on GitHub.
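The two stopping criteria named in the abstract can be illustrated with a small early-guessing loop: feed the classifier growing prefixes of the input and commit to a dialect as soon as the prefix is at least a minimum length and the top-class probability clears a confidence threshold. The sketch below is not the authors' released code; the function names, the token-level prefixes, the threshold of 0.9, the minimum length of 3, and the toy cue-word classifier are all hypothetical placeholders chosen for illustration.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical classifier interface: maps a token prefix to a
# probability distribution over dialect labels.
Classifier = Callable[[Sequence[str]], Dict[str, float]]


def early_guess(
    tokens: Sequence[str],
    predict_proba: Classifier,
    conf_threshold: float = 0.9,  # assumed value, not from the paper
    min_len: int = 3,             # assumed value, in tokens
) -> Tuple[str, int]:
    """Return (predicted label, number of tokens consumed).

    Stops at the first prefix that is at least `min_len` tokens long AND
    whose top-class probability reaches `conf_threshold`. If no prefix
    qualifies, falls back to the prediction on the full input.
    """
    if not tokens:
        raise ValueError("empty input")
    n = len(tokens)
    # Prefixes shorter than min_len can never trigger a stop, so skip them
    # (unless the whole input is shorter than min_len).
    start = max(1, min(min_len, n))
    for k in range(start, n + 1):
        probs = predict_proba(tokens[:k])
        label, conf = max(probs.items(), key=lambda item: item[1])
        if k >= min_len and conf >= conf_threshold:
            return label, k
    # No early stop: the last iteration used the complete input.
    return label, n


# Toy stand-in for a trained model (hypothetical): softmax over weighted
# counts of made-up cue tokens for two made-up dialect labels.
CUES = {"A": {"cue_a"}, "B": {"cue_b"}}


def toy_predict_proba(prefix: Sequence[str]) -> Dict[str, float]:
    scores = {lab: 2.0 * sum(tok in cues for tok in prefix) for lab, cues in CUES.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {lab: math.exp(s) / z for lab, s in scores.items()}


if __name__ == "__main__":
    print(early_guess(["w", "cue_a", "w", "cue_a", "w"], toy_predict_proba))  # ('A', 4)
```

In this toy run the three-token prefix already favours label A, but only with probability of about 0.88, so the loop waits for the second cue token and stops after four of five tokens. Lowering `conf_threshold` trades accuracy for earlier guesses, while `min_len` guards against committing on a single lucky token; the abstract notes that the minimal length should be adjusted to the input type, so a fixed token count here is only a simplification.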

Citation (APA)

Kanjirangat, V., Samardzic, T., Rinaldi, F., & Dolamic, L. (2022). Early Guessing for Dialect Identification. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 6446–6455). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-emnlp.276