The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus


Abstract

Recent years have seen growing interest in cross-lingual transfer between typologically similar languages and between languages written in different scripts. However, how language similarity and script difference interact in cross-lingual transfer remains less studied. We explore this interplay for two supervised tasks: part-of-speech tagging and sentiment analysis. We introduce a newly annotated corpus of Algerian user-generated comments comprising parallel annotations of Algerian written in Latin, Arabic, and code-switched scripts, as well as annotations for sentiment and topic categories. We perform baseline experiments by fine-tuning multilingual language models. We further explore the effect of script versus language similarity in cross-lingual transfer by fine-tuning multilingual models on languages that are a) typologically distinct but use the same script, b) typologically similar but use a distinct script, or c) typologically similar and use the same script. We find a delicate relationship between script and typology for part-of-speech tagging, while sentiment analysis is less sensitive.
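
The three transfer settings a) to c) can be read as a small decision rule over two properties of a source language relative to the target: whether it shares the script and whether it is typologically similar. The sketch below is only an illustration of that rule, not the authors' code. The `Variety` class, the `transfer_condition` function, the use of a single `family` label as a stand-in for typological similarity, and all source names are hypothetical; the abstract does not say which source languages were used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variety:
    """A language variety as written in one script (hypothetical helper)."""
    name: str
    family: str  # assumption: a coarse family label stands in for typological similarity
    script: str


def transfer_condition(source: Variety, target: Variety) -> str:
    """Map a source/target pair to setting 'a', 'b', or 'c' from the abstract.

    Returns 'none' when the pair shares neither script nor typology,
    a case the abstract does not list.
    """
    same_script = source.script == target.script
    similar = source.family == target.family
    if similar and same_script:
        return "c"  # typologically similar, same script
    if similar:
        return "b"  # typologically similar, distinct script
    if same_script:
        return "a"  # typologically distinct, same script
    return "none"


# Target: Algerian written in Latin script (one of the scripts in the corpus).
# "family_x" and "family_y" are placeholder labels, not linguistic claims.
target = Variety("Algerian", family="family_x", script="Latin")

# Placeholder sources; not the languages used in the paper.
sources = [
    Variety("source_1", family="family_y", script="Latin"),
    Variety("source_2", family="family_x", script="Arabic"),
    Variety("source_3", family="family_x", script="Latin"),
]

conditions = {s.name: transfer_condition(s, target) for s in sources}
```

In an actual experiment each setting would correspond to fine-tuning a multilingual model on the source data and evaluating on the target; the rule above only labels which setting a given pair falls under.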

Cite


Touileb, S., & Barnes, J. (2021). The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 3700–3712). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.324
