Character-level dialect identification in Arabic using long short-term memory

Karim Sayadi; Mansour Hamidi; Marc Bui; Marcus Liwicki; Andreas Fischer

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10762 LNCS, pp. 324-337 (2018)

DOI: 10.1007/978-3-319-77116-8_24


Abstract

In this paper, we introduce a neural-network-based sequence learning approach for the task of Arabic dialect classification. We propose character models based on recurrent neural networks with Long Short-Term Memory (LSTM) to classify short texts, such as tweets, written in different Arabic dialects. The LSTM-based character models can handle long-term dependencies in character sequences and do not require a set of linguistic rules at the word level, which is especially useful given the rich morphology of the Arabic language and the lack of strict orthographic rules for dialects. On the Tunisian Election Twitter dataset, our system achieves a promising average accuracy of 92.2% for distinguishing Modern Standard Arabic from Tunisian dialect. On the Multidialectal Parallel Corpus of Arabic, the proposed character models can distinguish six classes (Modern Standard Arabic and five Arabic dialects) with an average accuracy of 63.4%. They clearly outperform a standard word-level approach based on statistical n-grams as well as several other existing systems.
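The idea of a character-level LSTM classifier can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the abstract does not give the architecture, hyperparameters, or training procedure, so the gate layout, hidden size, label set, helper names (`build_vocab`, `encode`, `lstm_step`, `classify`), and example strings below are all assumptions. The weights are random and untrained, so the output probabilities carry no predictive meaning; the sketch only shows how a text is consumed character by character, with no word-level rules.

```python
import math
import random

# Hypothetical two-class label set (the abstract's binary MSA vs. Tunisian task).
LABELS = ["MSA", "Tunisian"]


def build_vocab(texts):
    """Map every distinct character to an integer id (0 is reserved for unknown)."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(text, vocab):
    """Turn a string into a list of character ids; unseen characters map to 0."""
    return [vocab.get(ch, 0) for ch in text]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(vocab_size, hidden, n_classes, seed=0):
    """Random, untrained weights; a real system would learn these from data."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_x": mat(vocab_size + 1, 4 * hidden),  # one row per character id
        "W_h": mat(hidden, 4 * hidden),          # recurrent weights
        "b": [0.0] * (4 * hidden),
        "W_out": mat(hidden, n_classes),         # final hidden state -> class logits
        "hidden": hidden,
    }


def lstm_step(char_id, h, c, p):
    """One LSTM time step for a one-hot character input."""
    n = p["hidden"]
    # A one-hot input simply selects row char_id of W_x.
    z = [
        p["W_x"][char_id][j] + p["b"][j] + sum(h[k] * p["W_h"][k][j] for k in range(n))
        for j in range(4 * n)
    ]
    i = [sigmoid(v) for v in z[0:n]]          # input gate
    f = [sigmoid(v) for v in z[n:2 * n]]      # forget gate
    o = [sigmoid(v) for v in z[2 * n:3 * n]]  # output gate
    g = [math.tanh(v) for v in z[3 * n:]]     # candidate cell state
    c_new = [f[k] * c[k] + i[k] * g[k] for k in range(n)]
    h_new = [o[k] * math.tanh(c_new[k]) for k in range(n)]
    return h_new, c_new


def classify(text, vocab, p):
    """Run the LSTM over the characters and apply softmax to the final hidden state."""
    n = p["hidden"]
    h, c = [0.0] * n, [0.0] * n
    for char_id in encode(text, vocab):
        h, c = lstm_step(char_id, h, c, p)
    n_classes = len(p["W_out"][0])
    logits = [sum(h[k] * p["W_out"][k][j] for k in range(n)) for j in range(n_classes)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative strings only; they are not taken from either dataset.
texts = ["مرحبا بالعالم", "برشا"]
vocab = build_vocab(texts)
params = init_params(len(vocab), hidden=8, n_classes=len(LABELS))
probs = classify(texts[0], vocab, params)
```

Because the vocabulary is built from characters rather than words, a misspelled or previously unseen word still maps to known character ids, which is the property the abstract highlights for dialects without strict orthographic rules.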

Citation (APA)

Sayadi, K., Hamidi, M., Bui, M., Liwicki, M., & Fischer, A. (2018). Character-level dialect identification in Arabic using long short-term memory. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10762 LNCS, pp. 324–337). Springer Verlag. https://doi.org/10.1007/978-3-319-77116-8_24
