Code switch point detection in Arabic

Heba Elfardy; Mohamed Al-Badrashiny; Mona Diab

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013) 7934 LNCS 412-416

DOI: 10.1007/978-3-642-38824-8_51

Abstract

This paper introduces a dual-mode stochastic system that automatically identifies linguistic code switch points in Arabic. The first mode determines the most likely tag for each word (dialectal or Modern Standard Arabic) by choosing the sequence of Arabic word tags with maximum marginal probability via lattice search and 5-gram probability estimation. When a word is out of vocabulary (OOV), the system switches to the second mode, which uses a dialectal Arabic (DA) morphological analyzer and a Modern Standard Arabic (MSA) morphological analyzer. If the OOV word is analyzable by the DA analyzer only, it is tagged as DA; if it is analyzable by the MSA analyzer only, it is tagged as MSA; if it is analyzable by both, it is tagged as both. The system yields an F-score (β = 1) of 76.9% on the development dataset and 76.5% on the held-out test dataset, both judged against human-annotated Egyptian forum data. © 2013 Springer-Verlag Berlin Heidelberg.
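The decision rule of the second mode can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `tag_oov_word`, the analyzer callables, and the toy lexicons are hypothetical stand-ins for real DA and MSA morphological analyzers. The abstract does not say what happens when neither analyzer accepts a word, so the `"unknown"` label is an assumption.

```python
from typing import Callable


def tag_oov_word(
    word: str,
    da_analyzes: Callable[[str], bool],
    msa_analyzes: Callable[[str], bool],
) -> str:
    """Tag an out-of-vocabulary word from two analyzers' verdicts.

    Each callable is a hypothetical stand-in that returns True when the
    corresponding morphological analyzer can analyze the word.
    """
    in_da = da_analyzes(word)
    in_msa = msa_analyzes(word)
    if in_da and in_msa:
        return "both"
    if in_da:
        return "DA"
    if in_msa:
        return "MSA"
    # Assumption: this case is not described in the abstract.
    return "unknown"


# Toy lexicons with placeholder tokens, standing in for real analyzers.
da_lexicon = {"w_da", "w_shared"}
msa_lexicon = {"w_msa", "w_shared"}

tags = [
    tag_oov_word(w, da_lexicon.__contains__, msa_lexicon.__contains__)
    for w in ["w_da", "w_msa", "w_shared", "w_other"]
]
print(tags)
```

In the system described above, this rule would apply only to words the first mode cannot score; in-vocabulary words are tagged by the 5-gram lattice search instead.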

Cite

Elfardy, H., Al-Badrashiny, M., & Diab, M. (2013). Code switch point detection in Arabic. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7934 LNCS, pp. 412–416). https://doi.org/10.1007/978-3-642-38824-8_51
