Developing natural language processing tools usually requires a large number of resources (lexica, annotated corpora, etc.), which often do not exist for less-resourced languages. One way to overcome this lack of resources is to devote substantial effort to building new ones from scratch. Another approach is to exploit existing resources of closely related languages. In this paper, we focus on developing a part-of-speech tagger for the Tunisian Arabic dialect (TUN), a low-resource language, by exploiting its closeness to Modern Standard Arabic (MSA), which has many state-of-the-art resources and tools. Our system achieved an accuracy of 89% (an absolute improvement of ∼20% over an MSA tagger baseline).
Hamdi, A., Nasr, A., Habash, N., & Gala, N. (2015). POS-tagging of Tunisian dialect using Standard Arabic resources and tools. In 2nd Workshop on Arabic Natural Language Processing, ANLP 2015 - held at 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015 - Proceedings (pp. 59–68). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-3207