Arabic POS Tagging: Don’t Abandon Feature Engineering Just Yet

Abstract

This paper compares Support Vector Machine based ranking (SVMRank) with Bidirectional Long Short-Term Memory (bi-LSTM) neural-network based sequence labeling for building a state-of-the-art Arabic part-of-speech tagging system. SVMRank achieves state-of-the-art results, but requires a fair amount of feature engineering. A bi-LSTM, particularly when combined with word embeddings, can reach competitive POS-tagging results by automatically deducing latent linguistic features. However, we show that augmenting bi-LSTM sequence labeling with some of the features that we used for the SVMRank-based tagger yields further improvements. We also show that the gains from embeddings may not be additive with the gains from features. We are open-sourcing both the SVMRank and the bi-LSTM based systems for the research community.

Cite

Darwish, K., Mubarak, H., Abdelali, A., & Eldesouki, M. (2017). Arabic POS Tagging: Don’t Abandon Feature Engineering Just Yet. In WANLP 2017, co-located with EACL 2017 - 3rd Arabic Natural Language Processing Workshop, Proceedings of the Workshop (pp. 130–137). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/W17-1316
