Leveraging Bidirectionl LSTM with CRFs for Pashto Tagging

Farooq Zaman; Onaiza Maqbool; Jaweria Kanwal

Journal Article, Open Access

ACM Transactions on Asian and Low-Resource Language Information Processing (2024) 23(4)

DOI: 10.1145/3649456

Abstract

Part-of-speech tagging plays a vital role in text processing and natural language understanding, yet few attempts have been made to tag Pashto text. In this work, we present a Long Short-Term Memory-based approach to Pashto part-of-speech tagging with a special focus on ambiguity resolution. We first created a corpus of Pashto sentences containing words with multiple meanings, together with their tags. We then introduce a powerful sentence representation and a new architecture for Pashto text processing. We compare the accuracy of the proposed approach with that of a state-of-the-art Hidden Markov Model. Our model achieves 87.60% accuracy on all words excluding punctuation and 95.45% on ambiguous words, whereas the Hidden Markov Model achieves 78.37% and 44.72%, respectively. These results show that our approach outperforms the Hidden Markov Model in part-of-speech tagging for Pashto text.
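The abstract reports two figures per model: accuracy over all words excluding punctuation, and accuracy over ambiguous words only. The sketch below illustrates one way such a pair of scores could be computed. It is not the authors' evaluation code. The function name `tagging_accuracy`, the `PUNCT` tag label, and the idea of passing ambiguous words as a precomputed set are all assumptions made for illustration, and the tokens are placeholders rather than real Pashto data.

```python
from typing import Iterable, Set, Tuple

# (word, gold_tag, predicted_tag); this layout is an assumption for the sketch.
Token = Tuple[str, str, str]


def tagging_accuracy(
    tokens: Iterable[Token],
    ambiguous_words: Set[str],
    punct_tag: str = "PUNCT",  # hypothetical tag name for punctuation
) -> Tuple[float, float]:
    """Return (accuracy on non-punctuation words, accuracy on ambiguous words)."""
    # Drop tokens whose gold tag marks them as punctuation.
    scored = [t for t in tokens if t[1] != punct_tag]
    # Keep only words listed as ambiguous (assumed to be known in advance).
    ambiguous = [t for t in scored if t[0] in ambiguous_words]

    def accuracy(items):
        if not items:
            return float("nan")  # undefined when there is nothing to score
        return sum(gold == pred for _, gold, pred in items) / len(items)

    return accuracy(scored), accuracy(ambiguous)


# Placeholder tokens: "w2" stands in for a word that can take two tags.
example = [
    ("w1", "NN", "NN"),
    ("w2", "VB", "NN"),
    ("w2", "NN", "NN"),
    (".", "PUNCT", "PUNCT"),
    ("w3", "JJ", "JJ"),
]
overall, on_ambiguous = tagging_accuracy(example, {"w2"})
```

Reporting the ambiguous-word score separately matters because a tagger can score well overall while still failing on exactly the words whose tag depends on context.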

Citation (APA)

Zaman, F., Maqbool, O., & Kanwal, J. (2024). Leveraging Bidirectionl LSTM with CRFs for Pashto Tagging. ACM Transactions on Asian and Low-Resource Language Information Processing, 23(4). https://doi.org/10.1145/3649456
