Abstract
Part-of-speech tagging plays a vital role in text processing and natural language understanding. Very few attempts have been made in the past at part-of-speech tagging for Pashto. In this work, we present a Long Short-Term Memory-based approach to Pashto part-of-speech tagging with a special focus on ambiguity resolution. We first created a corpus of Pashto sentences containing words with multiple meanings, together with their tags. We introduce a powerful sentence representation and a new architecture for Pashto text processing. The accuracy of the proposed approach is compared with that of a state-of-the-art Hidden Markov Model. Our model achieves 87.60% accuracy on all words excluding punctuation and 95.45% on ambiguous words, whereas the Hidden Markov Model achieves 78.37% and 44.72%, respectively. The results show that our approach outperforms the Hidden Markov Model in part-of-speech tagging of Pashto text.
Cite
Zaman, F., Maqbool, O., & Kanwal, J. (2024). Leveraging Bidirectional LSTM with CRFs for Pashto Tagging. ACM Transactions on Asian and Low-Resource Language Information Processing, 23(4). https://doi.org/10.1145/3649456