Abstract
In Natural Language Processing, Parts-of-Speech (POS) tagging is a vital component that significantly impacts applications such as machine translation, spell checking, information retrieval, and speech processing. For languages such as English and Dutch, POS tagging is considered a solved problem (accuracy: 97%). For low-resource languages like Bangla, however, challenges remain. In this article, we propose a novel RNN-based network named BaNeP to determine the parts of speech of Bangla words. The proposed network extracts structural features through a bidirectional LSTM-based sub-network, and identifies intricate contextual relations among the words of a sentence through an elaborate weighted context extraction procedure. These features are then combined to generate the final Parts-of-Speech prediction. Training the model requires only an annotated dataset, eliminating the need for any hand-crafted features. Experimental results on the LDC2010T16 dataset show a significant accuracy improvement over existing Bangla POS taggers.
Citation
Ovi, J. A., Islam, M. A., & Karim, M. R. (2022). BaNeP: An End-to-End Neural Network Based Model for Bangla Parts-of-Speech Tagging. IEEE Access, 10, 102753–102769. https://doi.org/10.1109/ACCESS.2022.3208269