Abstract
Samawa is a living language with more than 500K native speakers on Sumbawa island, Indonesia. However, very few resources are available for it, and little effort has gone into developing Natural Language Processing (NLP) tools for it. In this paper, we evaluate three probabilistic models, Unigram, Hidden Markov Model (HMM), and Trigrams'n'Tags (TnT), for part-of-speech tagging, the task of labelling each word or punctuation mark in a sentence. We used k-fold cross-validation (with k = 5 and k = 10) on a tagged corpus of around 20K tokens with 24 tags. The TnT model performed best of the three, reaching 96.18%. This result suggests that the TnT model could be used to extend Samawa corpora and to support other NLP tasks in the future.
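The evaluation setup described above can be illustrated with a minimal sketch: a unigram tagger (each word receives its most frequent training tag) scored by k-fold cross-validation. This is not the authors' implementation. The function names, the fallback to the most frequent tag for unseen words, the round-robin fold split, and the tiny English toy corpus are all assumptions made for illustration; the Samawa corpus itself is not reproduced here.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Learn word -> most frequent tag, plus the most frequent tag overall."""
    counts = defaultdict(Counter)
    tag_totals = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
            tag_totals[tag] += 1
    model = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    default_tag = tag_totals.most_common(1)[0][0]
    return model, default_tag


def accuracy(model, default_tag, tagged_sents):
    """Token-level accuracy; unseen words get the default tag (an assumption)."""
    correct = total = 0
    for sent in tagged_sents:
        for word, gold in sent:
            correct += model.get(word, default_tag) == gold
            total += 1
    return correct / total if total else 0.0


def k_fold_accuracy(tagged_sents, k):
    """Average accuracy over k folds; sentence i goes to fold i mod k."""
    folds = [tagged_sents[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model, default_tag = train_unigram(train)
        scores.append(accuracy(model, default_tag, folds[i]))
    return sum(scores) / k


# Hypothetical toy corpus (English placeholders, NOT Samawa data).
toy_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]

if __name__ == "__main__":
    print(k_fold_accuracy(toy_corpus, k=2))
```

With a real tagged corpus, the same loop would be run with k = 5 and k = 10, and the HMM and TnT models would take the place of `train_unigram`; those two models are not sketched here.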
Cite
Hariyanti, T., Aida, S., & Kameda, H. (2019). Samawa Language Part of Speech Tagging with Probabilistic Approach: Comparison of Unigram, HMM and TnT Models. In Journal of Physics: Conference Series (Vol. 1235). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1235/1/012013