Samawa Language Part of Speech Tagging with Probabilistic Approach: Comparison of Unigram, HMM and TnT Models

Trienani Hariyanti; Saori Aida; Hiroyuki Kameda

Conference Proceedings, Open Access

Journal of Physics: Conference Series (2019) 1235(1)

DOI: 10.1088/1742-6596/1235/1/012013

Abstract

Samawa is a living language with more than 500K native speakers on Sumbawa island, Indonesia. However, very few resources are available for it, and little effort has gone into developing Natural Language Processing (NLP) tools for it. In this paper, we evaluate three probabilistic models, Unigram, Hidden Markov Model (HMM) and Trigrams'n'Tags (TnT), on the part-of-speech tagging problem, which is the process of labeling each word or punctuation mark in a sentence. We used k-fold cross-validation (with k = 5 and 10) on a tagged corpus of around 20K tokens with 24 tags. The TnT model gave the best performance of the three, reaching 96.18%. This result shows that the TnT model could be used to extend Samawa corpora and to support other NLP tasks in the future.
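To make the evaluation setup concrete, the following is a minimal sketch of a unigram tagger scored with k-fold cross-validation, written in plain Python. It is not the authors' implementation: the function names (train_unigram, tag, k_fold_accuracy), the fallback to the most frequent tag for unseen words, the round-robin fold split, and the placeholder corpus are all assumptions made for illustration, and the placeholder tokens are not Samawa data.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Learn the most frequent tag per word, plus a default tag for unseen words."""
    word_tag_counts = defaultdict(Counter)
    tag_counts = Counter()
    for sent in tagged_sents:
        for word, tag_label in sent:
            word_tag_counts[word][tag_label] += 1
            tag_counts[tag_label] += 1
    model = {w: c.most_common(1)[0][0] for w, c in word_tag_counts.items()}
    # Assumption: unseen words receive the overall most frequent tag.
    default_tag = tag_counts.most_common(1)[0][0]
    return model, default_tag


def tag(model, default_tag, words):
    """Tag each word independently of its context (the unigram assumption)."""
    return [(w, model.get(w, default_tag)) for w in words]


def k_fold_accuracy(tagged_sents, k):
    """Average token-level accuracy over k train/test splits."""
    if len(tagged_sents) < k:
        raise ValueError("need at least k sentences")
    # Assumption: round-robin split, so sentence i goes to fold i % k.
    folds = [tagged_sents[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model, default_tag = train_unigram(train)
        correct = total = 0
        for sent in test:
            predicted = tag(model, default_tag, [w for w, _ in sent])
            correct += sum(p == g for (_, p), (_, g) in zip(predicted, sent))
            total += len(sent)
        scores.append(correct / total)
    return sum(scores) / k


if __name__ == "__main__":
    # Hypothetical placeholder corpus; tokens and tags are illustrative only.
    toy_corpus = [[("w1", "NN"), ("w2", "VB"), (".", "PUNCT")] for _ in range(10)]
    for k in (5, 10):
        print(k, k_fold_accuracy(toy_corpus, k))
```

The HMM and TnT models differ from this baseline in that they also condition each tag on the preceding tags (one previous tag for a bigram HMM, two for TnT), so the same cross-validation loop could wrap them by swapping the training and tagging functions.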

Cite

Hariyanti, T., Aida, S., & Kameda, H. (2019). Samawa Language Part of Speech Tagging with Probabilistic Approach: Comparison of Unigram, HMM and TnT Models. In Journal of Physics: Conference Series (Vol. 1235). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1235/1/012013
