Part-of-speech tagger for Malay social media texts

Abstract

Interpreting the meaning of words in social media texts, such as tweets, is a challenging task in natural language processing. Malay tweets are no exception: they exhibit distinct linguistic phenomena, such as the use of dialects from each state in Malaysia; foreign-language terms borrowed into Malay; and mixed languages, abbreviations, spelling errors and mistakes in sentence structure. Tagging the word class of tweets is therefore arduous, because tweets are characterised by their distinctive style, linguistic sounds and errors. Existing work on Malay part-of-speech (POS) tagging is based only on standard Malay and formal texts and is thus unsuitable for tagging tweets. A POS tagging model for non-standardised Malay tweets must therefore be developed. This study aims to design and implement a non-standardised Malay POS model for tweets and to assess it on the basis of word tagging accuracy on test data of unnormalised and normalised tweet texts. A solution that adopts a probabilistic POS tagger called QTAG is proposed. Results show that the Malay QTAG achieves best average POS tagging accuracies of 90% and 88.8% on the normalised and unnormalised test datasets, respectively.
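
The word tagging accuracy used for assessment can be illustrated with a minimal sketch. It assumes accuracy is the share of tokens whose predicted tag matches the gold tag; the abstract does not spell out the exact formula. The function name `tagging_accuracy`, the tag labels and the example sequences are hypothetical and are not taken from the paper's tagset or data.

```python
def tagging_accuracy(gold, predicted):
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # Count positions where the two tag sequences agree.
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Hypothetical tags for a five-token tweet (illustration only).
gold_tags = ["PRON", "VERB", "NOUN", "ADP", "NOUN"]
predicted_tags = ["PRON", "VERB", "NOUN", "ADP", "VERB"]

# Four of the five tags agree.
print(tagging_accuracy(gold_tags, predicted_tags))
```

Under this assumption, a reported accuracy of 90% would correspond to nine correctly tagged tokens out of every ten.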

Ariffin, S. N. A. N., & Tiun, S. (2018). Part-of-speech tagger for Malay social media texts. GEMA Online Journal of Language Studies, 18(4), 124–142. https://doi.org/10.17576/gema-2018-1804-09
