Author identification for under-resourced language Kadazandusun


Abstract

This paper presents the task of Author Identification for the KadazanDusun language, one of the under-resourced languages of Malaysia, using tweets as the data source for Author Identification of short texts. The aim of this paper is to demonstrate Author Identification of short KadazanDusun texts. The paper also examines the performance of two machine learning algorithms on the KadazanDusun data set by analyzing stylometric features, which quantify the authors' writing styles and include character n-grams and word n-grams. The Author Identification workflow applies a machine learning approach to the single-label, multi-class problem of predicting the author of a given KadazanDusun message. Two classifiers, Naïve Bayes and Support Vector Machine (SVM), are compared by accuracy. The results show that combining word-level unigrams and {1-5}-grams with character 3-grams yields the most relevant stylometric features for identifying the author of a KadazanDusun message, and that the SVM classifier outperformed Naïve Bayes on this task, reaching an accuracy of 80.17%.
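As an illustration of the stylometric pipeline summarized above, the following stdlib-only Python sketch extracts word {1-5}-grams and character 3-grams from a message and feeds their counts to a multinomial Naïve Bayes classifier with Laplace smoothing. It is not the authors' implementation: lowercasing, whitespace tokenization, the `w:`/`c:` feature prefixes, the smoothing value `alpha=1.0`, and every function and class name are assumptions, and reading the feature combination as word 1- to 5-grams (unigrams included) plus character 3-grams is only one interpretation of the abstract. The toy messages are invented English placeholders, not tweets from the paper's data set, and the better-performing SVM classifier is not reproduced.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Overlapping character n-grams of the lowercased message (spaces kept)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n_values=range(1, 6)):
    """Word n-grams for every n in n_values, using naive whitespace tokens."""
    tokens = text.lower().split()
    grams = []
    for n in n_values:
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def extract_features(text):
    """Bag of word {1-5}-grams plus character 3-grams (assumed combination).

    The "w:" / "c:" prefixes keep the two feature types apart, so the word
    "the" and the character trigram "the" are counted as different features.
    """
    feats = Counter("w:" + g for g in word_ngrams(text))
    feats.update("c:" + g for g in char_ngrams(text, 3))
    return feats


class NaiveBayesAuthorClassifier:
    """Hypothetical multinomial Naive Bayes over n-gram counts, Laplace-smoothed."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, authors):
        self.doc_counts = Counter(authors)       # messages per author
        self.feat_counts = defaultdict(Counter)  # n-gram counts per author
        self.vocab = set()
        for text, author in zip(texts, authors):
            feats = extract_features(text)
            self.feat_counts[author].update(feats)
            self.vocab.update(feats)
        self.totals = {a: sum(c.values()) for a, c in self.feat_counts.items()}
        self.n_docs = len(authors)
        return self

    def predict(self, text):
        feats = extract_features(text)
        best_author, best_score = None, -math.inf
        for author, n_msgs in self.doc_counts.items():
            score = math.log(n_msgs / self.n_docs)  # log prior of the author
            denom = self.totals[author] + self.alpha * len(self.vocab)
            for feat, k in feats.items():
                if feat not in self.vocab:
                    continue  # n-grams never seen in training carry no evidence
                p = (self.feat_counts[author][feat] + self.alpha) / denom
                score += k * math.log(p)
            if score > best_score:
                best_author, best_score = author, score
        return best_author


def accuracy(model, texts, authors):
    """Fraction of messages whose predicted author matches the true author."""
    correct = sum(model.predict(t) == a for t, a in zip(texts, authors))
    return correct / len(authors)


# Invented placeholder messages (not KadazanDusun tweets from the paper).
train_texts = [
    "haha that was fun lol",
    "lol see you later haha",
    "Good evening. The meeting is at noon.",
    "Please review the attached report.",
]
train_authors = ["A", "A", "B", "B"]
test_texts = ["haha lol that is fun", "Please review the meeting report."]
test_authors = ["A", "B"]

model = NaiveBayesAuthorClassifier().fit(train_texts, train_authors)
print("accuracy:", accuracy(model, test_texts, test_authors))
```

Swapping the Naïve Bayes step for a linear SVM trained on the same sparse n-gram counts would mirror the comparison described in the abstract; the prefixed feature names make that swap straightforward because each feature type already has its own dimension.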


Cite (APA)

Tarmizi, N., Saee, S., & Ibrahim, D. H. A. (2019). Author identification for under-resourced language Kadazandusun. Indonesian Journal of Electrical Engineering and Computer Science, 17(1), 248–255. https://doi.org/10.11591/ijeecs.v17.i1.pp248-255
