Feature selection for enhanced author identification of Turkish text

Abstract

The rapid growth of the Internet and the increasing availability of electronic documents pose problems such as identifying the author of an anonymous text and detecting plagiarism. This study aims to determine the author of a given document from a set of text documents whose authors are known. Although a large number of author identification studies were conducted on English text in the last century, Turkish and other languages have gained interest only in the last decade. Therefore, this study addresses the author identification problem using two different Turkish datasets collected from two different Turkish newspapers. The datasets comprise 850 columns in total, written by 17 columnists, with 50 columns from each columnist. Four machine learning algorithms (Naive Bayes, Support Vector Machine, K-Nearest Neighbor, and Decision Tree) are employed, and 99.7% accuracy is achieved with the K-Nearest Neighbor algorithm. The authors are fully recognized with the Chi-square feature selection method, which reduces the number of features from 20 to 17.
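
The abstract does not list the 20 features or describe the implementation, so the following is only a minimal sketch of the general technique it names: scoring features with a Chi-square statistic, keeping the highest-scoring ones, and classifying with K-Nearest Neighbor. The toy data, the function names (chi2_scores, select_top_k, project, knn_predict), the choice of Euclidean distance, and the reduction from 3 features to 2 are all illustrative assumptions, not the authors' setup.

```python
import math
from collections import Counter


def chi2_scores(X, y):
    """Chi-square score of each non-negative feature against the class labels."""
    classes = sorted(set(y))
    n_samples, n_features = len(X), len(X[0])
    # Share of documents belonging to each class.
    class_prob = {c: sum(1 for label in y if label == c) / n_samples for c in classes}
    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_prob[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, returned in original order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, keep):
    """Keep only the selected feature columns."""
    return [[row[j] for j in keep] for row in X]


def knn_predict(X_train, y_train, x, k=1):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: three count-like features per document, two hypothetical authors.
X = [[9, 1, 5], [8, 2, 5], [1, 9, 5], [2, 8, 5]]
y = ["author_a", "author_a", "author_b", "author_b"]

scores = chi2_scores(X, y)
keep = select_top_k(scores, k=2)  # drops the feature that does not separate the authors
query = project([[7, 3, 5]], keep)[0]
prediction = knn_predict(project(X, keep), y, query, k=3)
```

In this toy example the third feature has the same total for both authors, so its Chi-square score is zero and it is the one discarded; the same idea, applied to real stylometric features, is presumably what removes 3 of the 20 features in the study.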

Cite

Bay, Y., & Çelebi, E. (2016). Feature selection for enhanced author identification of Turkish text. In Lecture Notes in Electrical Engineering (Vol. 363, pp. 371–379). Springer Verlag. https://doi.org/10.1007/978-3-319-22635-4_34
