Feature selection for enhanced author identification of Turkish text

Yasemin Bay; Erbuğ Çelebi

Conference Proceedings

Lecture Notes in Electrical Engineering (2016), vol. 363, pp. 371–379

DOI: 10.1007/978-3-319-22635-4_34

Abstract

The rapid growth of the Internet and the increasing availability of electronic documents pose problems such as identifying the author of an anonymous text and detecting plagiarism. This study aims to determine the author of a given document from a set of documents whose authors are known. A large body of author identification research on English was carried out over the last century. By contrast, Turkish and other languages have attracted interest only in the last decade. This study therefore addresses the author identification problem using two Turkish datasets collected from two different Turkish newspapers. In total, the datasets comprise 850 columns written by 17 columnists, with 50 columns per columnist. Four machine learning algorithms were employed: Naive Bayes, Support Vector Machine, K-Nearest Neighbor, and Decision Tree. The K-Nearest Neighbor algorithm achieved 99.7% accuracy. With the Chi-square feature selection method, which reduced the number of features from 20 to 17, the classification achieved full recognition.
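The abstract names two steps: Chi-square feature selection, followed by classification, where K-Nearest Neighbor gave the best accuracy. The sketch below shows one way these steps can fit together. It is a minimal illustration under stated assumptions, not the authors' implementation.

The sketch works as follows:

* It scores each non-negative stylometric feature with a chi-square statistic. This uses the same observed-versus-expected formulation as scikit-learn's `chi2` scorer.
* It keeps the top k features.
* It labels a new column by majority vote among its nearest training columns.

Several details are hypothetical and do not come from the paper: the toy feature vectors, Euclidean distance, k=3, and all function names. The paper's actual 20 features and settings are not described in this record.

```python
import math
from collections import Counter


def chi2_scores(X, y):
    """Chi-square score per feature; features must be non-negative.

    observed: total value of feature j over the columns of author a.
    expected: total value of feature j spread in proportion to author frequency.
    A high score means the feature's mass is unevenly split across authors.
    """
    authors = sorted(set(y))
    n_docs, n_feat = len(X), len(X[0])
    prior = {a: y.count(a) / n_docs for a in authors}
    totals = [sum(row[j] for row in X) for j in range(n_feat)]
    scores = []
    for j in range(n_feat):
        score = 0.0
        for a in authors:
            observed = sum(row[j] for row, label in zip(X, y) if label == a)
            expected = prior[a] * totals[j]
            if expected > 0:  # skip features that never occur
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, kept in original order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, idx):
    """Keep only the selected feature columns."""
    return [[row[j] for j in idx] for row in X]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training documents nearest to x (Euclidean)."""
    neighbors = sorted(
        (math.dist(row, x), label) for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 3 stylometric counts per newspaper column, 2 authors.
X = [[5, 2, 1], [6, 2, 0], [0, 2, 3], [1, 2, 4]]
y = ["A", "A", "B", "B"]

scores = chi2_scores(X, y)        # feature 1 is identical for both authors
keep = select_top_k(scores, k=2)  # drop the least informative feature
X_sel = project(X, keep)
# The query is a new column's values for the kept features only.
print(keep, knn_predict(X_sel, y, [4, 1], k=3))  # prints: [0, 2] A
```

The sketch uses only the standard library so that it stays self-contained. With scikit-learn, the same pipeline would typically be written as `SelectKBest(chi2, k=17)` followed by `KNeighborsClassifier`. Note that chi-square scoring requires non-negative feature values such as counts or frequencies.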

Citation (APA)

Bay, Y., & Çelebi, E. (2016). Feature selection for enhanced author identification of Turkish text. In Lecture Notes in Electrical Engineering (Vol. 363, pp. 371–379). Springer Verlag. https://doi.org/10.1007/978-3-319-22635-4_34
