The rapid growth of the Internet and the increasing availability of electronic documents pose problems such as identifying the author of an anonymous text and detecting plagiarism. This study aims to determine the author of a given document from a set of text documents whose authors are known. Although a large body of research on author identification has been conducted for English over the last century, Turkish and other languages have attracted interest only in the last decade. This study therefore addresses the author identification problem using two Turkish datasets collected from two different Turkish newspapers. Together, the datasets comprise 850 columns written by 17 columnists, 50 columns from each columnist. Four machine learning algorithms (Naive Bayes, Support Vector Machine, K-Nearest Neighbor, and Decision Tree) were employed, and 99.7% accuracy was achieved with the K-Nearest Neighbor algorithm. With the Chi-square feature selection method, which reduced the number of features from 20 to 17, the authors were fully recognized.
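The pipeline summarized above (Chi-square feature selection followed by K-Nearest Neighbor classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`chi2_scores`, `select_k_best`, `project`, `knn_predict`), the three toy features, and the six toy documents are all hypothetical, and the Chi-square score used here is one common formulation for non-negative count features, which may differ from the one used in the study.

```python
from collections import Counter
import math


def chi2_scores(X, y):
    """Chi-square score of each non-negative feature against the class labels.

    Assumed formulation: observed = per-class sum of the feature,
    expected = feature total * class frequency.
    """
    n_docs = len(y)
    n_features = len(X[0])
    class_counts = Counter(y)
    # observed[c][j]: total of feature j over all documents of class c
    observed = {c: [0.0] * n_features for c in class_counts}
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            observed[label][j] += value
    scores = []
    for j in range(n_features):
        total = sum(observed[c][j] for c in class_counts)
        score = 0.0
        for c, count in class_counts.items():
            expected = total * count / n_docs
            if expected > 0:
                score += (observed[c][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Return the (sorted) indices of the k features with the highest score."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, keep):
    """Keep only the selected feature columns."""
    return [[row[j] for j in keep] for row in X]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training documents (Euclidean distance)."""
    neighbors = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 3 stylometric counts per document, 2 authors.
# The third feature is identical for every document, so it carries no signal.
X = [[9, 1, 5], [8, 2, 5], [9, 1, 5], [1, 9, 5], [2, 8, 5], [1, 9, 5]]
y = ["A", "A", "A", "B", "B", "B"]

keep = select_k_best(X, y, k=2)   # drops the uninformative third feature
X_reduced = project(X, keep)
prediction = knn_predict(X_reduced, y, [8, 1], k=3)
print(keep, prediction)
```

In this toy setting the selection step discards the feature that does not vary between authors, which mirrors, on a much smaller scale, the reduction from 20 to 17 features described above.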
CITATION STYLE
Bay, Y., & Çelebi, E. (2016). Feature selection for enhanced author identification of Turkish text. In Lecture Notes in Electrical Engineering (Vol. 363, pp. 371–379). Springer Verlag. https://doi.org/10.1007/978-3-319-22635-4_34