A NEW METHOD PROPOSAL FOR INCREASING THE CLASSIFICATION SUCCESS OF TURKISH TEXTS (TÜRKÇE METİNLERİN SINIFLANDIRMA BAŞARISINI ARTIRMAK İÇİN YENİ BİR YÖNTEM ÖNERİSİ)

  • BİLGİN M

Abstract

This study aims to identify the author of a document whose authorship is unknown. To this end, six different columns from six different columnists were first pre-processed, and 2-gram and 3-gram features were then extracted from these texts. The system was tested with 10-fold cross-validation on six different machine learning algorithms. This part of the study follows the method applied in the literature so far. Our proposal is to reduce the number of features with the LZW algorithm and to investigate the effect of this reduction on the success of the system. The pre-processed texts were compressed with the LZW algorithm into binary and decimal form. After compression, the system was tested with the same six machine learning algorithms, and the results were analyzed using five different metrics. The binary-compressed text obtained better results than the raw data for both 2-grams and 3-grams across all six machine learning algorithms. For the Random Tree and Naïve Bayes algorithms, decimal compression performed worse than the raw data. For the other four algorithms it achieved better results than the raw data but remained below the average success values. Overall, binary compression was more successful on all metrics than the other two methods (raw text and decimal compression). Although the study addresses author recognition, the proposed method may be applicable to text classification tasks in general.
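The compression step described in the abstract can be illustrated with a minimal LZW sketch. This is not the author's implementation: the abstract does not say how the dictionary is initialized or how codes are rendered as binary and decimal text, so the function names (lzw_compress, to_decimal, to_binary), the dictionary seeded from the characters of the input, and the fixed-width binary rendering are all assumptions made for illustration.

```python
def lzw_compress(text):
    """Return the list of integer LZW codes for `text`.

    Assumption: the dictionary is seeded with the distinct characters
    of the input in sorted order; the paper may use a different seed.
    """
    table = {ch: i for i, ch in enumerate(sorted(set(text)))}
    codes = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in table:
            # Keep extending the phrase while it is already known.
            current = candidate
        else:
            # Emit the code of the longest known phrase, then learn the new one.
            codes.append(table[current])
            table[candidate] = len(table)
            current = ch
    if current:
        codes.append(table[current])
    return codes


def to_decimal(codes):
    """Render codes as space-separated decimal tokens (hypothetical format)."""
    return " ".join(str(c) for c in codes)


def to_binary(codes):
    """Render codes as space-separated fixed-width binary tokens (hypothetical format)."""
    width = max(codes).bit_length() if codes and max(codes) > 0 else 1
    return " ".join(format(c, "0{}b".format(width)) for c in codes)


if __name__ == "__main__":
    sample = "abababab"
    codes = lzw_compress(sample)
    print(to_decimal(codes))
    print(to_binary(codes))
```

In a pipeline of the kind the abstract outlines, the decimal or binary token strings would presumably replace the raw text before 2-gram and 3-gram extraction; how the tokens are grouped into n-grams is not stated in the abstract and is left open here.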

Cite

BİLGİN, M. (2019). TÜRKÇE METİNLERİN SINIFLANDIRMA BAŞARISINI ARTIRMAK İÇİN YENİ BİR YÖNTEM ÖNERİSİ. Uludağ University Journal of The Faculty of Engineering, 24(1), 125–136. https://doi.org/10.17482/uumfd.484525
