A new content-free approach to identification of document language: Angle patterns

Abstract

Language identification in text mining is the process of detecting the natural language in which a document, or part of one, is written. It aims to reproduce with computer algorithms the human ability to recognize languages. This study proposes a new language identification approach that uses the angle information between the UTF-8 values of the characters in a text. The proposed angle pattern method is a statistical approach used to extract features from texts. It has two distance parameters, L and R, which specify how far to the left and to the right of the reference point the neighboring characters are taken. The approach was tested on four datasets: two created by the authors and two publicly available on the Internet. The features obtained with the angle pattern method were classified with several machine learning methods: Random Forest, Support Vector Machine, Linear Discriminant Analysis, Naive Bayes, and K-nearest neighbor. The language identification performance on the four datasets was 96.81%, 99.39%, 93.31%, and 98.60%, respectively. These results indicate that the proposed angle pattern method provides important distinguishing information for language identification.
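
The abstract does not spell out how the angles are computed, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes each character is placed at the point (position, code point), that the angle is measured at the reference character between the vectors to its neighbors L positions to the left and R positions to the right, and that the feature vector is a normalized histogram over whole degrees from 0 to 180. The function name `angle_pattern_features` and the binning scheme are hypothetical.

```python
import math


def angle_pattern_features(text, L=1, R=1):
    """Return a 181-bin normalized histogram of angles (0..180 degrees).

    Assumption: characters are represented by their Unicode code points;
    the abstract speaks of "UTF-8 values" without giving the exact mapping.
    """
    values = [ord(ch) for ch in text]
    counts = [0] * 181
    total = 0
    # Only positions that have both an L-left and an R-right neighbor.
    for i in range(L, len(values) - R):
        # Vectors from the reference point (i, values[i]) to each neighbor.
        lx, ly = -L, values[i - L] - values[i]
        rx, ry = R, values[i + R] - values[i]
        dot = lx * rx + ly * ry
        norm = math.hypot(lx, ly) * math.hypot(rx, ry)  # nonzero for L, R >= 1
        cos_a = max(-1.0, min(1.0, dot / norm))  # guard against rounding drift
        angle = math.degrees(math.acos(cos_a))
        counts[int(round(angle))] += 1
        total += 1
    # Texts shorter than L + R + 1 characters yield an all-zero vector.
    return [c / total for c in counts] if total else counts


features = angle_pattern_features("Language identification", L=1, R=1)
```

Under these assumptions every text maps to a fixed-length vector regardless of its length, which is the shape of input that classifiers such as the ones named in the abstract expect. Varying L and R changes which neighbors contribute to each angle.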

Cite

Noyan, T., Kuncan, F., Tekin, R., & Kaya, Y. (2022). A new content-free approach to identification of document language: Angle patterns. Journal of the Faculty of Engineering and Architecture of Gazi University, 37(3), 1277–1292. https://doi.org/10.17341/gazimmfd.844700
