Performance evaluation of text categorization algorithms using an Albanian corpus

Abstract

Text mining and natural language processing are playing an increasingly significant role in daily life as the volume of information grows steadily. Most digital information is unstructured, in the form of raw text. While extensive research on text mining and language processing exists for several languages, much less work has been done for others. In this paper we aim to evaluate the performance of some of the most important text classification algorithms on a corpus composed of Albanian texts. After applying natural language preprocessing steps, we apply several algorithms, including Simple Logistics, Naïve Bayes, k-Nearest Neighbor, Decision Trees, Random Forest, Support Vector Machines and Neural Networks. The experiments show that Naïve Bayes and Support Vector Machines perform best in classifying Albanian corpora. The Simple Logistics algorithm also shows good results.
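
The abstract reports Naïve Bayes among the best performers. To illustrate what that classifier does on preprocessed text, here is a minimal multinomial Naïve Bayes sketch with add-one (Laplace) smoothing. The stopword list, the regex tokenizer, the class labels, and the toy training documents are all hypothetical and chosen for illustration only; they are not the authors' preprocessing pipeline, corpus, or implementation, which the abstract does not describe.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list for illustration only; the paper's actual
# preprocessing steps are not specified in the abstract.
STOPWORDS = {"dhe", "në", "të", "e", "i", "një"}


def preprocess(text):
    """Lowercase, extract word tokens (Python's \\w is Unicode-aware,
    so letters like ë and ç are kept), and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # label -> number of docs
        self.word_counts = defaultdict(Counter)   # label -> token -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = preprocess(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        tokens = preprocess(doc)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / n_docs)
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore tokens never seen in training
                num = self.word_counts[label][t] + 1
                score += math.log(num / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus with two made-up classes.
train_docs = [
    "Ndeshje futboll gol",
    "Gol në ndeshje e futboll",
    "Qeveria dhe parlamenti",
    "Zgjedhjet parlamenti qeveria",
]
train_labels = ["sport", "sport", "politike", "politike"]

model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("një gol në ndeshje"))  # prints: sport
```

Scores are accumulated in log space so that multiplying many small per-token probabilities does not underflow, and add-one smoothing keeps a token that never appeared in one class from driving that class's score to zero.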

Citation (APA)

Trandafili, E., Kote, N., & Biba, M. (2018). Performance evaluation of text categorization algorithms using an Albanian corpus. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 17, pp. 537–547). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-75928-9_48