Performance evaluation of text categorization algorithms using an Albanian corpus

Evis Trandafili; Nelda Kote; Marenglen Biba

Book Chapter

Springer Science and Business Media Deutschland GmbH, (2018), 537-547

DOI: 10.1007/978-3-319-75928-9_48

Abstract

Text mining and natural language processing are gaining a significant role in our daily life as information volumes increase steadily. Most digital information is unstructured, in the form of raw text. While extensive research on text mining and language processing exists for several languages, much less work has been performed for others. In this paper we aim to evaluate the performance of some of the most important text classification algorithms on a corpus composed of Albanian texts. After applying natural language preprocessing steps, we apply several algorithms: Simple Logistics, Naïve Bayes, k-Nearest Neighbor, Decision Trees, Random Forest, Support Vector Machines and Neural Networks. The experiments show that Naïve Bayes and Support Vector Machines perform best in classifying Albanian corpora. The Simple Logistics algorithm also shows good results.
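To make the "preprocess, then classify" pipeline more concrete, here is a minimal sketch of one of the evaluated algorithms, a multinomial Naïve Bayes text classifier with Laplace (add-one) smoothing. This is not the authors' implementation, which the abstract does not describe. The tokenizer (lowercasing plus a `\w+` regex), the stopword list `STOPWORDS`, the class name `NaiveBayesText`, and the four training sentences and labels are all hypothetical, invented for illustration and not taken from the paper's corpus.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list for illustration only; not the list used in the paper.
STOPWORDS = {"e", "dhe", "me", "në", "një"}


def preprocess(text):
    """Lowercase, split into Unicode word tokens, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class NaiveBayesText:
    """Hypothetical multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class (for priors)
        self.word_counts = defaultdict(Counter)      # term frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(preprocess(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_scores(self, doc):
        # Words never seen in training are ignored rather than smoothed.
        tokens = [t for t in preprocess(doc) if t in self.vocab]
        v = len(self.vocab)
        scores = {}
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / self.n_docs)  # log prior
            for t in tokens:
                # log P(t | c) with add-one smoothing over the vocabulary
                score += math.log((self.word_counts[c][t] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Invented toy training data (not from the authors' corpus).
train_docs = [
    "Ekipi fitoi ndeshjen me dy gola",
    "Futbollistët shënuan gola në ndeshje",
    "Banka rriti çmimet dhe normat e interesit",
    "Tregu dhe çmimet e euros ranë",
]
train_labels = ["sport", "sport", "ekonomi", "ekonomi"]

model = NaiveBayesText().fit(train_docs, train_labels)
print(model.predict("Ekipi shënoi gola"))      # sport
print(model.predict("Çmimet e tregut ranë"))   # ekonomi
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied. A real evaluation such as the one summarized above would also need stemming or other language-specific preprocessing and a held-out test set, which this sketch leaves out.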

Citation (APA)

Trandafili, E., Kote, N., & Biba, M. (2018). Performance evaluation of text categorization algorithms using an Albanian corpus. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 17, pp. 537–547). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-75928-9_48