Text mining and natural language processing are playing an increasingly significant role in daily life as information volumes grow steadily. Most digital information is unstructured, in the form of raw text. While text mining and language processing have been researched extensively for several languages, much less work has been done for others. In this paper we evaluate the performance of some of the most important text classification algorithms on a corpus of Albanian texts. After natural language preprocessing, we apply several algorithms: Simple Logistics, Naïve Bayes, k-Nearest Neighbor, Decision Trees, Random Forest, Support Vector Machines and Neural Networks. The experiments show that Naïve Bayes and Support Vector Machines perform best in classifying Albanian corpora. The Simple Logistics algorithm also shows good results.
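To make the preprocess-then-classify workflow concrete, the following is a minimal sketch of one of the listed algorithms, Naïve Bayes, in its multinomial form with add-one smoothing. It is an illustration only, not the setup used in the paper: the `tokenize` helper, the `NaiveBayes` class, and the tiny English training set are hypothetical stand-ins, and real preprocessing for Albanian would presumably involve more than lowercasing and word splitting.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Hypothetical stand-in for the paper's preprocessing steps.
    """
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data (not from the Albanian corpus).
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "parliament passed the budget law",
    "the minister announced a new law",
]
train_labels = ["sport", "sport", "politics", "politics"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("a goal in the match"))  # prints: sport
```

Working in log space avoids numeric underflow on longer documents, and the add-one smoothing keeps a word that is absent from one class from driving that class's probability to zero.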
Trandafili, E., Kote, N., & Biba, M. (2018). Performance evaluation of text categorization algorithms using an Albanian corpus. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 17, pp. 537–547). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-75928-9_48