Abstract
In this paper, we present a comparative study of news document classification using various supervised machine learning methods and different combinations of key-phrases (word N-grams extracted from the text) and visual features (extracted from a representative image of each document). The application domain is English-language news documents belonging to four categories: Health, Lifestyle-Leisure, Nature-Environment, and Politics. Using the N-gram textual feature set alone yielded an accuracy of 81.0%, much higher than the 58.4% obtained with the visual feature set alone. A competition among three classification methods, together with a feature selection method and parameter tuning, led to an improved accuracy of 86.7%, achieved by the Random Forests method.
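To make the feature construction concrete, the following is a minimal sketch of how word N-grams might be counted and joined with a visual feature vector. It is not the authors' implementation: the abstract does not specify the tokenization, the N-gram lengths, or how the two feature sets are combined, so the lowercase alphanumeric tokenizer, the plain concatenation, and the function names `word_ngrams`, `text_features`, and `combine` are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List, Sequence, Tuple

NGram = Tuple[str, ...]


def word_ngrams(text: str, n: int) -> List[NGram]:
    """Return all contiguous word n-grams of length n, in order.

    Assumption: tokens are lowercase runs of letters and digits.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def text_features(doc: str, vocabulary: Sequence[NGram]) -> List[float]:
    """Count how often each vocabulary n-gram occurs in the document."""
    lengths = {len(gram) for gram in vocabulary}
    counts = Counter(gram for n in lengths for gram in word_ngrams(doc, n))
    return [float(counts[gram]) for gram in vocabulary]


def combine(text_vec: Sequence[float], visual_vec: Sequence[float]) -> List[float]:
    """Assumed fusion step: simple concatenation of the two vectors."""
    return list(text_vec) + list(visual_vec)
```

For instance, with the hypothetical vocabulary `[("clean",), ("clean", "air")]`, `text_features("Clean air, clean water", vocabulary)` counts two occurrences of the unigram and one of the bigram. The combined vector could then be passed to any supervised classifier, such as a Random Forests implementation.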
Cite
Hacohen-Kerner, Y., Sabag, A., Liparas, D., Moumtzidou, A., Vrochidis, S., & Kompatsiaris, I. (2015). Classification using various machine learning methods and combinations of key-phrases and visual features. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9398, pp. 64–75). Springer Verlag. https://doi.org/10.1007/978-3-319-27932-9_6