This research investigates the problem of news articles classification. The classification is performed using N-gram textual features extracted from text and visual features generated from one representative image. The application domain is news articles written in English that belong to four categories: Business-Finance, Lifestyle-Leisure, Science-Technology and Sports downloaded from three well-known news web-sites (BBC, Reuters, and TheGuardian). Various classification experiments have been performed with the Random Forests machine learning method using N-gram textual features and visual features from a representative image. Using the N-gram textual features alone led to much better accuracy results (84.4%) than using the visual features alone (53%). However, the use of both N-gram textual features and visual features led to slightly better accuracy results (86.2%). The main contribution of this work is the introduction of a news article classification framework based on Random Forests and multimodal features (textual and visual), as well as the late fusion strategy that makes use of Random Forests operational capabilities.
CITATION STYLE
Liparas, D., HaCohen-Kerner, Y., Moumtzidou, A., Vrochidis, S., & Kompatsiaris, I. (2014). News articles classification using random forests and weighted multimodal features. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8849, 63–75. https://doi.org/10.1007/978-3-319-12979-2_6
Mendeley helps you to discover research relevant for your work.