On the utility of incremental feature selection for the classification of textual data streams

Ioannis Katakis; Grigorios Tsoumakas; Ioannis Vlahavas

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3746 LNCS 338-348

DOI: 10.1007/11573036_32

42 Citations

37 Readers

Abstract

In this paper we argue that incrementally updating the features that a text classification algorithm considers is very important for real-world textual data streams, because in most applications the distribution of the data and the description of the classification concept change over time. To deal with this problem, we propose coupling an incremental feature ranking method with an incremental learning algorithm that can consider different subsets of the feature vector during prediction (what we call a feature-based classifier). Experimental results with a longitudinal database of real spam and legitimate emails show that our approach can adapt to the changing nature of streaming data and works much better than classical incremental learning algorithms. © Springer-Verlag Berlin Heidelberg 2005.

Cite

APA

Katakis, I., Tsoumakas, G., & Vlahavas, I. (2005). On the utility of incremental feature selection for the classification of textual data streams. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3746 LNCS, pp. 338–348). https://doi.org/10.1007/11573036_32
