Exploiting concept clumping for efficient incremental e-mail categorization

Alfred Krzywicki; Wayne Wobcke

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 6441 LNAI(PART 2) 244-258

DOI: 10.1007/978-3-642-17313-4_25

Abstract

We introduce a novel approach to incremental e-mail categorization based on identifying and exploiting "clumps" of messages that are classified similarly. Clumping reflects the local coherence of a classification scheme and is particularly important in a setting where the classification scheme is dynamically changing, such as in e-mail categorization. We propose a number of metrics to quantify the degree of clumping in a series of messages. We then present a number of fast, incremental methods to categorize messages and compare the performance of these methods with measures of the clumping in the datasets to show how clumping is being exploited by these methods. The methods are tested on 7 large real-world e-mail datasets of 7 users from the Enron corpus, where each message is classified into one folder. We show that our methods perform well and provide accuracy comparable to several common machine learning algorithms, but with much greater computational efficiency. © 2010 Springer-Verlag.

Cite

Krzywicki, A., & Wobcke, W. (2010). Exploiting concept clumping for efficient incremental e-mail categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6441 LNAI, pp. 244–258). https://doi.org/10.1007/978-3-642-17313-4_25
