Exploiting concept clumping for efficient incremental news article categorization

Alfred Krzywicki; Wayne Wobcke

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 7120 LNAI(PART 1) 353-366

DOI: 10.1007/978-3-642-25853-4_27

Abstract

In this paper, we introduce efficient methods for incremental multi-label categorization of documents. We use concept clumping to efficiently categorize news articles into a hierarchical structure of categories. Concept clumping is a phenomenon of local coherences occurring in the data, and it has previously been used for fast, incremental e-mail classification. We extend the definition of clumping and introduce additional clumping metrics specifically for multi-label document categorization. We present three methods for incremental multi-label categorization that exploit concept clumping and make use of thresholding techniques and a new term-category weight boosting method. Our methods are tested using the Reuters (RCV1) news corpus, and the accuracy obtained is comparable to some well-known machine learning methods trained in batch mode, but with much lower computation time. © 2011 Springer-Verlag.

Cite

Krzywicki, A., & Wobcke, W. (2011). Exploiting concept clumping for efficient incremental news article categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7120 LNAI, pp. 353–366). https://doi.org/10.1007/978-3-642-25853-4_27
