Efficient clustering of web-derived data sets

Luís Sarmento; Alexander Kehlenbeck; Eugénio Oliveira; Lyle Ungar

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5632 LNAI 398-412

DOI: 10.1007/978-3-642-03070-3_30

Abstract

Many data sets derived from the web are large, high-dimensional, sparse and have a Zipfian distribution of both classes and features. On such data sets, current scalable clustering methods such as streaming clustering suffer from fragmentation, where large classes are incorrectly divided into many smaller clusters, and computational efficiency drops significantly. We present a new clustering algorithm based on connected components that addresses these issues and so works well on web-type data. © 2009 Springer Berlin Heidelberg.
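The abstract does not describe the algorithm's steps, so the following is only a generic sketch of the connected-components idea, not the authors' method. It assumes items are sparse feature vectors stored as dicts, links any two items whose cosine similarity meets a threshold, and reports each connected component of the resulting graph as a cluster. The names `cosine`, `find`, `cluster_by_components`, and the `threshold` value are all hypothetical, and the all-pairs loop is quadratic and meant for illustration on small inputs only.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_by_components(items, threshold=0.5):
    """Link items whose similarity meets the threshold (an assumed value)
    and return the connected components as sorted lists of item indices."""
    parent = list(range(len(items)))
    for i, j in combinations(range(len(items)), 2):
        if cosine(items[i], items[j]) >= threshold:
            root_i, root_j = find(parent, i), find(parent, j)
            if root_i != root_j:
                parent[root_j] = root_i  # merge the two components
    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


# Toy sparse data: two pairs of overlapping items and one isolated item.
items = [
    {"a": 1, "b": 1},
    {"a": 1, "b": 1, "c": 1},
    {"x": 1},
    {"x": 1, "y": 1},
    {"z": 1},
]
clusters = cluster_by_components(items)
```

Because membership is transitive through links, two items can land in the same cluster without being directly similar, which is one reason a component-based view can keep a large class together where a one-pass assignment might split it.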

Citation (APA)

Sarmento, L., Kehlenbeck, A., Oliveira, E., & Ungar, L. (2009). Efficient clustering of web-derived data sets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5632 LNAI, pp. 398–412). https://doi.org/10.1007/978-3-642-03070-3_30
