Almost-constant-time clustering of arbitrary corpus subsets

Craig Silverstein; Jan O. Pedersen

Journal Article, Open Access

SIGIR Forum (ACM Special Interest Group on Information Retrieval) (1997) 31(1 SPEC. ISS.) 60-64

DOI: 10.1145/278459.258535

34 Citations

9 Readers

Abstract

Methods exist for constant-time clustering of corpus subsets selected via Scatter/Gather browsing [3]. In this paper we expand on those techniques, giving an algorithm for almost-constant-time clustering of arbitrary corpus subsets. This algorithm is never slower than clustering the document set from scratch, and for medium-sized and large sets it is significantly faster. This algorithm is useful for clustering arbitrary subsets of large corpora (obtained, for instance, by a Boolean search) quickly enough to be useful in an interactive setting. Copyright 1997 ACM.

Citation (APA)

Silverstein, C., & Pedersen, J. O. (1997). Almost-constant-time clustering of arbitrary corpus subsets. SIGIR Forum (ACM Special Interest Group on Information Retrieval), 31(1 SPEC. ISS.), 60–64. https://doi.org/10.1145/278459.258535