Abstract
Methods exist for constant-time clustering of corpus subsets selected via Scatter/Gather browsing [3]. In this paper we expand on those techniques, giving an algorithm for almost-constant-time clustering of arbitrary corpus subsets. The algorithm is never slower than clustering the document set from scratch, and for medium-sized and large sets it is significantly faster. It can cluster arbitrary subsets of large corpora (obtained, for instance, by a Boolean search) quickly enough to be useful in an interactive setting. Copyright 1997 ACM.
Cite
Silverstein, C., & Pedersen, J. O. (1997). Almost-constant-time clustering of arbitrary corpus subsets. SIGIR Forum (ACM Special Interest Group on Information Retrieval), 31(1 SPEC. ISS.), 60–64. https://doi.org/10.1145/278459.258535