A Parallel Algorithm for Record Clustering

Edward Omiecinski; Peter Scheuermann

Journal Article, Open Access

ACM Transactions on Database Systems (TODS) (1990) 15(4) 599-624

DOI: 10.1145/99935.99947

Abstract

We present an efficient heuristic algorithm for record clustering that can run on a SIMD machine. We introduce the P-tree, and its associated numbering scheme, which in the split phase allows each processor independently to compute the unique cluster number of a record satisfying an arbitrary query. We show that by restricting ourselves in the merge phase to combining only sibling clusters, we obtain a parallel algorithm whose speedup ratio is optimal in the number of processors used. Finally, we report on experiments showing that our method produces substantial savings in an environment with relatively little overlap among the queries. © 1990, ACM. All rights reserved.

Citation (APA)

Omiecinski, E., & Scheuermann, P. (1990). A Parallel Algorithm for Record Clustering. ACM Transactions on Database Systems (TODS), 15(4), 599–624. https://doi.org/10.1145/99935.99947
