Accelerating clustering using approximate spanning tree and prime number based filter

Dhananjai Rao; Sutharzan Sreeskandarajan; Chun Liang

Conference Proceedings

Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019 (2019) 166-174

DOI: 10.1109/IPDPSW.2019.00037

Abstract

Motivation: Clustering genomic data, including data generated via high-throughput sequencing, is an important preliminary step for assembly and analysis. However, clustering a large number of sequences is time-consuming.

Methods: In this paper, we discuss algorithmic performance improvements to our existing clustering system, PEACE, via two new approaches: (1) an Approximate Spanning Tree (AST), which is computed much faster than the currently used Minimum Spanning Tree (MST), and (2) a novel Prime Numbers based Heuristic (PNH) for generating and comparing features to further reduce comparison overheads.

Results: Experiments conducted using a variety of datasets show that the proposed methods significantly improve performance for datasets with large clusters, with only minimal degradation in clustering quality. We also compare our methods against wcd-kaboom, a state-of-the-art clustering software package. Our experiments show that AST and PNH underperform wcd-kaboom for datasets that have many small clusters. However, for datasets with large clusters they outperform wcd-kaboom by ~550x with comparable clustering quality. The results indicate that the proposed methods hold considerable promise for accelerating clustering of genomic data with large clusters.

Citation (APA)

Rao, D., Sreeskandarajan, S., & Liang, C. (2019). Accelerating clustering using approximate spanning tree and prime number based filter. In Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019 (pp. 166–174). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IPDPSW.2019.00037
