Scalable clustering algorithm for N-body simulations in a shared-nothing cluster


Abstract

Scientists' ability to generate and collect massive-scale datasets is increasing. As a result, limited data analysis capability, rather than limited data availability, has become the bottleneck to scientific discovery. MapReduce-style platforms promise to address this growing data analysis problem, but many scientific analyses are not easy to express in these new frameworks. In this paper, we study data analysis challenges found in the astronomy simulation domain. In particular, we present a scalable, parallel algorithm for data clustering in this domain. Our algorithm makes two contributions. First, it shows how a clustering problem can be efficiently implemented in a MapReduce-style framework. Second, it includes optimizations that enable scalability, even in the presence of skew. We implement our solution in the Dryad parallel data processing system using DryadLINQ. We evaluate its performance and scalability using a real dataset of 906 million points, and show that in an 8-node cluster our algorithm can process even a highly skewed dataset 17 times faster than the conventional implementation and offers near-linear scalability. Our approach matches the performance of an existing hand-optimized implementation used in astrophysics on a dataset with little skew and significantly outperforms it on a skewed dataset. © 2010 Springer-Verlag Berlin Heidelberg.
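The abstract does not spell out the algorithm itself, so what follows is only a minimal sketch of the general "partition, cluster locally, then merge" pattern that a MapReduce-style clustering job can follow. It assumes a friends-of-friends criterion (two particles are linked when they lie within a linking length `eps`, and clusters are the connected components), which is a common choice for N-body halo finding; it is not the authors' Dryad/DryadLINQ implementation. All names (`distributed_fof`, `quantile_cuts`, `local_fof`, `boundary_links`) are hypothetical. Skew is handled here by one simple, assumed technique: partition boundaries along the x axis are placed at data quantiles, so each partition receives roughly the same number of points even when particles are heavily concentrated.

```python
import bisect
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch of partitioned friends-of-friends; not the paper's code.
Point = Tuple[float, float, float]


class UnionFind:
    """Disjoint-set forest over global point indices."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}

    def find(self, i: int) -> int:
        self.parent.setdefault(i, i)
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            # The smaller index stays root, so each label is the cluster's minimum index.
            self.parent[max(ri, rj)] = min(ri, rj)


def close(p: Point, q: Point, eps: float) -> bool:
    """Assumed friends-of-friends link test: Euclidean distance at most eps."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) <= eps * eps


def quantile_cuts(points: Sequence[Point], k: int) -> List[float]:
    """Skew-aware split: cut the x axis at quantiles so partitions hold ~equal counts."""
    xs = sorted(p[0] for p in points)
    return [xs[len(xs) * i // k] for i in range(1, k)]


def local_fof(ids: List[int], points: Sequence[Point], eps: float) -> List[Tuple[int, int]]:
    """Per-partition 'reduce': brute-force FoF, emit compact (point, local root) pairs."""
    uf = UnionFind()
    for i, j in combinations(ids, 2):
        if close(points[i], points[j], eps):
            uf.union(i, j)
    return [(i, uf.find(i)) for i in ids]


def boundary_links(points: Sequence[Point], cuts: List[float], eps: float) -> List[Tuple[int, int]]:
    """A linked pair in different partitions straddles some cut and lies within eps
    of it, so only points in these thin strips need to be compared."""
    links = []
    for c in cuts:
        strip = [i for i, p in enumerate(points) if abs(p[0] - c) <= eps]
        links.extend((i, j) for i, j in combinations(strip, 2) if close(points[i], points[j], eps))
    return links


def distributed_fof(points: Sequence[Point], eps: float, k: int) -> List[int]:
    cuts = quantile_cuts(points, k)
    # "Map": key every point by the id of the x-range partition it falls in.
    partitions: Dict[int, List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        partitions[bisect.bisect_right(cuts, p[0])].append(i)
    # Local clustering is independent per partition, so it could run in parallel.
    local = [pair for ids in partitions.values() for pair in local_fof(ids, points, eps)]
    # Merge: one global union-find over the local summaries plus boundary links.
    uf = UnionFind()
    for i, j in local + boundary_links(points, cuts, eps):
        uf.union(i, j)
    return [uf.find(i) for i in range(len(points))]


pts = [(0.0, 0, 0), (0.4, 0, 0), (0.8, 0, 0), (5.0, 0, 0), (5.3, 0, 0), (9.0, 0, 0)]
print(distributed_fof(pts, eps=0.5, k=3))  # prints [0, 0, 0, 3, 3, 5]
```

Because every label is the smallest point index in its cluster, the result should be identical for any number of partitions, which makes the merge step easy to check against a single-partition run. A production system would likely replace the brute-force pair checks with a spatial index and emit boundary-strip points as separately keyed records instead of rescanning the whole dataset for each cut.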

Citation (APA)

Kwon, Y., Nunley, D., Gardner, J. P., Balazinska, M., Howe, B., & Loebman, S. (2010). Scalable clustering algorithm for N-body simulations in a shared-nothing cluster. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6187 LNCS, pp. 132–150). https://doi.org/10.1007/978-3-642-13818-8_11
