Parallel dual tree traversal on multi-core and many-core architectures for astrophysical N-body simulations

Abstract

In astrophysical N-body simulations, Dehnen's algorithm, which is based on a dual tree traversal and implemented in the serial falcON code, is faster than serial Barnes-Hut tree-codes but is outperformed by parallel CPU and GPU tree-codes. In this paper, we present a parallel dual tree traversal, implemented in the pfalcON code, that targets multi-core CPUs and many-core architectures (Xeon Phi). We focus on both performance and portability while preserving Dehnen's original algorithm. We first apply task parallelism, with either OpenMP or Intel TBB, to the dual tree traversal. We then rely on the SPMD (single-program, multiple-data) model, via the Intel SPMD Program Compiler, for the SIMD vectorization of the near-field part. Finally, we compare pfalcON's performance with related work and obtain results that match one of the best current tree-code implementations on GPU. © 2014 Springer International Publishing Switzerland.
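
For readers unfamiliar with the term, the following is a minimal serial sketch of a dual tree traversal, written in Python for brevity. It is not taken from falcON or pfalcON: the 1D binary tree, the names `Cell`, `build`, `well_separated` and `traverse`, and the acceptance test with parameter `theta` are all illustrative assumptions. The sketch only records which cell pairs would be treated as far-field (cell-cell) interactions and which particle pairs would be treated as near-field (direct) interactions. The recursive calls on distinct cell pairs are independent of one another, which is the kind of work that task parallelism could distribute.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Cell:
    lo: int        # index of the first particle (inclusive) in the sorted array
    hi: int        # index one past the last particle (exclusive)
    center: float  # midpoint of the particles' bounding interval
    radius: float  # half-width of the bounding interval
    children: list = field(default_factory=list)


def build(xs, lo, hi, leaf_size=4):
    """Build a binary tree over the sorted 1D positions xs[lo:hi]."""
    cell = Cell(lo, hi, 0.5 * (xs[lo] + xs[hi - 1]), 0.5 * (xs[hi - 1] - xs[lo]))
    if hi - lo > leaf_size:
        mid = (lo + hi) // 2
        cell.children = [build(xs, lo, mid, leaf_size),
                         build(xs, mid, hi, leaf_size)]
    return cell


def well_separated(a, b, theta):
    """Hypothetical acceptance test: cells are small relative to their distance."""
    return (a.radius + b.radius) < theta * abs(a.center - b.center)


def traverse(a, b, theta, far, near):
    """Dual tree traversal: resolve the mutual interaction of cells a and b."""
    if a is b:
        if not a.children:
            # Leaf interacting with itself: direct pairs inside the leaf.
            near.extend(combinations(range(a.lo, a.hi), 2))
        else:
            # Self-interaction splits into all unordered pairs of children.
            for i, ci in enumerate(a.children):
                for cj in a.children[i:]:
                    traverse(ci, cj, theta, far, near)
    elif well_separated(a, b, theta):
        far.append((a, b))  # one cell-cell (far-field) interaction
    elif not a.children and not b.children:
        # Two close leaves: direct (near-field) particle-particle pairs.
        near.extend((i, j) for i in range(a.lo, a.hi)
                    for j in range(b.lo, b.hi))
    elif a.children and (not b.children or a.radius >= b.radius):
        for c in a.children:  # split the larger (or only splittable) cell
            traverse(c, b, theta, far, near)
    else:
        for c in b.children:
            traverse(a, c, theta, far, near)


# Usage: 64 evenly spaced particles on [0, 1].
n = 64
xs = [i / (n - 1) for i in range(n)]
root = build(xs, 0, n)
far, near = [], []
traverse(root, root, 0.5, far, near)
far_pairs = sum((a.hi - a.lo) * (b.hi - b.lo) for a, b in far)
print(len(near), "near-field pairs,", far_pairs, "pairs covered by far-field")
```

In this sketch every unordered particle pair is covered exactly once, either directly or through one cell-cell interaction. A real code would evaluate forces at those points instead of recording pairs.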

Cite

Lange, B., & Fortin, P. (2014). Parallel dual tree traversal on multi-core and many-core architectures for astrophysical N-body simulations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8632 LNCS, pp. 716–727). Springer Verlag. https://doi.org/10.1007/978-3-319-09873-9_60
