A multi-GPU implementation of a D2Q37 Lattice Boltzmann code

Abstract

We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluster based on Nvidia Fermi processors. We analyze how to optimize the algorithm for GP-GPU architectures, describe the implementation choices we adopted, and compare our performance results with an implementation optimized for latest-generation multi-core CPUs. Our program runs at ≈30% of the double-precision peak performance of one GPU and shows almost linear scaling when run on the multi-GPU cluster. © 2012 Springer-Verlag.

Cite

Biferale, L., Mantovani, F., Pivanti, M., Pozzati, F., Sbragaglia, M., Scagliarini, A., … Tripiccione, R. (2012). A multi-GPU implementation of a D2Q37 Lattice Boltzmann code. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7203 LNCS, pp. 640–650). https://doi.org/10.1007/978-3-642-31464-3_65
