Dynamic programming parallelization of matrix chain multiplication on GPU: A comparative study

Abstract

The dynamic programming paradigm covers many important optimization problems, including the optimal binary search tree, longest common subsequence, binary knapsack, and matrix chain multiplication (MCM). For a chain of n matrices, MCM computes the parenthesization that gives an optimal matrix product, which takes O(n^3) time using a table of size O(n^2). We propose MCM parallelization techniques at the thread level on a multi-core CPU and for groups of threads on an NVIDIA GPU. The prime objective of this paper is to present and analyze massively parallel implementations of the MCM algorithm using OpenMP and CUDA on parallel systems such as an Intel Xeon CPU and an NVIDIA GPU. Relative to its serial implementation, the parallel MCM algorithm achieved a speedup of 10× on an Intel Xeon using OpenMP and a speedup of 7× on an NVIDIA Quadro FX 3800 GPU, so the speedup achieved on the multi-core CPU exceeds the speedup achieved on the GPU. This paper also presents performance comparisons for OpenMP as the chunk size of loop iterations and the technique for scheduling those chunks among cores are varied.
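For orientation, the O(n^3)-time, O(n^2)-table recurrence mentioned in the abstract can be sketched serially as follows. This is the standard textbook formulation, not the authors' OpenMP or CUDA code. The function name `matrix_chain_order` and the `dims` convention (matrix i has shape `dims[i]` by `dims[i + 1]`) are assumptions made for illustration. The comment on the middle loop marks where parallelism is commonly exposed; the abstract does not say which decomposition the paper uses.

```python
def matrix_chain_order(dims):
    """Return (minimum scalar multiplications, parenthesization string).

    Illustrative convention: matrix A_(i+1) has shape dims[i] x dims[i + 1].
    """
    n = len(dims) - 1  # number of matrices in the chain
    if n < 1:
        raise ValueError("dims must describe at least one matrix")

    # cost[i][j]: cheapest way to multiply matrices i..j (0-based, inclusive)
    # split[i][j]: index k at which that optimal product is split
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]

    for length in range(2, n + 1):  # chain length, i.e. one table diagonal
        # Every cell on this diagonal reads only cells from shorter chains,
        # so the iterations of this loop are independent of one another.
        # A parallel version could distribute them across threads.
        for i in range(n - length + 1):
            j = i + length - 1
            best = None
            for k in range(i, j):  # try every split point
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if best is None or c < best:
                    best = c
                    split[i][j] = k
            cost[i][j] = best

    def paren(i, j):
        # Rebuild the parenthesization from the recorded split points.
        if i == j:
            return f"A{i + 1}"
        k = split[i][j]
        return f"({paren(i, k)} x {paren(k + 1, j)})"

    return cost[0][n - 1], paren(0, n - 1)
```

For example, `matrix_chain_order([10, 30, 5, 60])` describes three matrices of shapes 10×30, 30×5, and 5×60. Multiplying the first two first costs 10·30·5 + 10·5·60 = 4500 scalar multiplications, against 27000 for the other order.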

Cite

Diwan, T., & Sathe, S. R. (2016). Dynamic programming parallelization of matrix chain multiplication on GPU: A comparative study. In Advances in Intelligent Systems and Computing (Vol. 394, pp. 333–343). Springer Verlag. https://doi.org/10.1007/978-81-322-2656-7_30
