Two-Stage Column Block Parallel LU Factorization Algorithm

Abstract

Parallel computing is increasingly important in computer architectures, and parallel architectures have become ubiquitous in everyday life. Novel architectures and programming models pose new challenges to algorithm design and system software development. This paper presents a two-stage column block parallel LU factorization algorithm for multiple-processor architectures. A given matrix is first partitioned into large blocks, and every large block is then partitioned into a number of small blocks according to the number of processors. Finally, the small column blocks are allocated to processors in an orderly 'serpentine arrangement.' Each iteration of the column block parallel LU factorization is separated into two stages of operation. In the first stage, the first-step factorization operation is processed in advance, and nonblocking communication is used to reduce processor idle and waiting time and to improve parallelism. In the second stage, the large blocks are used to supply more powerful processors, such as GPUs, with the larger amounts of data they require to exploit their computing capabilities. Experiments are conducted on a multicore system and a multi-GPU system with different configurations to test the algorithm's performance. Compared with other related column block parallel LU factorizations, the two-stage algorithm exhibits better load balancing and parallel execution time performance.
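
The partitioning and allocation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not define the 'serpentine arrangement' precisely, so the code below assumes one common reading: small column blocks are dealt to processors in a back-and-forth order (0, 1, ..., P-1, P-1, ..., 1, 0, ...). The function names `two_level_partition` and `serpentine_assignment` and the parameter `large_width` are hypothetical and are not taken from the paper.

```python
def two_level_partition(n_cols: int, large_width: int, num_procs: int) -> list[tuple[int, int]]:
    """Split columns into large blocks, then split each large block into
    (at most) num_procs small blocks. Returns half-open column ranges."""
    blocks = []
    for large_start in range(0, n_cols, large_width):
        large_end = min(large_start + large_width, n_cols)
        base, extra = divmod(large_end - large_start, num_procs)
        start = large_start
        for p in range(num_procs):
            size = base + (1 if p < extra else 0)  # spread any remainder evenly
            if size == 0:
                continue  # large block narrower than the processor count
            blocks.append((start, start + size))
            start += size
    return blocks


def serpentine_assignment(num_blocks: int, num_procs: int) -> list[int]:
    """Assumed serpentine order: even sweeps run 0..P-1, odd sweeps run P-1..0."""
    owners = []
    for i in range(num_blocks):
        sweep, pos = divmod(i, num_procs)
        owners.append(pos if sweep % 2 == 0 else num_procs - 1 - pos)
    return owners


if __name__ == "__main__":
    blocks = two_level_partition(n_cols=12, large_width=6, num_procs=3)
    owners = serpentine_assignment(len(blocks), num_procs=3)
    for (lo, hi), owner in zip(blocks, owners):
        print(f"columns [{lo}, {hi}) -> processor {owner}")
```

Under this assumed ordering, reversing direction on every sweep gives each processor a mix of early and late column blocks. Because the active part of the matrix shrinks as LU factorization proceeds, this kind of alternation is one way to even out the per-processor workload; whether it matches the paper's exact scheme is not confirmed by the abstract.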

Wu, R., & Xie, X. (2020). Two-Stage Column Block Parallel LU Factorization Algorithm. IEEE Access, 8, 2645–2655. https://doi.org/10.1109/ACCESS.2019.2962355
