For the solutions of linear systems of equations with unsymmetric coefficient matrices, we have proposed an improved version of the quasi-minimal residual (IQMR) method [Proceedings of The International Conference on High Performance Computing and Networking (HPCN-97) (1997); IEICE Trans Inform Syst E80-D (9) (1997) 919] by using the Lanczos process as a major component combining elements of numerical stability and parallel algorithm design. For the Lanczos process, stability is obtained by a coupled two-term procedure that generates Lanczos vectors scaled to unit length. The algorithm is derived so that all inner products and matrix-vector multiplications of a single iteration step are independent and the communication time required for inner product can be overlapped efficiently with computation time. In this paper, a theoretical model of computation and communications phases is presented to allow us to give a quantitative analysis of the parallel performance with a two-dimensional grid topology. The efficiency, speed-up, and runtime are expressed as functions of the number of processors scaled by the number of processors that gives the minimal runtime for the given problem size. The model not only evaluates effectively the improvements in performance due to communication reduction by overlapping, but also provides useful insight into the scalability of the IQMR method. The theoretical results on the performance are demonstrated by experimental timing results carried out on a massively parallel distributed memory Parsytec system. © 2002 Published by Elsevier Science Ltd.
Yang, L. T., & Brent, R. P. (2002). Quantitative performance analysis of the improved quasi-minimal residual method on massively distributed memory computers. Advances in Engineering Software, 33(3), 169–177. https://doi.org/10.1016/S0965-9978(01)00055-2