LibShalom: Optimizing Small and Irregular-Shaped Matrix Multiplications on ARMv8 Multi-Cores

Weiling Yang; Jianbin Fang; Dezun Dong; Xing Su; Zheng Wang

Conference Proceedings (Open Access)

International Conference for High Performance Computing, Networking, Storage and Analysis, SC (2021)

DOI: 10.1145/3458817.3476217

Abstract

General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing. While the mainstream linear algebra libraries can deliver high performance on large and regular-shaped GEMM, they are inadequate for optimizing small and irregular-shaped GEMMs, which are commonly seen in new HPC applications. Some of the recent works in this direction have made promising progress on x86 architectures and GPUs but still leave much room for improvement on emerging HPC hardware built upon the ARMv8 architecture. We present LibShalom, an open-source library for optimizing small and irregular-shaped GEMMs, explicitly targeting the ARMv8 architecture. LibShalom builds upon the classical Goto algorithm but tailors it to minimize the expensive memory accessing overhead for data packing and processing small matrices. It uses analytic methods to determine GEMM kernel optimization parameters, enhancing the computation and parallelization efficiency of the GEMM kernels. We evaluate LibShalom by applying it to three ARMv8 multi-core architectures and comparing it against five mainstream linear algebra libraries. Experimental results show that LibShalom can consistently outperform existing solutions across GEMM workloads and hardware architectures.
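To make the packing overhead mentioned in the abstract concrete, the following is a minimal sketch of a Goto-style blocked GEMM in plain Python. It is not LibShalom's code and does not reflect its kernels or parameter choices. The function names (`gemm_reference`, `pack_block`, `gemm_goto_style`) and the block sizes `mc`, `nc`, `kc` are hypothetical and chosen only for illustration. The sketch shows where the classical scheme copies blocks of A and B into contiguous buffers before computing on them. For small matrices these copies are extra memory traffic that little computation is left to amortize.

```python
def gemm_reference(A, B, m, n, k):
    """Plain triple loop computing C = A * B on lists of lists."""
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = acc
    return C


def pack_block(M, r0, c0, rows, cols):
    """Copy a rows x cols block of M into a contiguous row-major buffer.

    This copy is the 'data packing' step: every packed element is read
    once and written once before any multiplication happens.
    """
    return [M[r0 + r][c0 + c] for r in range(rows) for c in range(cols)]


def gemm_goto_style(A, B, m, n, k, mc=4, nc=4, kc=4):
    """Blocked GEMM with packing, in the spirit of the Goto algorithm.

    mc, nc, kc are illustrative block sizes (assumptions); real libraries
    derive them from cache and register capacities.
    """
    C = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, nc):              # loop over column blocks of B and C
        nb = min(nc, n - jc)
        for pc in range(0, k, kc):          # loop over the shared dimension
            kb = min(kc, k - pc)
            Bp = pack_block(B, pc, jc, kb, nb)      # packed kb x nb block of B
            for ic in range(0, m, mc):      # loop over row blocks of A and C
                mb = min(mc, m - ic)
                Ap = pack_block(A, ic, pc, mb, kb)  # packed mb x kb block of A
                # Inner computation on the packed, contiguous buffers.
                for i in range(mb):
                    for j in range(nb):
                        acc = 0.0
                        for p in range(kb):
                            acc += Ap[i * kb + p] * Bp[p * nb + j]
                        C[ic + i][jc + j] += acc
    return C
```

Calling `gemm_goto_style(A, B, m, n, k)` gives the same product as `gemm_reference(A, B, m, n, k)`. The only difference is the order of memory accesses and the extra packing copies, which is the cost the abstract says LibShalom tailors the Goto algorithm to minimize.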

Citation (APA)

Yang, W., Fang, J., Dong, D., Su, X., & Wang, Z. (2021). LibShalom: Optimizing Small and Irregular-Shaped Matrix Multiplications on ARMv8 Multi-Cores. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. IEEE Computer Society. https://doi.org/10.1145/3458817.3476217
