CAKE: Matrix multiplication using constant-bandwidth blocks

Abstract

We offer a novel approach to matrix-matrix multiplication on computing platforms with memory hierarchies. Constant-bandwidth (CB) blocks improve computation throughput for architectures limited by external memory bandwidth. By configuring the shape and size of CB blocks operating from within any memory hierarchy level (e.g., internal SRAM), we achieve high throughput while holding external bandwidth (e.g., with DRAM) constant. We explain how, surprisingly, CB blocks can maintain constant external bandwidth as computation throughput increases. Analogous to partitioning a cake into pieces, we dub our CB-partitioned system CAKE. We show that CAKE outperforms state-of-the-art libraries in computation time on real-world systems where external bandwidth is a bottleneck, demonstrating CAKE's ability to address the memory wall. CAKE achieves superior performance by directly using theoretically optimal CB-partitioned blocks in tiling and scheduling, obviating the need for extensive design search.
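The claim that bandwidth can stay constant while throughput grows can be checked with a back-of-envelope model. The sketch below is a simplified illustration under stated assumptions, not the paper's derivation or CAKE's actual block-sizing rule: each of `p` cores performs one multiply-accumulate per cycle, the A (m × k) and B (k × n) sub-blocks are read from external memory, and the C (m × n) partial sums stay in local memory. The function names and the default sizes `s` and `k` are hypothetical.

```python
from fractions import Fraction


def external_bandwidth(m: int, k: int, n: int, p: int) -> Fraction:
    """Words per cycle needed to keep p cores busy on one m x k x n block.

    Simplified model (assumption): one MAC per core per cycle, A and B
    fetched from external memory, C partial sums kept in local memory.
    """
    macs = m * k * n              # multiply-accumulates in the block
    cycles = Fraction(macs, p)    # work split evenly across p cores
    words = m * k + k * n         # external reads for the A and B blocks
    return words / cycles         # simplifies to p * (1/m + 1/n)


def fixed_block(p: int, s: int = 64, k: int = 256) -> Fraction:
    # Block shape stays the same as cores are added.
    return external_bandwidth(s, k, s, p)


def scaled_block(p: int, s: int = 64, k: int = 256) -> Fraction:
    # Block grows in the m and n dimensions in proportion to p.
    return external_bandwidth(s * p, k, s * p, p)


if __name__ == "__main__":
    for p in (1, 2, 4, 8):
        print(p, float(fixed_block(p)), float(scaled_block(p)))
```

In this toy model the fixed block needs `2p/s` words per cycle, so bandwidth demand rises linearly with the core count, while the scaled block needs a constant `2/s` because `k` cancels and the `p` in the numerator is offset by the larger block. The price is local storage: the C block holds `(s·p)²` words.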

Citation (APA)

Kung, H. T., Natesh, V., & Sabot, A. (2021). CAKE: Matrix multiplication using constant-bandwidth blocks. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. IEEE Computer Society. https://doi.org/10.1145/3458817.3476166