Capstan: A vector RDA for sparsity

Abstract

This paper proposes Capstan: a scalable, parallel-patterns-based, reconfigurable dataflow accelerator (RDA) for sparse and dense tensor applications. Instead of designing for one application, we start with common sparse data formats, each of which supports multiple applications. Using a declarative programming model, Capstan supports application-independent sparse iteration and memory primitives that can be mapped to vectorized, high-performance hardware. We optimize random-access sparse memories with configurable out-of-order execution to increase SRAM random-access throughput from 32% to 80%. For a variety of sparse applications, Capstan with DDR4 memory is 18× faster than a multi-core CPU baseline, while Capstan with HBM2 memory is 16× faster than an Nvidia V100 GPU. For sparse applications that can be mapped to Plasticine, a recent dense RDA, Capstan is 7.6× to 365× faster and only 16% larger.
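
To make "sparse data format" and "sparse iteration" concrete, here is a minimal software sketch of one widely used format, compressed sparse row (CSR), and a sparse matrix-vector product that iterates only over stored nonzeros. The abstract does not name the formats Capstan supports, so the choice of CSR is an assumption. The function names `csr_from_dense` and `csr_spmv` are hypothetical and chosen for illustration. This is plain Python, not Capstan's programming model or hardware mapping.

```python
from typing import List, Sequence, Tuple


def csr_from_dense(
    dense: Sequence[Sequence[float]],
) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix to CSR (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i's nonzeros end in `values`.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(
    values: Sequence[float],
    col_idx: Sequence[int],
    row_ptr: Sequence[int],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x, visiting only the stored nonzeros of A."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x[col_idx[k]] is a data-dependent (random-access) read.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    A = [[1, 0, 2],
         [0, 0, 0],
         [0, 3, 0]]
    vals, cols, ptrs = csr_from_dense(A)
    print(csr_spmv(vals, cols, ptrs, [1, 2, 3]))
```

The inner read `x[col_idx[k]]` depends on the data, which is the kind of random access the abstract refers to when it reports SRAM random-access throughput.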

Cite

Rucker, A., Vilim, M., Zhao, T., Zhang, Y., Prabhakar, R., & Olukotun, K. (2021). Capstan: A vector RDA for sparsity. In Proceedings of the Annual International Symposium on Microarchitecture, MICRO (pp. 1022–1035). IEEE Computer Society. https://doi.org/10.1145/3466752.3480047
