Transmuter: Bridging the efficiency gap using memory and dataflow reconfiguration

Subhankar Pal; Siying Feng; Dong Hyeon Park; Sung Kim; Aporva Amarnath; Chi Sheng Yang; Xin He; Jonathan Beaumont; Kyle May; Yan Xiong; Kuba Kaszyk; John Magnus Morton; Jiawen Sun; Michael O'Boyle; Murray Cole; Chaitali Chakrabarti; David Blaauw; Hun Seok Kim; Trevor Mudge; Ronald Dreslinski

Conference Proceedings (Open Access)

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT (2020) 175-190

DOI: 10.1145/3410463.3414627

Abstract

With the end of Dennard scaling and Moore's law, it is becoming increasingly difficult to build hardware for emerging applications that meet power and performance targets, while remaining flexible and programmable for end users. This is particularly true for domains that have frequently changing algorithms and applications involving mixed sparse/dense data structures, such as those in machine learning and graph analytics. To overcome this, we present a flexible accelerator called Transmuter, in a novel effort to bridge the gap between General-Purpose Processors (GPPs) and Application-Specific Integrated Circuits (ASICs). Transmuter adapts to changing kernel characteristics, such as data reuse and control divergence, through the ability to reconfigure the on-chip memory type, resource sharing and dataflow at run-time within a short latency. This is facilitated by a fabric of light-weight cores connected to a network of reconfigurable caches and crossbars. Transmuter addresses a rapidly growing set of algorithms exhibiting dynamic data movement patterns, irregularity, and sparsity, while delivering GPU-like efficiencies for traditional dense applications. Finally, in order to support programmability and ease of adoption, we prototype a software stack composed of low-level runtime routines and a high-level language library called TransPy, which cater to expert programmers and end users, respectively.

Our evaluations with Transmuter demonstrate average throughput (energy-efficiency) improvements of 5.0× (18.4×) and 4.2× (4.0×) over a high-end CPU and GPU, respectively, across a diverse set of kernels predominant in graph analytics, scientific computing and machine learning. Transmuter achieves energy-efficiency gains averaging 3.4× and 2.0× over prior FPGA and CGRA implementations of the same kernels, while remaining on average within 9.3× of state-of-the-art ASICs.
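To make "reconfiguring the on-chip memory type" concrete, here is a minimal toy sketch, assuming a single bank that can be switched at run time between a hardware-managed direct-mapped cache and a software-managed scratchpad (SPM). The class `ReconfigurableBank`, its `mode` values and methods, and the direct-mapped policy are all hypothetical illustrations, not Transmuter's actual microarchitecture or runtime API. The point is only the trade-off: cache mode tracks tags so irregular accesses are captured automatically, while SPM mode drops tag checks and lets software place data explicitly, which suits predictable dense access patterns.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReconfigurableBank:
    """Toy on-chip bank that acts as a direct-mapped cache or a scratchpad.

    Hypothetical illustration only; not the design described in the paper.
    """
    num_lines: int
    mode: str = "cache"  # hypothetical modes: "cache" or "spm"
    backing: Dict[int, int] = field(default_factory=dict)  # stand-in for off-chip memory
    data: List[int] = field(default_factory=list)
    tags: List[Optional[int]] = field(default_factory=list)
    hits: int = 0
    misses: int = 0

    def __post_init__(self) -> None:
        self.reconfigure(self.mode)

    def reconfigure(self, mode: str) -> None:
        """Switch memory type; contents are simply discarded in this toy model."""
        if mode not in ("cache", "spm"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.data = [0] * self.num_lines
        self.tags = [None] * self.num_lines

    def load(self, addr: int) -> int:
        if self.mode == "spm":
            # Scratchpad: addr is a local offset chosen by software; no tag check.
            return self.data[addr]
        # Cache: split the address into a line index and a tag.
        index, tag = addr % self.num_lines, addr // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
        else:
            # Miss: fill the line from backing memory, evicting any previous tag.
            self.misses += 1
            self.tags[index] = tag
            self.data[index] = self.backing.get(addr, 0)
        return self.data[index]

    def write_spm(self, offset: int, value: int) -> None:
        """Explicit software placement, only meaningful in scratchpad mode."""
        if self.mode != "spm":
            raise RuntimeError("write_spm requires spm mode")
        self.data[offset] = value


bank = ReconfigurableBank(num_lines=4, backing={0: 10, 4: 40})
first = bank.load(0)   # miss: fills line 0 with tag 0
second = bank.load(0)  # hit
third = bank.load(4)   # maps to line 0 with tag 1: conflict miss
fourth = bank.load(0)  # conflict miss again
print(first, second, third, fourth, bank.hits, bank.misses)

bank.reconfigure("spm")
bank.write_spm(2, 99)
print(bank.load(2))
```

The conflict misses between addresses 0 and 4 show why a kernel with a predictable, dense footprint may prefer scratchpad mode, where software decides placement and no tag lookups occur.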

Cite (APA)

Pal, S., Feng, S., Park, D. H., Kim, S., Amarnath, A., Yang, C. S., … Dreslinski, R. (2020). Transmuter: Bridging the efficiency gap using memory and dataflow reconfiguration. In Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT (pp. 175–190). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3410463.3414627