Weight and activation sparsity can be leveraged in hardware to boost the performance and energy efficiency of Deep Neural Networks during inference. Fully capitalizing on sparsity requires re-scheduling and mapping the execution stream so that non-zero weight/activation pairs are delivered to multiplier units for maximal utilization and reuse. However, permitting arbitrary value re-scheduling in memory space and in time places a considerable burden on hardware, which must perform dynamic routing and matching of values at runtime, and incurs significant energy inefficiencies. Bit-Tactical (TCL) is a neural network accelerator in which the responsibility for exploiting weight sparsity is shared between a novel static scheduling middleware and a co-designed hardware front-end with a lightweight sparse shuffling network comprising two (2- to 8-input) multiplexers per activation input. We empirically motivate two back-end designs chosen to target bit-sparsity in activations rather than value-sparsity, with two benefits: a) we avoid handling the dynamically sparse whole-value activation stream, and b) we uncover more ineffectual work. For a variety of neural networks, TCL outperforms other state-of-the-art accelerators that target weight and activation sparsity, the dynamic precision requirements of activations, or their bit-level sparsity.
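To make the bit-sparsity argument concrete, the rough Python sketch below (not the paper's hardware model; the function names and the toy 8-bit activation tile are hypothetical) counts the terms each style of back-end would still have to compute: a value-sparse design skips only zero-valued activations, while a bit-serial design skips every zero bit, so it exposes strictly more ineffectual work.

```python
import numpy as np

def value_sparse_terms(acts: np.ndarray) -> int:
    """Work left for a value-sparse design: one multiply per non-zero activation."""
    return int(np.count_nonzero(acts))

def bit_sparse_terms(acts: np.ndarray) -> int:
    """Work left for a bit-serial design: one shift-and-add term per set
    (effectual) activation bit; zero bits contribute nothing."""
    return sum(bin(int(v)).count("1") for v in acts.ravel())

# Toy tile of 8-bit activations (made-up values). Zeros are skipped by a
# value-sparse scheme, but non-zero values such as 64 (0b01000000) still
# carry mostly zero bits that only a bit-level scheme can skip.
tile = np.array([0, 64, 3, 0, 128, 17, 0, 1], dtype=np.uint8)

print("dense bit-serial terms:", tile.size * 8)           # 64: no skipping
print("value-sparse terms:    ", value_sparse_terms(tile))  # 5 non-zero values
print("bit-sparse terms:      ", bit_sparse_terms(tile))    # 7 effectual bits
```

In this toy example the value-sparse view removes 3 of 8 multiplications, whereas the bit-level view leaves only 7 of 64 shift-and-add terms, which is the intuition behind benefit b) above.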
Lascorz, A. D., Judd, P., Stuart, D. M., Poulos, Z., Mahmoud, M., Sharify, S., … Moshovos, A. (2019). Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks. In International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS (pp. 749–763). Association for Computing Machinery. https://doi.org/10.1145/3297858.3304041