Weight clustering is an effective technique for compressing deep neural network (DNN) memory by using a limited number of unique weights and low-bit weight indexes to store clustering information. In this paper, we propose PatterNet, which enforces shared clustering topologies on filters. Cluster sharing leads to a greater extent of memory reduction by reusing the index information. PatterNet effectively factorizes input activations and post-processes the unique weights, which saves multiplications by several orders of magnitude. Furthermore, PatterNet reduces the add operations by harnessing the fact that filters sharing a clustering pattern have the same factorized terms. We introduce techniques for determining and assigning clustering patterns and for training a network to fulfill the target patterns. We also propose and implement an efficient accelerator that builds upon the patterned filters. Experimental results show that PatterNet shrinks the memory footprint and operation count by up to 80.2% and 73.1%, respectively, while maintaining accuracy similar to the baseline models. The PatterNet accelerator improves energy efficiency by 107x over an Nvidia GTX 1080 and by 2.2x over the state of the art.
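To make the factorization idea concrete, the following is a minimal sketch, not the paper's implementation: the names (`pattern`, `centroids_per_filter`) and shapes are assumptions for illustration only. A "pattern" here is a per-position cluster-index map shared by several filters; each filter then stores only its own small table of unique weights. Because the per-cluster activation sums depend only on the shared pattern, they are computed once and reused by every filter that shares that pattern.

```python
import numpy as np

def dot_product_patterned(x, pattern, centroids_per_filter):
    """Factorized dot products for all filters sharing `pattern` (illustrative only).

    x: flattened input activations for one filter position.
    pattern: int array, same length as x, giving each position's cluster index.
    centroids_per_filter: (num_filters, num_clusters) array of unique weights.
    """
    num_clusters = centroids_per_filter.shape[1]
    # Factorized terms: sum the activations that map to each cluster index.
    # These sums are independent of the filter weights, so filters sharing the
    # pattern reuse them, trading one multiply per weight for one per cluster.
    cluster_sums = np.zeros(num_clusters)
    for idx, a in zip(pattern, x):
        cluster_sums[idx] += a
    return centroids_per_filter @ cluster_sums

# Toy check: 2 filters share one 8-position pattern with 4 unique weights each.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
pattern = np.array([0, 1, 2, 0, 3, 1, 2, 3])
centroids = rng.standard_normal((2, 4))
dense_filters = centroids[:, pattern]      # reconstruct full filters for comparison
assert np.allclose(dot_product_patterned(x, pattern, centroids), dense_filters @ x)
```

In this toy setting the dense computation needs 8 multiplies per filter, while the factorized form needs only 4 (one per unique weight), and the 8 additions forming `cluster_sums` are shared across both filters.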
Khaleghi, B., Mallappa, U., Yaldiz, D., Yang, H., Shah, M., Kang, J., & Rosing, T. (2022). PatterNet: Explore and Exploit Filter Patterns for Efficient Deep Neural Networks. In Proceedings - Design Automation Conference (pp. 223–228). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3489517.3530422