Optimizing Data Flow in Binary Neural Networks

Abstract

Binary neural networks (BNNs) can substantially reduce a neural network’s inference time by replacing its costly floating-point arithmetic with bit-wise operations. Nevertheless, state-of-the-art approaches reduce the efficiency of the data flow in the BNN layers by introducing intermediate conversions from 1 to 16/32 bits. We propose a novel training scheme, denoted BNN-Clip, that increases the parallelism and data flow of the BNN pipeline; specifically, we introduce a clipping block that reduces the data width from 32 bits to 8. Furthermore, we decrease the internal accumulator size of a binary layer, which is usually kept at 32 bits to prevent data overflow, with no accuracy loss. We also propose an optimization of the batch normalization layer that reduces latency and simplifies deployment. Finally, we present an optimized implementation of the binary direct convolution for ARM NEON instruction sets. Our experiments show a consistent inference latency speed-up (up to (Formula presented.) and (Formula presented.) compared to two state-of-the-art BNN frameworks) while reaching accuracy comparable with state-of-the-art approaches on datasets such as CIFAR-10, SVHN, and ImageNet.
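
To make the bit-wise substitution concrete, the following is a minimal sketch of the standard XNOR-and-popcount dot product used in binary networks, followed by a saturating clip to the signed 8-bit range. It is not the BNN-Clip implementation from the paper: the helper names `pack_bits`, `binary_dot`, and `clip_int8` are hypothetical, and the choice of a symmetric signed 8-bit range is an assumption made for illustration only.

```python
def pack_bits(signs):
    """Pack a sequence of +1/-1 values into one integer.

    Bit i is set when signs[i] == +1 (assumed encoding: +1 -> 1, -1 -> 0).
    """
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two length-n {-1, +1} vectors given as packed words.

    XNOR marks the positions where the signs agree; each agreement
    contributes +1 and each disagreement -1, so dot = 2 * matches - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a_word ^ b_word) & mask
    matches = bin(xnor).count("1")  # popcount
    return 2 * matches - n


def clip_int8(x):
    """Saturate an accumulator value to the signed 8-bit range (assumed)."""
    return max(-128, min(127, x))


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
acc = binary_dot(pack_bits(a), pack_bits(b), len(a))
narrow = clip_int8(acc)
```

The result of `binary_dot` matches the ordinary sum of element-wise products of `a` and `b`, while using only XOR, NOT, and a bit count. The clip step illustrates why a narrower data width is possible when large accumulator values can be saturated rather than carried forward at 32 bits.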

Vorabbi, L., Maltoni, D., & Santi, S. (2024). Optimizing Data Flow in Binary Neural Networks. Sensors, 24(15). https://doi.org/10.3390/s24154780
