ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models

Abstract

With new accelerator hardware for Deep Neural Networks (DNNs), the computing power available for Artificial Intelligence (AI) applications has increased rapidly. However, as DNN algorithms become more complex and more specialized for specific applications, meeting latency requirements remains challenging, and finding the optimal points in the design space is critical. To decouple the architectural search from the target hardware, we propose a time estimation framework that models the inference latency of DNNs on hardware accelerators using mapping models and layer-wise estimation models. The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation. For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models and of statistical models against the roofline model and a refined roofline model. We test the mixed models on a set of 12 state-of-the-art neural networks, running on the ZCU102 SoC board with the Xilinx Deep Neural Network Development Kit (DNNDK) and on the Intel Neural Compute Stick 2 (NCS2). The mixed models achieve an average estimation error of 3.47% for the DNNDK and 7.44% for the NCS2, outperforming the statistical and analytical layer models for almost all selected networks. For a randomly selected subset of 34 networks from the NASBench dataset, the mixed model reaches a fidelity of 0.988, measured by Spearman's rank correlation coefficient $\rho$.
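The abstract compares the mixed models against a roofline model and stacks statistical models on top of layer-wise estimates. As a hedged illustration of that general idea (not ANNETTE's actual mixed, refined-roofline, or mapping models, which are more elaborate), the Python sketch below computes a classic roofline time per layer, fits a single scale factor to measured times as a stand-in for the statistical stage, and sums the corrected per-layer estimates into a network estimate. The names `Layer`, `roofline_time`, `fit_correction` and `network_time`, and all hardware and benchmark numbers, are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Layer:
    """Hypothetical per-layer workload description."""
    name: str
    ops: float          # arithmetic operations executed by the layer
    bytes_moved: float  # bytes transferred to/from off-chip memory


def roofline_time(layer: Layer, peak_ops_per_s: float, bandwidth_bytes_per_s: float) -> float:
    """Classic roofline: the layer is limited by compute or by memory, whichever is slower."""
    compute_time = layer.ops / peak_ops_per_s
    memory_time = layer.bytes_moved / bandwidth_bytes_per_s
    return max(compute_time, memory_time)


def fit_correction(roofline_estimates: Sequence[float], measured_times: Sequence[float]) -> float:
    """Least-squares scale factor a minimising sum((a * r - m) ** 2).

    Assumed stand-in for the statistical stage stacked on top of the
    analytical model; a real stacked model would use richer features.
    """
    num = sum(r * m for r, m in zip(roofline_estimates, measured_times))
    den = sum(r * r for r in roofline_estimates)
    return num / den


def network_time(layers: Sequence[Layer], peak_ops_per_s: float,
                 bandwidth_bytes_per_s: float, correction: float = 1.0) -> float:
    """Network latency as the sum of corrected per-layer estimates.

    Assumes layers run one after another and that any layer fusion is
    already reflected in the layer list (i.e. mapping has been applied).
    """
    return sum(correction * roofline_time(layer, peak_ops_per_s, bandwidth_bytes_per_s)
               for layer in layers)


if __name__ == "__main__":
    # Hypothetical accelerator: 1 TOP/s peak compute, 10 GB/s memory bandwidth.
    peak, bw = 1e12, 1e10
    layers = [Layer("conv1", ops=2e9, bytes_moved=4e6),  # compute-bound
              Layer("fc", ops=2e6, bytes_moved=8e6)]     # memory-bound
    estimates = [roofline_time(layer, peak, bw) for layer in layers]
    measured = [2.4e-3, 9.6e-4]  # hypothetical benchmark measurements in seconds
    a = fit_correction(estimates, measured)
    print(f"roofline: {network_time(layers, peak, bw) * 1e3:.2f} ms")
    print(f"stacked:  {network_time(layers, peak, bw, a) * 1e3:.2f} ms")
```

With these made-up numbers, the convolution is compute-bound (2 ms) and the fully connected layer is memory-bound (0.8 ms), so the plain roofline sum is 2.80 ms; the fitted factor of 1.2 raises the stacked estimate to 3.36 ms, matching the hypothetical measurements.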

Citation (APA)

Wess, M., Ivanov, M., Unger, C., Nookala, A., Wendt, A., & Jantsch, A. (2021). ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models. IEEE Access, 9, 3545–3556. https://doi.org/10.1109/ACCESS.2020.3047259

Readers over time

‘21‘22‘23‘24‘250481216

Readers' Seniority

Tooltip

PhD / Post grad / Masters / Doc 11

92%

Lecturer / Post doc 1

8%

Readers' Discipline

Tooltip

Computer Science 4

40%

Engineering 3

30%

Business, Management and Accounting 2

20%

Decision Sciences 1

10%

Save time finding and organizing research with Mendeley

Sign up for free
0