Mobile GPUs, ubiquitous computing hardware on almost every smartphone, are increasingly exploited for deep learning inference. In this paper, we present our measurements of inference performance on mobile GPUs. Our observations suggest that mobile GPUs are underutilized. We study this inefficiency in depth and find that one of the root causes is improper partitioning of the compute workload. To address this, we propose a heuristics-based workload partitioning approach that considers both performance and overheads on mobile devices. Evaluation results show that our approach reduces inference latency by up to 32.8% across various devices and neural networks.
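To illustrate the general idea of heuristic workload partitioning, the following is a minimal, hypothetical sketch (not the paper's actual algorithm): it scores candidate GPU work-group sizes on how well they align with an assumed SIMD wave width and how little work is wasted in the final, partially filled group, then picks the cheapest candidate.

```python
# Hypothetical sketch of heuristic workload partitioning.
# WAVE_WIDTH and the candidate sizes are illustrative assumptions,
# not values from the paper.

WAVE_WIDTH = 64  # assumed SIMD width of a hypothetical mobile GPU


def partition_score(total_items: int, group_size: int) -> float:
    """Lower is better: penalize idle SIMD lanes and tail underfill."""
    # Lanes left idle when group_size is not a multiple of the wave width.
    lane_waste = (-group_size) % WAVE_WIDTH
    # Slots wasted in the last, partially filled work-group.
    tail_waste = (-total_items) % group_size
    return lane_waste / WAVE_WIDTH + tail_waste / total_items


def choose_work_group(total_items: int, candidates=(32, 64, 128, 256)) -> int:
    """Pick the candidate work-group size with the lowest heuristic score."""
    return min(candidates, key=lambda g: partition_score(total_items, g))


print(choose_work_group(1024))  # → 64
```

A real system would of course combine such static heuristics with on-device measurements, since mobile GPU behavior varies widely across vendors and driver versions.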
Citation:
Jiang, S., Ran, L., Cao, T., Xu, Y., & Liu, Y. (2020). Profiling and optimizing deep learning inference on mobile GPUs. In APSys 2020 - Proceedings of the 2020 ACM SIGOPS Asia-Pacific Workshop on Systems (pp. 75–81). Association for Computing Machinery. https://doi.org/10.1145/3409963.3410493