Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs

Abstract

GPUs are increasingly utilized for running DNN tasks on emerging mobile edge devices. Beyond accelerating single task inference, their value is also particularly apparent in efficiently executing multiple DNN tasks, which often have strict latency requirements in applications. Preemption is the main technology to ensure multitasking timeliness, but mobile edges primarily offer two priorities for task queues, and existing methods thus achieve only coarse-grained preemption by categorizing DNNs into real-time and best-effort, permitting a real-time task to preempt best-effort ones. However, the efficacy diminishes significantly when other real-time tasks run concurrently, but this is already common in mobile edge applications. Due to different hardware characteristics, solutions from other platforms are unsuitable. For instance, GPUs on traditional mobile devices primarily assist CPU processing and lack special preemption support, mainly following FIFO in GPU scheduling. Clouds handle concurrent task execution, but focus on allocating one or more GPUs per complex model, whereas on mobile edges, DNNs mainly vie for one GPU. This paper introduces Pantheon, designed to offer fine-grained preemption, enabling real-time tasks to preempt each other and best-effort tasks. Our key observation is that the two-tier GPU stream priorities, while underexplored, are sufficient. Efficient preemption can be realized through software design by innovative scheduling and novel exploitation of the nested redundancy principle for DNN models. Evaluation on a diverse set of DNNs shows substantial improvements in deadline miss rate and accuracy of Pantheon over state-of-the-art methods.

Cite

Han, L., Zhou, Z., & Li, Z. (2024). Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs. In MOBISYS 2024 - Proceedings of the 2024 22nd Annual International Conference on Mobile Systems, Applications and Services (pp. 465–478). Association for Computing Machinery, Inc. https://doi.org/10.1145/3643832.3661878
