Abstract
Collaborative inference is the current state-of-the-art solution for mobile-server neural network inference offloading. However, we find that existing collaborative inference solutions only focus on partitioning the DNN computation, which is only a small part of achieving an efficient DNN offloading system. What ultimately determines the performance of DNN offloading is how the execution system utilizes the characteristics of the given DNN offloading task on the mobile, network, and server resources of the offloading environment. To this end, we design CoActo, a DNN execution system built from the ground up for mobile-server inference offloading. Our key design philosophy is Coactive Inference Offloading, which is a new, improved concept of DNN offloading that adds two properties, 1) fine-grained expression of DNNs and 2) concurrency of runtime resources, to existing collaborative inference. In CoActo, system components go beyond simple model splitting of existing approaches and operate more proactively to achieve the coactive execution of inference workloads. CoActo dynamically schedules concurrent interleaving of the mobile, server, and network operations to actively increase resource utilization, enabling lower end-to-end latency. We implement CoActo for various mobile devices and server environments and evaluate our system with distinct environment settings and DNN models. The experimental results show that our system achieves up to 2.1 times speed-up compared to the state-of-the-art collaborative inference solutions.
Bin, K., Park, J., Park, C., Kim, S., & Lee, K. (2024). CoActo: CoActive Neural Network Inference Offloading with Fine-grained and Concurrent Execution. In MOBISYS 2024 - Proceedings of the 2024 22nd Annual International Conference on Mobile Systems, Applications and Services (pp. 412–424). Association for Computing Machinery, Inc. https://doi.org/10.1145/3643832.3661885