Creating Edge AI from Cloud-based LLMs

Abstract

Cyber-human and cyber-physical systems have tight end-to-end latency bounds, typically on the order of a few tens of milliseconds. In contrast, cloud-based large language models (LLMs) have end-to-end latencies that are two to three orders of magnitude larger. This paper shows how to bridge this large gap by using LLMs as offline compilers for creating task-specific code that avoids LLM accesses. We provide three case studies as proofs of concept, and discuss the challenges in generalizing this technique to broader uses.
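To make the "LLM as offline compiler" idea concrete, the sketch below illustrates one plausible shape of the pattern: a slow cloud LLM call happens once, offline, to synthesize task-specific code, and only that generated code runs on the latency-critical path at the edge. This is an illustrative sketch, not the authors' implementation; call_cloud_llm, compile_task, TASK_SPEC, and classify are hypothetical names introduced here, and the generated function body is hard-coded for demonstration.

```python
# Illustrative sketch of the pattern described in the abstract.
# All identifiers here are hypothetical, not from the paper.

import time


def call_cloud_llm(prompt: str) -> str:
    """Placeholder for a cloud LLM API call (latency: hundreds of ms to seconds)."""
    raise NotImplementedError("wire up an LLM provider here")


# --- Offline "compile" phase: one slow LLM access, off the critical path ---
TASK_SPEC = (
    "Write a Python function classify(reading: float) -> str that "
    "returns 'alert' if reading > 0.8, else 'ok'."
)


def compile_task(spec: str) -> str:
    """Ask the cloud LLM to emit task-specific code, ahead of deployment."""
    return call_cloud_llm(spec)


# --- Online phase at the edge: run the generated artifact, no LLM access ---
# Assume the offline phase produced this code (hard-coded for the demo):
GENERATED_CODE = """
def classify(reading: float) -> str:
    return 'alert' if reading > 0.8 else 'ok'
"""

namespace: dict = {}
exec(GENERATED_CODE, namespace)  # load the compiled artifact once at startup
classify = namespace["classify"]

start = time.perf_counter()
label = classify(0.93)  # pure local compute: well under the tens-of-ms bound
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{label} in {elapsed_ms:.3f} ms")
```

The key property the sketch demonstrates is that the two-to-three-orders-of-magnitude LLM latency is paid once, offline, while the edge device executes only ordinary local code at inference time.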

Citation (APA)

Dong, Q., Chen, X., & Satyanarayanan, M. (2024). Creating Edge AI from Cloud-based LLMs. In Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications (HotMobile '24) (pp. 8–13). Association for Computing Machinery. https://doi.org/10.1145/3638550.3641126
