Cyber-human and cyber-physical systems have tight end-to-end latency bounds, typically on the order of a few tens of milliseconds. In contrast, cloud-based large language models (LLMs) have end-to-end latencies that are two to three orders of magnitude larger. This paper shows how to bridge this large gap by using LLMs as offline compilers for creating task-specific code that avoids LLM accesses at runtime. We provide three case studies as proofs of concept, and discuss the challenges in generalizing this technique to broader uses.
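To make the compile-once, run-many pattern concrete, the sketch below separates the slow offline phase (a cloud LLM generates task-specific code) from the fast online phase (the edge device runs the cached code with no LLM on the critical path). Everything here is an illustrative assumption rather than the paper's actual method: query_llm() is a hypothetical stand-in for any cloud LLM API, and the gesture-classification task, prompt, and file names are invented for this example.

    import importlib

    def query_llm(prompt: str) -> str:
        # Hypothetical placeholder for a cloud LLM call (e.g., a
        # chat-completions endpoint); substitute a real provider here.
        raise NotImplementedError("wire up a cloud LLM provider")

    PROMPT = (
        "Write a self-contained Python function classify_gesture(samples) "
        "that maps a list of (x, y, z) accelerometer samples to one of "
        "the labels 'swipe', 'tap', or 'shake'. Return only the code."
    )

    def compile_task_offline(path: str = "gesture_classifier.py") -> None:
        # Offline "compile" phase: one slow cloud round trip (hundreds of
        # milliseconds to seconds), paid once and amortized over every
        # later invocation at the edge.
        code = query_llm(PROMPT)
        with open(path, "w") as f:
            f.write(code)

    def classify(samples):
        # Online phase: import and run the cached, LLM-generated code.
        # No LLM access sits on the latency-critical path, so the call
        # can fit within tens-of-milliseconds end-to-end bounds.
        return importlib.import_module("gesture_classifier").classify_gesture(samples)

In this pattern, the expensive and highly variable LLM latency is incurred only during offline compilation; what remains at runtime is conventional code execution, which also gives developers a chance to inspect and test the generated code before deployment.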
Citation: Dong, Q., Chen, X., & Satyanarayanan, M. (2024). Creating Edge AI from Cloud-based LLMs. In Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications (HotMobile '24) (pp. 8–13). Association for Computing Machinery. https://doi.org/10.1145/3638550.3641126