Kairos: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources


Abstract

Online inference is becoming a key service product for many businesses, deployed on cloud platforms to meet customer demand. Despite their revenue-generating capability, these services must operate under tight Quality-of-Service (QoS) and cost-budget constraints. This paper introduces KAIROS, a novel runtime framework that maximizes query throughput while meeting a QoS target and a cost budget. KAIROS designs and implements novel techniques to build a pool of heterogeneous compute hardware without online exploration overhead and to distribute inference queries optimally at runtime. Our evaluation using industry-grade machine learning (ML) models shows that KAIROS yields up to 2x the throughput of an optimal homogeneous solution and outperforms state-of-the-art schemes by up to 70%, even when the competing schemes are implemented advantageously so that their exploration overhead is ignored.
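To make the optimization problem in the abstract concrete, here is a minimal illustrative sketch, not the paper's actual algorithm: given candidate instance types with hypothetical cost, throughput, and latency numbers, a greedy heuristic builds a heterogeneous pool under an hourly cost budget (filtering out QoS-violating types) and then splits query load proportionally to each instance's capacity. All names and figures below are assumptions for illustration.

```python
# Illustrative sketch only: KAIROS's real pool-construction and
# query-distribution techniques are described in the paper; this greedy
# heuristic merely demonstrates the stated problem (maximize throughput
# subject to a QoS target and a cost budget over heterogeneous hardware).
from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str
    cost_per_hour: float   # $/hour (hypothetical numbers)
    throughput_qps: float  # queries/sec the model sustains on this hardware
    p99_latency_ms: float  # tail latency for a single query

def build_pool(candidates, qos_ms, budget_per_hour):
    """Greedily buy instances by throughput-per-dollar until the hourly
    budget is exhausted, skipping any type that violates the QoS target."""
    feasible = [c for c in candidates if c.p99_latency_ms <= qos_ms]
    feasible.sort(key=lambda c: c.throughput_qps / c.cost_per_hour,
                  reverse=True)
    pool, spent = [], 0.0
    for c in feasible:
        # Take as many copies of this type as the remaining budget allows.
        while spent + c.cost_per_hour <= budget_per_hour:
            pool.append(c)
            spent += c.cost_per_hour
    return pool

def split_queries(pool, total_qps):
    """Distribute incoming load proportionally to each instance's capacity."""
    capacity = sum(c.throughput_qps for c in pool)
    return {i: total_qps * c.throughput_qps / capacity
            for i, c in enumerate(pool)}

# Hypothetical catalog of cloud instance types.
candidates = [
    InstanceType("gpu-a", cost_per_hour=3.0, throughput_qps=900, p99_latency_ms=40),
    InstanceType("gpu-b", cost_per_hour=1.2, throughput_qps=450, p99_latency_ms=60),
    InstanceType("cpu-c", cost_per_hour=0.4, throughput_qps=80, p99_latency_ms=120),
]
pool = build_pool(candidates, qos_ms=100, budget_per_hour=6.0)
shares = split_queries(pool, total_qps=1000)
```

With these made-up numbers, the budget buys five copies of the cheap-but-QoS-feasible "gpu-b" type, and the 1000 QPS of load is split evenly across them. A real system must additionally avoid online exploration overhead when profiling hardware, which is a central contribution of KAIROS.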

Citation (APA)

Li, B., Samsi, S., Gadepally, V., & Tiwari, D. (2023). Kairos: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources. In HPDC 2023 - Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (pp. 3–16). Association for Computing Machinery, Inc. https://doi.org/10.1145/3588195.3592997
