Machine Learning (ML) based prediction systems are increasingly used across application domains, including product recommendation, personal assistants, and facial recognition. These applications typically have diverse accuracy and response-latency requirements, which can be satisfied by a wide range of ML models. However, the deployment cost of prediction serving depends primarily on the type of resources procured, which are themselves heterogeneous in provisioning latency and billing complexity. It is therefore challenging for an inference serving system to choose from this array of resource types and model types to provide low-latency and cost-effective inferences. In this work, we quantitatively characterize the cost, accuracy, and latency implications of hosting ML inference on different public cloud resource offerings. Our evaluation shows that prior work does not address both dimensions of the problem: model heterogeneity and resource heterogeneity. Hence, to address the problem holistically, an inference serving system must handle the issues that arise from combining model and resource heterogeneity while optimizing for application constraints. Toward this, we discuss the design implications of a self-managed inference serving system that can optimize for application requirements based on public cloud resource characteristics.
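To make the joint selection problem concrete, the following is a minimal Python sketch (not from the paper) of choosing a (model, resource) pair that meets an application's latency SLO and accuracy floor at the lowest cost. The model names, resource types, and all profiled numbers are hypothetical placeholders; a real system would populate such a table from offline profiling of the actual cloud offerings.

```python
# Minimal sketch of joint model/resource selection under application
# constraints. All configurations and numbers below are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Config:
    model: str          # ML model variant (e.g., small vs. large)
    resource: str       # public cloud resource type (VM, serverless, GPU, ...)
    accuracy: float     # profiled model accuracy
    latency_ms: float   # profiled per-request inference latency
    cost_per_1k: float  # profiled cost per 1,000 inferences (USD)

# Hypothetical profiling table spanning both model and resource heterogeneity.
CONFIGS = [
    Config("mobilenet", "serverless", 0.71,  45, 0.020),
    Config("mobilenet", "cpu-vm",     0.71,  30, 0.012),
    Config("resnet50",  "cpu-vm",     0.76, 120, 0.035),
    Config("resnet50",  "gpu-vm",     0.76,  15, 0.060),
]

def cheapest_feasible(slo_ms: float, min_accuracy: float) -> Optional[Config]:
    """Return the lowest-cost configuration meeting both constraints,
    or None if no configuration is feasible."""
    feasible = [c for c in CONFIGS
                if c.latency_ms <= slo_ms and c.accuracy >= min_accuracy]
    return min(feasible, key=lambda c: c.cost_per_1k) if feasible else None

if __name__ == "__main__":
    # Example: a 50 ms latency SLO with a 0.70 accuracy floor.
    choice = cheapest_feasible(slo_ms=50, min_accuracy=0.70)
    print(choice)  # -> mobilenet on cpu-vm under these placeholder numbers
```

The sketch ignores provisioning latency and billing granularity, which the abstract highlights as key sources of resource heterogeneity; a self-managed serving system would also need to account for those when deciding, for example, between pre-provisioned VMs and on-demand serverless functions.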
Gunasekaran, J. R., Mishra, C. S., Thinakaran, P., Kandemir, M. T., & Das, C. R. (2020). Implications of Public Cloud Resource Heterogeneity for Inference Serving. In Proceedings of the 6th International Workshop on Serverless Computing (WOSC '20), Part of Middleware 2020 (pp. 7–12). Association for Computing Machinery. https://doi.org/10.1145/3429880.3430093