Take it to the limit: Peak prediction-driven resource overcommitment in datacenters

Abstract

To increase utilization, datacenter schedulers often overcommit resources, so that the sum of resources allocated to the tasks on a machine exceeds its physical capacity. Setting the right level of overcommitment is a challenging problem: low overcommitment leads to wasted resources, while high overcommitment leads to task performance degradation. In this paper, we take a first-principles approach to designing and evaluating overcommit policies by asking a basic question: assuming complete knowledge of each task's future resource usage, what is the safest overcommit policy that yields the highest utilization? We call this policy the peak oracle. We then devise practical overcommit policies that mimic this peak oracle by predicting future machine resource usage. We simulate our overcommit policies using the recently released Google cluster trace, and show that they result in higher utilization and fewer overcommit errors than policies based on per-task allocations. We also deploy these policies to machines inside Google's datacenters serving its internal production workload. We show that our overcommit policies increase these machines' usable CPU capacity by 10-16% compared to no overcommitment.
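The peak oracle idea can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption made for illustration: the function names `peak_oracle` and `free_capacity` are hypothetical, usage is modeled as one list of samples per task, and the oracle is taken to be the maximum over future time steps of the summed usage of all tasks on the machine.

```python
from typing import Sequence


def peak_oracle(future_usage: Sequence[Sequence[float]]) -> float:
    """Peak of the machine's total future usage (illustrative assumption).

    future_usage[i][t] is task i's resource usage at future time step t.
    Only time steps covered by every task's series are considered.
    """
    if not future_usage:
        return 0.0
    horizon = min(len(series) for series in future_usage)
    # Sum usage across tasks at each time step, then take the largest sum.
    return max(
        (sum(series[t] for series in future_usage) for t in range(horizon)),
        default=0.0,
    )


def free_capacity(capacity: float, predicted_peak: float) -> float:
    """Capacity the scheduler could still hand out, never negative."""
    return max(0.0, capacity - predicted_peak)


# Two tasks on a machine with 10 CPUs (made-up numbers).
usage = [[2.0, 4.0, 1.0], [3.0, 1.0, 1.0]]
limits = [6.0, 5.0]  # per-task allocations, which sum past the capacity

oracle_free = free_capacity(10.0, peak_oracle(usage))
allocation_free = free_capacity(10.0, sum(limits))
print(oracle_free, allocation_free)
```

In this toy case the tasks peak at different times, so the machine-level peak (5 CPUs) is lower than both the sum of per-task peaks (7 CPUs) and the sum of allocations (11 CPUs). That gap is the headroom a usage-based policy tries to recover. A practical policy cannot see the future and would replace `peak_oracle` with a prediction built from past machine usage.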

Citation (APA)

Bashir, N., Deng, N., Rzadca, K., Irwin, D., Kodak, S., & Jnagal, R. (2021). Take it to the limit: Peak prediction-driven resource overcommitment in datacenters. In EuroSys 2021 - Proceedings of the 16th European Conference on Computer Systems (pp. 556–573). Association for Computing Machinery, Inc. https://doi.org/10.1145/3447786.3456259
