Hound

  • Zheng P
  • Lee B
N/ACitations
Citations of this article
8Readers
Mendeley users who have this article in their library.

Abstract

Stragglers are exceptionally slow tasks within a job that delay its completion. Stragglers, which are uncommon within a single job, are pervasive in datacenters with many jobs. We present Hound, a statistical machine learning framework that infers the causes of stragglers from traces of datacenter-scale jobs. Hound is designed to achieve several objectives: datacenter-scale diagnosis, unbiased inference, interpretable models, and computational efficiency. We demonstrate Hound's capabilities for a production trace from Google's warehouse-scale datacenters and two Spark traces from Amazon EC2 clusters.

Cite

CITATION STYLE

APA

Zheng, P., & Lee, B. C. (2019). Hound. ACM SIGMETRICS Performance Evaluation Review, 46(1), 59–61. https://doi.org/10.1145/3292040.3219641

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free