Abstract
Duplication can be a powerful strategy for overcoming stragglers in cloud services, but it is often used conservatively because of the risk of overloading the system. We call for making duplication a first-class concept in cloud systems, and make two contributions in this regard. First, we present duplicate-aware scheduling (DAS), an aggressive duplication policy that duplicates every job but keeps the system safe by providing suitable support (prioritization and purging) at multiple layers of the cloud system. Second, we present the D-Stage abstraction, which supports DAS and other duplication policies across diverse layers of a cloud system (e.g., network and storage). The D-Stage abstraction decouples the duplication policy from the mechanism, and facilitates working with legacy layers of a system. Using this abstraction, we evaluate the benefits of DAS for two data-parallel applications (HDFS and an in-memory workload generator) and a network function (a Snort-based IDS cluster). Our experiments on the public cloud and Emulab show that DAS is safe to use, and that the tail latency improvement holds across a wide range of workloads.
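To make the two safety mechanisms named above concrete, here is a minimal sketch of how prioritization and purging could interact at a single queueing layer. It is an illustration under assumptions, not the paper's D-Stage interface: the class `DuplicateAwareQueue`, the helpers `dispatch` and `complete`, and the two-level strict priority scheme are all hypothetical names and choices made for this example.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower value is served first.
PRIMARY, DUPLICATE = 0, 1


class DuplicateAwareQueue:
    """Illustrative strict-priority queue (not the paper's API).

    Duplicate copies are served only when no primary copy is waiting,
    and copies of a completed job can be purged.
    """

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # FIFO tie-break within a priority level
        self._purged = set()           # job ids whose queued copies are cancelled

    def submit(self, job_id, priority):
        heapq.heappush(self._heap, (priority, next(self._seq), job_id))

    def purge(self, job_id):
        # Lazy purge: mark the job; stale entries are skipped in next_job().
        self._purged.add(job_id)

    def next_job(self):
        while self._heap:
            priority, _, job_id = heapq.heappop(self._heap)
            if job_id not in self._purged:
                return job_id, priority
        return None


def dispatch(job_id, queues, primary_idx, duplicate_idx):
    """Duplicate every job: one primary copy, one low-priority duplicate."""
    queues[primary_idx].submit(job_id, PRIMARY)
    queues[duplicate_idx].submit(job_id, DUPLICATE)


def complete(job_id, queues):
    """When any copy finishes, purge the remaining copies everywhere."""
    for q in queues:
        q.purge(job_id)
```

In this sketch, a duplicate never delays a primary copy on the same queue, and calling `complete` as soon as one copy finishes removes the other copy before it consumes service time. A real system would also need to handle copies that are already in service and the reuse of job identifiers, which this example ignores.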
Bashir, H. M., Faisal, A. B., Jamshed, M. A., Vondras, P., Iftikhar, A. M., Qazi, I. A., & Dogar, F. R. (2019). Reducing tail latency using duplication: A multi-layered approach. In CoNEXT 2019 - Proceedings of the 15th International Conference on Emerging Networking Experiments and Technologies (pp. 246–259). Association for Computing Machinery, Inc. https://doi.org/10.1145/3359989.3365432