Few-to-Many: Incremental parallelism for reducing tail latency in interactive services

Md E. Haque; Sameh Elnikety; Yong Hun Eom; Ricardo Bianchini; Yuxiong He; Kathryn S. McKinley

Conference Proceedings

ACM SIGPLAN Notices (2015) 50(4) 161-175

DOI: 10.1145/2694344.2694384

37 Citations

83 Readers

Abstract

Interactive services, such as Web search, recommendations, games, and finance, must respond quickly to satisfy customers. Achieving this goal requires optimizing tail (e.g., 99th+ percentile) latency. Although every server is multicore, parallelizing individual requests to reduce tail latency is challenging because (1) service demand is unknown when requests arrive; (2) blindly parallelizing all requests quickly oversubscribes hardware resources; and (3) parallelizing the numerous short requests will not improve tail latency. This paper introduces Few-to-Many (FM) incremental parallelization, which dynamically increases parallelism to reduce tail latency. FM uses request service demand profiles and hardware parallelism in an offline phase to compute a policy, represented as an interval table, which specifies when and how much software parallelism to add. At runtime, FM adds parallelism as specified by the interval table indexed by dynamic system load and request execution time progress. The longer a request executes, the more parallelism FM adds. We evaluate FM in Lucene, an open-source enterprise search engine, and in Bing, a commercial Web search engine. FM improves the 99th percentile response time up to 32% in Lucene and up to 26% in Bing, compared to prior state-of-the-art parallelization. Compared to running requests sequentially in Bing, FM improves tail latency by a factor of two. These results illustrate that incremental parallelism is a powerful tool for reducing tail latency.
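
As a rough illustration of the runtime lookup the abstract describes, the sketch below models an interval table indexed by system load and by how long a request has been running. It is not the authors' implementation: the table values, the choice of "requests in the system" as the load measure, the rounding rule for untabulated loads, and the names `INTERVAL_TABLE` and `degree_for` are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical interval table (illustrative values, not from the paper).
# Key: system load, assumed here to be the number of requests in the system.
# Value: sorted (elapsed_ms_threshold, parallelism_degree) pairs; a request
# uses the degree of the last threshold it has already passed.
INTERVAL_TABLE = {
    1: [(0, 2), (10, 4), (50, 8)],   # light load: add parallelism early
    4: [(0, 1), (20, 2), (80, 4)],   # moderate load: start sequentially
    8: [(0, 1), (100, 2)],           # heavy load: add little, and late
}


def degree_for(load, elapsed_ms, table=INTERVAL_TABLE):
    """Return the parallelism degree for a request at the given load and progress."""
    loads = sorted(table)
    # Assumption: round an untabulated load up to the next tabulated level,
    # and clamp anything beyond the table to the heaviest-load row.
    row_index = min(bisect_left(loads, load), len(loads) - 1)
    row = table[loads[row_index]]
    thresholds = [threshold for threshold, _ in row]
    # Find the last threshold that is <= elapsed_ms.
    position = max(bisect_right(thresholds, elapsed_ms) - 1, 0)
    return row[position][1]


if __name__ == "__main__":
    # The same request gains parallelism as it keeps running under light load.
    for elapsed in (0, 10, 60):
        print(elapsed, degree_for(load=1, elapsed_ms=elapsed))
```

Within each row the degree never decreases as elapsed time grows, which mirrors the abstract's statement that the longer a request executes, the more parallelism FM adds. Short requests finish before crossing the first threshold and so never consume extra threads.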

APA

Haque, M. E., Elnikety, S., Eom, Y. H., Bianchini, R., He, Y., & McKinley, K. S. (2015). Few-to-Many: Incremental parallelism for reducing tail latency in interactive services. In ACM SIGPLAN Notices (Vol. 50, pp. 161–175). Association for Computing Machinery. https://doi.org/10.1145/2694344.2694384
