Parallelization libraries

  • Bhattacharjee A
  • Contreras G
  • Martonosi M

Abstract

Creating efficient, scalable dynamic parallel runtime systems for chip multiprocessors (CMPs) requires understanding the overheads that manifest at high core counts and small task sizes. In this article, we assess these overheads on Intel's Threading Building Blocks (TBB) and OpenMP. First, we use real hardware and simulations to detail various scheduler and synchronization overheads. We find that these can amount to 47% of TBB benchmark runtime and 80% of OpenMP benchmark runtime. Second, we propose load balancing techniques, such as occupancy-based and criticality-guided task stealing, to boost performance. Overall, our study provides valuable insights for creating robust, scalable runtime libraries.

Cite

Bhattacharjee, A., Contreras, G., & Martonosi, M. (2011). Parallelization libraries. ACM Transactions on Architecture and Code Optimization, 8(1), 1–29. https://doi.org/10.1145/1952998.1953003
