Efficient work stealing for fine grained parallelism

Karl Filip Faxén

Conference Proceedings

Proceedings of the International Conference on Parallel Processing (2010) 313-322

DOI: 10.1109/ICPP.2010.39

Abstract

This paper deals with improving the performance of fine-grained task parallelism. It is often either cumbersome or impossible to increase the grain size of such programs. Increasing core counts exacerbates the problem; a program that appears coarse-grained on eight cores may well look a lot more fine-grained on sixty-four. In this paper we present the direct task stack, a novel work stealing algorithm with unusually low overheads, both for creating tasks and for stealing. We compare the performance of our scheduler to Cilk++, the icc implementation of OpenMP 3.0, and the Intel TBB library on an eight-core, dual-socket Opteron machine. We also analyze why our techniques achieve consistent speedups over the other systems, ranging from 2-3x on many fine-grained workloads to over 50x in extreme cases, and show quantitatively how each of the techniques we use contributes to the improved performance. © 2010 IEEE.

Cite

Faxén, K. F. (2010). Efficient work stealing for fine grained parallelism. In Proceedings of the International Conference on Parallel Processing (pp. 313–322). https://doi.org/10.1109/ICPP.2010.39