Design and implementation of an efficient thread partitioning algorithm

Abstract

The development of fine-grain multi-threaded program execution models has created an interesting challenge: how to partition a program into threads that can exploit machine parallelism, achieve latency tolerance, and maintain reasonable locality of reference? A successful algorithm must produce a thread partition that best utilizes multiple execution units on a single processing node and handles long and unpredictable latencies. In this paper, we introduce a new thread partitioning algorithm that can meet the above challenge for a range of machine architecture models. A quantitative affinity heuristic is introduced to guide the placement of operations into threads. This heuristic addresses the trade-off between exploiting parallelism and preserving locality. The algorithm is surprisingly simple due to the use of a time-ordered event list to account for the multiple execution unit activities. We have implemented the proposed algorithm and our experiments, performed on a wide range of examples, have demonstrated its efficiency and effectiveness.
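
The abstract mentions a time-ordered event list that accounts for the activity of multiple execution units, but gives no further detail. As a rough illustration of that general idea only, the sketch below issues operations from a dependence graph onto a fixed number of units and keeps completion events in a min-heap ordered by time. It is not the paper's algorithm: the function name `simulate`, its parameters, the first-come issue order, and the example latencies are all assumptions made for this example, and the affinity heuristic is not modeled.

```python
import heapq


def simulate(ops, deps, latency, num_units):
    """Hypothetical sketch: return the start time of each operation.

    ops       -- list of operation names (assumed acyclic dependences)
    deps      -- list of (before, after) dependence pairs
    latency   -- dict mapping operation name to its duration
    num_units -- number of execution units, assumed to be at least 1
    """
    # Count unfinished predecessors and record successors of each operation.
    pending = {op: 0 for op in ops}
    succs = {op: [] for op in ops}
    for before, after in deps:
        pending[after] += 1
        succs[before].append(after)

    ready = [op for op in ops if pending[op] == 0]  # keeps input order
    events = []  # time-ordered event list of (finish_time, op)
    free_units = num_units
    now = 0
    start = {}

    while ready or events:
        # Issue ready operations while an execution unit is free.
        while ready and free_units > 0:
            op = ready.pop(0)
            start[op] = now
            free_units -= 1
            heapq.heappush(events, (now + latency[op], op))

        # Advance time to the earliest completion and drain every
        # event that finishes at that same time.
        now, done = heapq.heappop(events)
        finished = [done]
        while events and events[0][0] == now:
            finished.append(heapq.heappop(events)[1])

        for done in finished:
            free_units += 1
            for nxt in succs[done]:
                pending[nxt] -= 1
                if pending[nxt] == 0:
                    ready.append(nxt)
    return start


# Made-up example: c needs a and b, d needs c.
example_ops = ["a", "b", "c", "d"]
example_deps = [("a", "c"), ("b", "c"), ("c", "d")]
example_latency = {"a": 2, "b": 3, "c": 1, "d": 1}
two_units = simulate(example_ops, example_deps, example_latency, 2)
one_unit = simulate(example_ops, example_deps, example_latency, 1)
```

Because the heap always yields the earliest completion, the loop never has to step through idle time, which is the usual reason to keep such a list. A partitioner in the spirit of the abstract would presumably replace the first-come issue order with a choice guided by its affinity measure.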

Cite

Amaral, J. N., Gao, G., Kocalar, E. D., O’Neill, P., & Tang, X. (2000). Design and implementation of an efficient thread partitioning algorithm. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1940, pp. 252–259). Springer Verlag. https://doi.org/10.1007/3-540-39999-2_22
