Design and analysis of algorithms for shared-memory multiprocessors

0Citations
Citations of this article
2Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Shared-memory multiprocessors feature parallelism and a steep cache hierarchy, two salient characteristics that distinguish them from the commodity processors of ten years ago. Both of these characteristics can be exploited effectively using the same general strategy: divide-and-conquer recursion. This talk overviews the Cilk multithreaded programming language being developed in the MIT Laboratory for Computer Science, which allows a programmer to exploit parallelism through divide-and-conquer. In addition, I show how divide-and-conquer allows caches to be used effectively. In the first part of my talk, I introduce the Cilk programming language. Cilk minimally extends the C programming language to allow interactions among computational threads to be specified in a simple and highlevel fashion. Cilk’s provably efficient runtime system dynamically maps a user’s program onto available physical resources, freeing the programmer from concerns of communication protocols and load balancing. In addition, Cilk provides an abstract performance model that a programmer can use to predict the multiprocessor performance of his application from its execution on a single processor. Not only do Cilk programs scale up to run efficiently on multiple processors, but unlike existing parallel-programming environments, such as MPI and HPF, Cilk programs “scale down”: the efficiency of a Cilk program on one processor rivals that of a comparable C program. I illustrate Cilk programming through the example of divide-and-conquer matrix multiplication. The second part of my talk presents a strategy for designing algorithms to exploit multiple levels of caching effectively. Unlike previous algorithms, these algorithms are “cache oblivious”: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, I show that cache-oblivious algorithms can be developed that use an optimal amount of work and move data optimally among multiple levels of cache. Problems that can be solved efficiently by cache-oblivious algorithms include matrix multiplication, FFT, sorting, and matrix transpose, all of which are solved in a divide-and-conquer fashion. Together, these two technologies provide a foundation for the design and analysis of efficient algorithms for shared-memory multiprocessors.

Cite

CITATION STYLE

APA

Leiserson, C. E. (1999). Design and analysis of algorithms for shared-memory multiprocessors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1663). Springer Verlag. https://doi.org/10.1007/3-540-48447-7_6

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free