Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors

880Citations
Citations of this article
268Readers
Mendeley users who have this article in their library.

Abstract

Busy-wait techniques are heavily used for mutual exclusion andbarrier synchronization in shared-memory parallel programs.Unfortunately, typical implementations of busy-waiting tend to producelarge amounts of memory and interconnect contention, introducingperformance bottlenecks that become markedly more pronounced asapplications scale. We argue that this problem is not fundamental, andthat one can in fact construct busy-wait synchronization algorithms thatinduce no memory or interconnect contention. The key to these algorithmsis for every processor to spin on separate locally-accessible flag variables,and for some other processor to terminate the spin with a single remotewrite operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue ofallocation in the local portion of physically distributed sharedmemory. We present a new scalable algorithm for spin locks that generates 01991 remote references per lockacquisition, independent of the number of processors attempting toacquire the lock. Our algorithm provides reasonable latency in theabsence of contention, requires only a constant amount of space perlock, and requires no hardware support other than a swap-with-memoryinstruction. We also present a new scalable barrier algorithm thatgenerates 0(1) remote references perprocessor reaching the barrier, and observe that two previously-knownbarriers can likewise be cast in a form that spins only onlocally-accessible flag variables. None of these barrier algorithmsrequires hardware support beyond the usual atomicity of memory reads andwrites. We compare the performance of our scalable algorithms with othersoftware approaches to busy-wait synchronization on both a SequentSymmetry and a BBN Butterfly. Our principal conclusion is that contention due to synchronization need not be a problemin large-scale shared-memory multiprocessors. Theexistence of scalable algorithms greatly weakens the case for costlyspecial-purpose hardware support for synchronization, and provides acase against so-called “dance hall” architectures, in whichshared memory locations are equally far from all processors. © 1991, ACM. All rights reserved.

Cite

CITATION STYLE

APA

Mellor-Crummey, J. M., & Scott, M. L. (1991). Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors. ACM Transactions on Computer Systems (TOCS), 9(1), 21–65. https://doi.org/10.1145/103727.103729

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free