Sign up & Download
Sign in

Performance evaluation of concurrent collections on high-performance multicore computing systems

by Aparna Chandramowlishwaran, Kathleen Knobe, Richard Vuduc
Matrix (2010)

Abstract

This paper is the first extensive performance study of a recently proposed parallel programming model, called Concurrent Collections (CnC). In CnC, the programmer expresses her computation in terms of application-specific operations, partially-ordered by semantic scheduling constraints. The CnC model is well-suited to expressing asynchronous-parallel algorithms, so we evaluate CnC using two dense linear algebra algorithms in this style for execution on state-of-the-art multicore systems: (i) a recently proposed asynchronous-parallel Cholesky factorization algorithm, (ii) a novel and non-trivial higher-level partly-asynchronous generalized eigensolver for dense symmetric matrices. Given a well-tuned sequential BLAS, our implementations match or exceed competing multithreaded vendor-tuned codes by up to 2.6. Our evaluation compares with alternative models, including ScaLAPACK with a shared memory MPI, OpenMP, Cilk++, and PLASMA 2.0, on Intel Harpertown, Nehalem, and AMD Barcelona systems. Looking forward, we identify new opportunities to improve the CnC language and runtime scheduling and execution.

Cite this document (BETA)

Sign up today - FREE

Mendeley saves you time finding and organizing research. Learn more

  • All your research in one place
  • Add and import papers easily
  • Access it anywhere, anytime

Start using Mendeley in seconds!

Already have an account? Sign in

Readership Statistics

10 Readers on Mendeley
by Discipline
 
 
by Academic Status
 
30% Ph.D. Student
 
20% Post Doc
 
20% Researcher (at an Academic Institution)
by Country
 
20% Japan
 
20% United States
 
20% China

Groups

HPC Garage