Abstract
This paper is the first extensive performance study of a recently proposed parallel programming model, called Concurrent Collections (CnC). In CnC, the programmer expresses her computation in terms of application-specific operations, partially-ordered by semantic scheduling constraints. The CnC model is well-suited to expressing asynchronous-parallel algorithms, so we evaluate CnC using two dense linear algebra algorithms in this style for execution on state-of-the-art multicore systems: (i) a recently proposed asynchronous-parallel Cholesky factorization algorithm, (ii) a novel and non-trivial "higher-level" partly-asynchronous generalized eigensolver for dense symmetric matrices. Given a well-tuned sequential BLAS, our implementations match or exceed competing multithreaded vendor-tuned codes by up to 2.6x. Our evaluation compares with alternative models, including ScaLAPACK with a shared memory MPI, OpenMP, Cilk++, and PLASMA 2.0, on Intel Harpertown, Nehalem, and AMD Barcelona systems. Looking forward, we identify new opportunities to improve the CnC language and run-time scheduling and execution. © 2010 IEEE.
Cite
Chandramowlishwaran, A., Knobe, K., & Vuduc, R. (2010). Performance evaluation of concurrent collections on high-performance multicore computing systems. In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010. https://doi.org/10.1109/IPDPS.2010.5470404