An implementation of the codelet model

Abstract

Chip architectures are shifting from few, faster, functionally heavy cores to abundant, slower, simpler cores to address pressing physical limitations such as energy consumption and heat dissipation. As architectural trends continue to fluctuate, we propose a novel program execution model, the Codelet model, which is designed for new systems tasked with efficiently managing varying resources. The Codelet model is a fine-grained, dataflow-inspired model extended to address the cumbersome resources available in new architectures. In the following, we define the Codelet execution model and provide an implementation named DARTS. Using DARTS and two predominant kernels, matrix multiplication and the Graph 500's breadth-first search (BFS), we explore the validity of fine-grain execution as a promising and viable execution model for current and future architectures. We show that our runtime performs on par with or better than AMD's highly optimized parallel library for matrix multiplication, outperforming it on average by 1.40x with a speedup of up to 4x. Our implementation of the parallel BFS outperforms Graph 500's reference implementation (with or without dynamic scheduling) on average by 1.50x with a speedup of up to 2.38x. © 2013 Springer-Verlag.

Cite

Suetterlein, J., Zuckerman, S., & Gao, G. R. (2013). An implementation of the codelet model. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8097 LNCS, pp. 633–644). https://doi.org/10.1007/978-3-642-40047-6_63
