The nano-threads programming model was proposed to effectively integrate multiprogramming on shared-memory multiprocessors with the exploitation of fine-grain parallelism from standard applications. A prerequisite for the applicability of the nano-threads programming model is the ability of the runtime environment to manage parallelism at any level of granularity with minimal overhead. In this paper, we introduce runtime techniques for efficient memory management and user-level scheduling in an experimental runtime system designed to support the nano-threads programming model. We evaluate the exploitation of processor affinity for the management of nano-thread contexts, and the use of hierarchical queues to implement user-level scheduling strategies for applications with inherent multilevel parallelism. The proposed mechanisms attempt to obtain maximum benefit from data locality on cache-coherent NUMA multiprocessors. Using synthetic benchmarks, we find that our mechanism for memory management in the runtime system reduces overheads by 52% on average compared to other known mechanisms. The use of hierarchical queues yields performance improvements of between 17% and 40% compared to scheduling strategies that use only local queues.
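The abstract does not describe how the hierarchical queues are organized, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a three-level hierarchy (one queue per processor, one per NUMA node, one global) and a lookup order that prefers the queue closest to the requesting processor. The class name `HierarchicalQueues`, its methods, and the `cpus_per_node` parameter are all hypothetical, and the sketch is single-threaded and omits the locking a real runtime would need.

```python
from collections import deque


class HierarchicalQueues:
    """Illustrative ready-queue hierarchy (assumed layout, not from the paper)."""

    def __init__(self, num_cpus, cpus_per_node):
        self.cpus_per_node = cpus_per_node
        # Round up so a partially filled last node still gets a queue.
        num_nodes = (num_cpus + cpus_per_node - 1) // cpus_per_node
        self.local = [deque() for _ in range(num_cpus)]   # one per processor
        self.node = [deque() for _ in range(num_nodes)]   # one per NUMA node
        self.global_queue = deque()                       # shared by everyone

    def node_of(self, cpu):
        # Assumption: processors are numbered contiguously within a node.
        return cpu // self.cpus_per_node

    def enqueue(self, task, cpu=None, node=None):
        # The caller picks the level: a specific processor, a node, or global.
        if cpu is not None:
            self.local[cpu].append(task)
        elif node is not None:
            self.node[node].append(task)
        else:
            self.global_queue.append(task)

    def dequeue(self, cpu):
        # Search from the closest queue outward to favor data locality.
        levels = (self.local[cpu], self.node[self.node_of(cpu)], self.global_queue)
        for queue in levels:
            if queue:
                return queue.popleft()
        return None  # no work visible to this processor


queues = HierarchicalQueues(num_cpus=4, cpus_per_node=2)
queues.enqueue("inner-loop chunk", cpu=0)
queues.enqueue("outer-loop task", node=1)
queues.enqueue("independent task")
print(queues.dequeue(0))  # work placed on processor 0's own queue
print(queues.dequeue(3))  # processor 3 falls back to its node's queue
```

In a layout like this, work from an inner level of parallelism can be kept on the processor that created it, while outer-level work is visible to every processor in a node or in the machine.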
Nikolopoulos, D. S., Polychronopoulos, E. D., & Papatheodorou, T. S. (1998). Efficient runtime thread management for the nano-threads programming model. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1388, pp. 183–194). Springer Verlag. https://doi.org/10.1007/3-540-64359-1_688