By “cluster system software,” we mean the software that turns a collection of individual machines into a powerful resource for a wide variety of applications. In this talk we will examine one loosely integrated collection of open-source cluster system software. It includes an infrastructure for building component-based systems management tools; a collection of components based on this infrastructure that has been used for the last year to manage a medium-sized cluster; a scalable process-management component in this collection that provides for both batch and interactive use; and an MPI-2 implementation, together with debugging and performance analysis tools that help in developing advanced applications. The component infrastructure has been developed in the context of the Scalable Systems Software SciDAC project, where a number of system management tools, developed by various groups, have been tied together by a common communication library. The flexible architecture of this library allows systems managers to design and implement new system components, and even new communication protocols, and to integrate them into a collection of existing components. One of the components that has been integrated into this system is the MPD process manager; we will describe its capabilities. MPD, in turn, supports the process management interface used by MPICH-2, a full-featured MPI-2 implementation, for scalable startup, dynamic process functionality in MPI-2, and interactive debugging. This combination allows significant components of the systems software stack to be written in MPI for increased performance and scalability.
Lusk, E. (2004). An open cluster system software stack. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3241, p. 9). Springer Verlag. https://doi.org/10.1007/978-3-540-30218-6_6