Job management requirements for NAS parallel systems and clusters

3Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.
Get full text

Abstract

A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling we have encountered on two parallel computers - a 160-node IBM SP2 and a cluster of 20 high performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing according to difficulty and importance, and advocating a return to fundamental issues.

Cite

CITATION STYLE

APA

Saphir, W., Tanner, L. A., & Traversat, B. (1995). Job management requirements for NAS parallel systems and clusters. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 949, pp. 319–336). Springer Verlag. https://doi.org/10.1007/3-540-60153-8_37

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free