Abstract
Large-scale systems are heavily shared resource environments in which a mix of applications run concurrently and compete for network and storage resources. Characterizing the runtime behavior of these applications is essential both for provisioning system resources and for understanding how resource contention affects an application's performance. In this paper, we study the use of zero- and low-overhead system logs and other system metric data to characterize the runtime behavior of several applications. We present preliminary work on estimating an application's I/O demands by observing its file system usage patterns over multiple runs, and on estimating an application's network utilization by observing link-layer error logs. We also present preliminary findings on using this information to make context-sensitive scheduling decisions that minimize potentially negative interactions between applications competing for shared resources. Our analysis is based on four months of system log data collected on the Jaguar XT5 petaflop system at Oak Ridge National Laboratory, one of the world's largest supercomputers.
Gunasekaran, R., Dillow, D., Shipman, G., Vuduc, R., & Chow, E. (2011). Characterizing Application Runtime Behavior from System Logs and Metrics. In Proceedings of the 1st International Workshop on Characterizing Applications for Heterogeneous Exascale Systems (CACHES). Tucson, AZ, USA. Retrieved from http://www.mcs.anl.gov/events/workshops/caches/2011/program.html