Characterizing Application Runtime Behavior from System Logs and Metrics

Raghul Gunasekaran; David Dillow; Galen Shipman; Richard Vuduc; Edmond Chow

Conference Proceedings

Proceedings of the 1st International Workshop on Characterizing Applications for Heterogeneous Exascale Systems (CACHES) (2011)

Abstract

Large-scale systems are heavily shared resource environments where a mix of applications run concurrently and compete for network and storage resources. It is essential to characterize the runtime behavior of these applications in order to provision system resources and understand the impact of resource contention on an application’s performance. In this paper, we study the use of zero- and low-overhead system logs and other system metric data for characterizing the runtime behavior of several applications. We present our preliminary work on estimating an application’s I/O demands by observing its file system usage patterns over multiple runs, and on estimating an application’s network utilization by observing link-layer error logs. We also present preliminary findings on using such information in making context-sensitive scheduling decisions that minimize potentially negative interactions between applications competing for shared resources. Our analysis is based on four months of system log data collected on one of the world’s largest supercomputing facilities, the Jaguar XT5 petaflop system at Oak Ridge National Laboratory.

Cite (APA)

Gunasekaran, R., Dillow, D., Shipman, G., Vuduc, R., & Chow, E. (2011). Characterizing Application Runtime Behavior from System Logs and Metrics. In Proceedings of the 1st International Workshop on Characterizing Applications for Heterogeneous Exascale Systems (CACHES). Tucson, AZ, USA. Retrieved from http://www.mcs.anl.gov/events/workshops/caches/2011/program.html
