A Markov Random Field Based Approach for Analyzing Supercomputer System Logs

Abstract

High performance computing systems comprising hundreds or thousands of computational nodes can generate system log entries at high volume and high velocity. Analyzing these logs soon after they are generated is a significant challenge because of the complexity of log messages, the speed at which they are produced, and the lack of a method to quickly map or categorize messages into meaningful sets. As a result, it is not possible to comprehensively glean timely information from logs about the overall system or the health of individual nodes. In this paper, we address this problem by developing a novel approach to system log analysis based on a Markov random field (MRF) that can quickly categorize system log messages into multiple categories based on representative training examples provided by a user. We present a theoretical model of our approach, followed by an extensive evaluation of the accuracy and performance of our implementation of the model. We found that our MRF-based approach can quickly categorize system log messages with a high degree of accuracy.

Cite

Hacker, T., Pais, R., & Rong, C. (2019). A Markov Random Field Based Approach for Analyzing Supercomputer System Logs. IEEE Transactions on Cloud Computing, 7(3), 611–624. https://doi.org/10.1109/TCC.2017.2678473
