A Markov Random Field Based Approach for Analyzing Supercomputer System Logs

Thomas Hacker; Rui Pais; Chunming Rong

Journal Article, Open Access

IEEE Transactions on Cloud Computing (2019) 7(3) 611-624

DOI: 10.1109/TCC.2017.2678473

Abstract

High-performance computing systems composed of hundreds or thousands of computational nodes can generate a high volume of system log entries at a high rate. Analyzing these logs soon after they are generated is a significant challenge because of the complexity of log messages, the speed at which they are produced, and the lack of a method to quickly map or categorize messages into meaningful sets. As a result, it is not possible to glean timely, comprehensive information from logs about the overall system or the health of individual nodes. In this paper, we address this problem by developing a novel approach to system log analysis based on a Markov random field (MRF) that can quickly categorize system log messages into multiple categories using representative training examples provided by a user. We present a theoretical model of our approach, followed by an extensive evaluation of the accuracy and performance of its implementation. We found that our MRF-based approach can quickly categorize system log messages with a high degree of accuracy.

Cite (APA)

Hacker, T., Pais, R., & Rong, C. (2019). A Markov Random Field Based Approach for Analyzing Supercomputer System Logs. IEEE Transactions on Cloud Computing, 7(3), 611–624. https://doi.org/10.1109/TCC.2017.2678473
