DC-Prophet: Predicting Catastrophic Machine Failures in DataCenters

Abstract

When will a server fail catastrophically in an industrial datacenter? Is it possible to forecast these failures so that preventive actions can be taken to increase the reliability of a datacenter? To answer these questions, we have studied what are probably the largest publicly available datacenter traces, containing more than 104 million events from 12,500 machines. Among these samples, we observe and categorize three types of machine failures, all of which are catastrophic and may lead to information loss or, even worse, reliability degradation of a datacenter. We further propose a two-stage framework, DC-Prophet (short for DataCenter-Prophet), based on One-Class Support Vector Machine and Random Forest. DC-Prophet extracts surprising patterns and accurately predicts the next failure of a machine. Experimental results show that DC-Prophet achieves an AUC of 0.93 in predicting the next machine failure and an F3-score of 0.88 (out of 1). The ideal F3-score is 1, indicating perfect predictions, and the intuition behind the F3-score is to value recall about three times as much as precision [12]. On average, DC-Prophet outperforms other classical machine learning methods by 39.45% in F3-score.
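
The F3-score is an instance of the general F-beta measure, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 3, so that missing a real failure (low recall) is penalized more than raising a false alarm (low precision). The sketch below assumes this standard definition; the abstract does not show the authors' evaluation code, and the function names `f_beta` and `f_beta_from_counts` are hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float = 3.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more heavily than precision.

    Hypothetical helper: assumes the textbook F-beta definition, with beta = 3
    giving the F3-score mentioned in the abstract.
    """
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        # No true positives at all: define the score as 0 instead of dividing by zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


def f_beta_from_counts(tp: int, fp: int, fn: int, beta: float = 3.0) -> float:
    """Compute F-beta from confusion-matrix counts (hypothetical helper)."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # fraction of alarms that were real failures
    recall = tp / (tp + fn) if tp + fn else 0.0     # fraction of real failures that were caught
    return f_beta(precision, recall, beta)


if __name__ == "__main__":
    # Same two numbers, swapped: F3 rewards the high-recall predictor far more.
    print(round(f_beta(0.5, 1.0), 3))  # precision 0.5, recall 1.0
    print(round(f_beta(1.0, 0.5), 3))  # precision 1.0, recall 0.5
```

With precision and recall swapped between 0.5 and 1.0, the F3-score is 10/11 (about 0.909) when recall is perfect but only 10/19 (about 0.526) when precision is perfect. That asymmetry is why a failure predictor tuned for F3 prefers catching every impending failure over avoiding false alarms.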

Lee, Y. L., Juan, D. C., Tseng, X. A., Chen, Y. T., & Chang, S. C. (2017). DC-Prophet: Predicting Catastrophic Machine Failures in DataCenters. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10536 LNAI, pp. 64–76). Springer Verlag. https://doi.org/10.1007/978-3-319-71273-4_6