Invariants based failure diagnosis in distributed computing systems

Abstract

This paper presents an instance-based approach to diagnosing failures in computing systems. Because a large portion of failures are recurrences of earlier ones, our method takes advantage of past experience by storing historical failures in a database and retrieving similar instances when a new failure occurs. We extract system 'invariants' by modeling consistent dependencies between system attributes during operation, and construct a network graph from the learned invariants. When a failure happens, the status of the invariant network, i.e., whether each invariant link is broken or not, provides a view of the failure's characteristics. We store this failure evidence in a high-dimensional binary vector and develop a novel algorithm to efficiently retrieve failure signatures from the database. Experimental results on a web-based system demonstrate the effectiveness of our method in diagnosing injected failures. © 2010 IEEE.
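
The idea of a binary failure signature can be illustrated with a minimal sketch. This is not the paper's retrieval algorithm, which the abstract does not describe; it substitutes a brute-force Hamming-distance search as a stand-in. The function names, the residual/threshold test for a "broken" invariant, and the failure labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def broken_vector(residuals: List[float], thresholds: List[float]) -> int:
    """Pack invariant status into an integer bitmask.

    Bit i is 1 when invariant i is considered broken. The rule used here
    (absolute residual above a threshold) is an assumption for this sketch.
    """
    bits = 0
    for i, (residual, threshold) in enumerate(zip(residuals, thresholds)):
        if abs(residual) > threshold:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Count the invariants whose broken/intact status differs."""
    return bin(a ^ b).count("1")


def retrieve(query: int, database: Dict[str, int], k: int = 1) -> List[Tuple[str, int]]:
    """Return the k stored signatures closest to the query.

    Brute-force stand-in for an efficient retrieval method; ties are
    broken alphabetically by label so the result is deterministic.
    """
    ranked = sorted(
        ((label, hamming(query, signature)) for label, signature in database.items()),
        key=lambda pair: (pair[1], pair[0]),
    )
    return ranked[:k]


# Hypothetical historical failures over four invariants.
database = {"db_lock": 0b0110, "memory_leak": 0b1001}

# Invariants 1 and 2 exceed their thresholds, so bits 1 and 2 are set.
query = broken_vector([0.1, 2.0, 3.5, 0.0], [1.0, 1.0, 1.0, 1.0])
best_match = retrieve(query, database, k=1)
```

A linear scan like this costs time proportional to the number of stored signatures, which is presumably what a purpose-built retrieval algorithm would aim to avoid for large databases and high-dimensional vectors.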

Cite

Chen, H., Jiang, G., Yoshihira, K., & Saxena, A. (2010). Invariants based failure diagnosis in distributed computing systems. In Proceedings of the IEEE Symposium on Reliable Distributed Systems (pp. 160–166). https://doi.org/10.1109/SRDS.2010.26
