Redundancy Does Not Imply Fault Tolerance

Aishwarya Ganesan; Ramnatthan Alagappan; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

Journal Article (Open Access)

ACM Transactions on Storage (2017) 13(3) 1-33

DOI: 10.1145/3125497

Abstract

We analyze how modern distributed storage systems behave in the presence of file-system faults such as data corruption and read and write errors. We characterize eight popular distributed storage systems and uncover numerous problems related to file-system fault tolerance. We find that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-system fault can cause catastrophic outcomes such as data loss, corruption, and unavailability. We also find that the above outcomes arise due to fundamental problems in file-system fault handling that are common across many systems. Our results have implications for the design of next-generation fault-tolerant distributed and cloud storage systems.

Citation (APA)

Ganesan, A., Alagappan, R., Arpaci-Dusseau, A. C., & Arpaci-Dusseau, R. H. (2017). Redundancy Does Not Imply Fault Tolerance. ACM Transactions on Storage, 13(3), 1–33. https://doi.org/10.1145/3125497
