Recovery in the Calypso File System

Murthy Devarakonda; Bill Kish; Ajay Mohindra

Journal Article, Open Access

ACM Transactions on Computer Systems (1996) 14(3) 287-310

DOI: 10.1145/233557.233560

Abstract

This article presents the design and implementation of the recovery scheme in Calypso. Calypso is a cluster-optimized, distributed file system for UNIX clusters. As in Sprite and AFS, Calypso servers are stateful and scale well to a large number of clients. The recovery scheme in Calypso is nondisruptive, meaning that open files remain open, client-modified data are saved, and in-flight operations are properly handled across server recovery. The scheme uses distributed state among the clients to reconstruct the server state on a backup node if disks are multiported or on the rebooted server node. It guarantees data consistency during recovery and provides congestion control. Measurements show that the state reconstruction can be quite fast: for example, in a 32-node cluster, when an average node contains state for about 420 files, the reconstruction time is about 3.3 seconds. However, the time to update a file system after a failure can be a major factor in the overall recovery time, even when using journaling techniques.
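The idea of rebuilding server state from state held by the clients can be illustrated with a minimal sketch. It is not Calypso's protocol: the `ClientReport` record, the `"r"`/`"w"` modes, and both function names are hypothetical, chosen only to show how per-client reports might be merged into one server-side table and checked for consistency.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClientReport:
    """Hypothetical record a client sends to the recovering server."""
    client_id: str
    open_files: dict  # file_id -> mode, where mode is "r" or "w" (assumed)


def rebuild_server_state(reports):
    """Merge per-client reports into a table: file_id -> {client_id: mode}."""
    state = defaultdict(dict)
    for report in reports:
        for file_id, mode in report.open_files.items():
            state[file_id][report.client_id] = mode
    return dict(state)


def conflicting_writers(state):
    """Return the file ids that more than one client claims to hold for writing."""
    return sorted(
        file_id
        for file_id, holders in state.items()
        if sum(mode == "w" for mode in holders.values()) > 1
    )


# Example: two clients report their state after a server failure.
reports = [
    ClientReport("c1", {"a": "w", "b": "r"}),
    ClientReport("c2", {"a": "w", "b": "r"}),
]
state = rebuild_server_state(reports)
```

In this toy model, a file with two claimed writers is flagged so that it could be resolved before normal service resumes. The actual consistency guarantees and congestion control are described in the article itself.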

Cite

Devarakonda, M., Kish, B., & Mohindra, A. (1996). Recovery in the Calypso File System. ACM Transactions on Computer Systems, 14(3), 287–310. https://doi.org/10.1145/233557.233560
