Fault tolerance is a vital element in cloud computing to achieve high performance. In cloud computing fault tolerance particularly cluster management, network discovery and consistent system master node replication, consensus and coordination play a major role. This paper provides a comprehensive overview of the topic of fault tolerance, i.e., the problem of consensus in cloud computing; highlighting the important concepts along with the explanation of Byzantine Agreement problem and consensus problems in multi-agent systems. There are multiple algorithms/protocols like RAFT and PAXOS available to approach this problem. We present generalized consensus implementation by solving consensus for dual failure nodes. We also describe Apache Zookeeper as our coordination service to obtain consensus in a distributed system.
CITATION STYLE
Shidaganti, G., Pravakar, R., Shirisha, M., & Samyuktha, H. R. (2021). A case study on distributed consensus problem on cloud-based systems. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 53, pp. 627–636). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-15-5258-8_58
Mendeley helps you to discover research relevant for your work.