Abstract
We present a transparent fault-tolerance middleware based on RADIC (Redundant Array of Distributed Independent Controllers), a transparent and scalable fault-tolerant architecture for parallel applications. The middleware operates at the socket level and establishes a secure tunnel connection that preserves the TCP sessions opened by the application in spite of node failures. It runs at user level and is independent of the message-passing communication library in use. Protection is achieved through uncoordinated checkpoints and message logging, and recovery is performed automatically, so node failures require no intervention from the administrator. We have tested our fault-tolerance system by executing master-worker (M/W) and SPMD applications that follow different communication patterns. © 2012 IEEE.
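The protection scheme named in the abstract, uncoordinated checkpoints combined with message logging, can be illustrated with a minimal sketch. It assumes a deterministic process whose state depends only on the messages it receives, and it uses hypothetical `Protector` and `Process` classes; `Protector` stands in for a store on another node that keeps the latest checkpoint and the messages received since then. This is not the RADIC implementation: it omits the socket-level tunnel and failure detection entirely, and recovery is invoked explicitly here rather than triggered automatically.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Protector:
    """Hypothetical stable store (assumed to live on another node): holds the
    latest checkpoint of one process and the messages it received since then."""
    checkpoint: Optional[dict] = None
    log: list = field(default_factory=list)

    def save_checkpoint(self, state: dict) -> None:
        self.checkpoint = copy.deepcopy(state)
        self.log.clear()  # messages already reflected in the checkpoint are dropped

    def log_message(self, msg: int) -> None:
        self.log.append(msg)


class Process:
    """Hypothetical deterministic process whose state depends only on received messages."""

    def __init__(self, protector: Protector, checkpoint_every: int = 3):
        self.state = {"received": 0, "total": 0}
        self.protector = protector
        self.checkpoint_every = checkpoint_every

    def deliver(self, msg: int) -> None:
        # Pessimistic logging: the message is logged before it changes the state.
        self.protector.log_message(msg)
        self._apply(msg)
        # Uncoordinated: this process decides on its own when to checkpoint.
        if self.state["received"] % self.checkpoint_every == 0:
            self.protector.save_checkpoint(self.state)

    def _apply(self, msg: int) -> None:
        self.state["received"] += 1
        self.state["total"] += msg

    @classmethod
    def recover(cls, protector: Protector, checkpoint_every: int = 3) -> "Process":
        # Restore the last checkpoint, then replay the logged messages in order.
        proc = cls(protector, checkpoint_every)
        if protector.checkpoint is not None:
            proc.state = copy.deepcopy(protector.checkpoint)
        for msg in protector.log:
            proc._apply(msg)  # replay only; the messages are already logged
        return proc


def run_demo():
    protector = Protector()
    proc = Process(protector, checkpoint_every=3)
    for msg in [1, 2, 3, 4, 5]:
        proc.deliver(msg)
    # Simulated node failure: the process is lost, only its protector survives.
    restored = Process.recover(protector, checkpoint_every=3)
    return proc, restored


if __name__ == "__main__":
    print(run_demo()[1].state)  # {'received': 5, 'total': 15}
```

Logging every message before it is applied is what lets each process checkpoint independently: a failed process can be rebuilt from its own checkpoint and log without rolling back its peers, at the cost of one log write per received message.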
Citation
Castro, M., Rexachs, D., & Luque, E. (2012). Transparent fault tolerance solution at socket level based on RADIC. In Proceedings of the 2012 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2012 (pp. 831–832). https://doi.org/10.1109/ISPA.2012.121