Proactive fault tolerance using heartbeat strategy for fault detection

Abstract

A failure causes services on the cloud to go down for some period of time. Most of the time, instead of recovery and repair, we opt for virtual machine migration, in which the failed service fails over to another running virtual server so that the service is revived. Virtual machine migration and recovery mechanisms consume a lot of energy, and many approaches have been implemented to make them energy efficient. Failure detection is a topic of equal importance and falls under fault tolerance. If done properly, failure detection can be more effective and save more energy and cost than fault recovery. The heartbeat strategy is one such failure detection approach: live processes send an “I am alive” message to the host device at pre-defined fixed intervals, which confirms that the process is running normally. In this paper, we propose marking the nodes whose processes have failed to send the heartbeat message and keeping a count (the confidence factor, α) for each such node. In primary testing, if this confidence factor reaches a specific threshold, the node is sent for confidence testing (a second level of failure detection that uses a different time sequence of heartbeat message arrival) and, if found faulty, is marked for failure recovery. Fault recovery techniques are then applied so that the node can be corrected and reused, and its current jobs can be migrated to a better node during the recovery period. If the confidence factor α is below the threshold, no action is taken, and only network parameters and connections may be rechecked. This method reinforces trust in the heartbeat strategy for fault detection and protects the device from failure.
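The two-level detection flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The class name `HeartbeatMonitor`, the parameter values, the action labels, and the rule for the second-level test (a longer silence window) are all assumptions made for illustration. The abstract also does not say what happens when a node passes confidence testing; the sketch assumes its count is reset.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical two-level heartbeat failure detector (illustrative only)."""
    interval: float = 1.0             # assumed primary heartbeat period, in seconds
    confidence_interval: float = 2.5  # assumed window for the second-level test
    threshold: int = 3                # assumed threshold for the confidence factor
    last_seen: dict = field(default_factory=dict)  # node -> time of last heartbeat
    alpha: dict = field(default_factory=dict)      # node -> confidence factor

    def heartbeat(self, node: str, now: float) -> None:
        """Record an "I am alive" message from a node."""
        self.last_seen[node] = now
        self.alpha.setdefault(node, 0)

    def primary_check(self, node: str, now: float) -> str:
        """Primary test, run once per check period; returns an action label."""
        if now - self.last_seen[node] > self.interval:
            self.alpha[node] += 1     # heartbeat missed: mark the node
        if self.alpha[node] < self.threshold:
            return "ok"               # below threshold: no action taken
        return self.confidence_test(node, now)

    def confidence_test(self, node: str, now: float) -> str:
        """Second-level test using a different (assumed longer) time window."""
        if now - self.last_seen[node] > self.confidence_interval:
            return "recover"          # found faulty: recover node, migrate jobs
        self.alpha[node] = 0          # assumption: a passing node starts afresh
        return "cleared"
```

For example, a node that sends one heartbeat at time 0 and then goes silent accumulates one mark per check and is reported as `"recover"` once α reaches the threshold, while a node whose heartbeats arrive late but regularly reaches the threshold, passes the second-level test, and is reported as `"cleared"`.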

Cite

Prakash, S., Vyas, V., & Bhola, A. (2019). Proactive fault tolerance using heartbeat strategy for fault detection. International Journal of Engineering and Advanced Technology, 9(1), 4927–4932. https://doi.org/10.35940/ijeat.A2079.109119
