Replication, the widely adopted technique for crash fault tolerance, introduces additional infrastructure costs for resource-limited clusters. In this paper we take a different approach to maintaining stream program performance during crash failures, one based on automatic code generation. Albatross, the middleware we introduce for this task, maintains the same performance level during crash failures based on predetermined priority values assigned to each stream program. Albatross constructs different versions of the input stream programs (sample programs) with different performance characteristics and assigns the best-performing versions for normal operation. During node failure or node recovery, it evaluates whether switching to a different version of a sample program can bring the performance of each job back to its original level. We evaluated the effectiveness of this approach with three real-world stream computing applications on the System S distributed stream processing platform. Using the Apnoea and Regex applications, we show that our approach can maintain stream program performance even when half of the nodes in the cluster have crashed. © Springer-Verlag 2013.
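The version-switching idea described in the abstract can be illustrated with a minimal sketch. This is not the Albatross implementation, and the abstract does not specify the selection algorithm: the `Version` and `Job` types, the `select_versions` function, the greedy priority-ordered policy, and all node counts and throughput figures below are hypothetical, chosen only to show how pre-generated program versions might be re-assigned when the number of live nodes changes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Version:
    """One pre-generated variant of a stream program (hypothetical model)."""
    name: str
    nodes_required: int
    throughput: float  # assumed to be measured ahead of time


@dataclass
class Job:
    """A stream job with a priority and its candidate versions (hypothetical)."""
    name: str
    priority: int  # larger value means more important
    versions: List[Version]


def select_versions(jobs: List[Job], available_nodes: int) -> Dict[str, Optional[Version]]:
    """Greedily pick one version per job, visiting jobs in priority order.

    Assumed policy, not taken from the paper: each job takes its
    highest-throughput version that still leaves enough nodes for the
    smallest version of every lower-priority job. If that is impossible,
    the job takes the best version that fits the remaining nodes at all.
    """
    ordered = sorted(jobs, key=lambda j: j.priority, reverse=True)
    remaining = available_nodes
    assignment: Dict[str, Optional[Version]] = {}
    for i, job in enumerate(ordered):
        # Nodes held back so lower-priority jobs can run their smallest version.
        reserve = sum(
            min((v.nodes_required for v in later.versions), default=0)
            for later in ordered[i + 1:]
        )
        candidates = [v for v in job.versions if v.nodes_required <= remaining - reserve]
        if not candidates:
            # Priority wins: ignore the reservation when nothing else fits.
            candidates = [v for v in job.versions if v.nodes_required <= remaining]
        best = max(candidates, key=lambda v: v.throughput, default=None)
        assignment[job.name] = best
        if best is not None:
            remaining -= best.nodes_required
    return assignment


# Illustrative data only; these numbers are not results from the paper.
jobs = [
    Job("apnoea", 2, [Version("wide", 4, 100.0), Version("narrow", 2, 95.0)]),
    Job("regex", 1, [Version("wide", 4, 80.0), Version("narrow", 2, 78.0)]),
]
normal = select_versions(jobs, available_nodes=8)    # all nodes alive
degraded = select_versions(jobs, available_nodes=4)  # half of the nodes crashed
```

In this toy setup, calling `select_versions` again after a node failure or recovery plays the role of the re-evaluation step: with fewer nodes, each job falls back to a version that needs fewer nodes while keeping throughput close to its original level.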
CITATION STYLE
Dayarathna, M., & Suzumura, T. (2013). A mechanism for stream program performance recovery in resource limited compute clusters. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7826 LNCS, pp. 164–178). https://doi.org/10.1007/978-3-642-37450-0_12