A mechanism for stream program performance recovery in resource limited compute clusters

Miyuru Dayarathna; Toyotaro Suzumura

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013), Vol. 7826 LNCS (Part 2), pp. 164–178

DOI: 10.1007/978-3-642-37450-0_12

Abstract

Replication, the widely adopted technique for crash fault tolerance, introduces additional infrastructure costs for resource-limited clusters. In this paper we take a different approach to maintaining stream program performance during crash failures, one based on automatic code generation. Albatross, the middleware we introduce for this task, maintains the same performance level during crash failures based on predetermined priority values assigned to each stream program. Albatross constructs different versions of the input stream programs (sample programs) with different performance characteristics and assigns the best-performing programs for normal operation. During node failure or node recovery, it evaluates whether a different version of a sample program should be used in order to bring the performance of each job back to its original level. We evaluated the effectiveness of this approach with three real-world stream computing applications on the System S distributed stream processing platform. We show that, for both the Apnoea and Regex applications, our approach can maintain stream program performance even when half of the nodes in the cluster have crashed. © Springer-Verlag 2013.
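The abstract describes the recovery step only at a high level: each job has several pre-generated sample program versions, jobs carry priority values, and after a node failure or recovery a different version may be chosen to bring each job back to its original performance. The Python sketch below illustrates one way such a selection could work. It is not Albatross's actual algorithm. The resource model (each version needs a whole number of nodes and delivers a profiled throughput), the greedy priority-ordered policy, and every name and number in it (`Variant`, `Job`, `select_variants`, the example jobs and their figures) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One pre-generated version of a stream program (hypothetical model)."""
    name: str
    nodes: int          # nodes this version needs (assumed resource model)
    throughput: float   # profiled throughput (hypothetical units)


@dataclass
class Job:
    """A stream job with a priority and the throughput it had before failure."""
    name: str
    priority: int       # higher value = more important (assumption)
    target: float       # original performance level to restore
    variants: list[Variant]


def select_variants(jobs: list[Job], available_nodes: int) -> dict[str, Variant | None]:
    """Greedy, priority-ordered choice of one version per job.

    For each job (highest priority first), pick the cheapest version that
    still meets the job's original throughput; if none fits the remaining
    nodes and meets the target, fall back to the best-throughput version
    that fits. Jobs for which nothing fits are mapped to None.
    """
    plan: dict[str, Variant | None] = {}
    remaining = available_nodes
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        fitting = [v for v in job.variants if v.nodes <= remaining]
        if not fitting:
            plan[job.name] = None  # job cannot run on the surviving nodes
            continue
        meeting = [v for v in fitting if v.throughput >= job.target]
        if meeting:
            choice = min(meeting, key=lambda v: v.nodes)  # restore level cheaply
        else:
            choice = max(fitting, key=lambda v: v.throughput)  # degrade gracefully
        plan[job.name] = choice
        remaining -= choice.nodes
    return plan


if __name__ == "__main__":
    # Hypothetical jobs loosely named after the applications in the abstract.
    jobs = [
        Job("Regex", 1, 50.0, [Variant("regex_v1", 4, 60.0),
                               Variant("regex_v2", 2, 50.0),
                               Variant("regex_v3", 1, 30.0)]),
        Job("Apnoea", 2, 100.0, [Variant("apnoea_v1", 4, 120.0),
                                 Variant("apnoea_v2", 2, 100.0)]),
    ]
    for nodes in (4, 3):
        plan = select_variants(jobs, nodes)
        print(nodes, {name: (v.name if v else None) for name, v in plan.items()})
```

A greedy pass keeps the example short: it serves higher-priority jobs first but does not search for a globally optimal assignment, which a production scheduler might need.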

Cite (APA)

Dayarathna, M., & Suzumura, T. (2013). A mechanism for stream program performance recovery in resource limited compute clusters. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7826 LNCS, pp. 164–178). https://doi.org/10.1007/978-3-642-37450-0_12