Using realistic simulation for performance analysis of MapReduce setups

Abstract

Recently, there has been a huge growth in the amount of data processed by enterprises and the scientific computing community. Two promising trends ensure that applications will be able to deal with ever-increasing data volumes: first, the emergence of cloud computing, which provides transparent access to a large number of compute, storage, and networking resources; and second, the development of the MapReduce programming model, which provides a high-level abstraction for data-intensive computing. However, the design space of these systems has not been explored in detail. Specifically, the impact of various design choices and run-time parameters of a MapReduce system on application performance remains an open question. To this end, we embarked on systematically understanding the performance of MapReduce systems, but soon realized that understanding the effects of parameter tweaking in a large-scale setup with many variables was impractical. Consequently, in this paper, we present the design of an accurate MapReduce simulator, MRPerf, for facilitating exploration of the MapReduce design space. MRPerf captures various aspects of a MapReduce setup and uses this information to predict expected application performance. In essence, MRPerf can serve as a design tool for MapReduce infrastructure, and as a planning tool that makes MapReduce deployment far easier by reducing the number of parameters that currently have to be hand-tuned using rules of thumb. Our validation of MRPerf using data from medium-scale production clusters shows that it is able to predict application performance accurately, and thus can be a useful tool in enabling cloud computing. Moreover, an initial application of MRPerf to our test clusters running Hadoop revealed a performance bottleneck; fixing it resulted in up to a 28.05% performance improvement. Copyright 2009 ACM.

Cite

Wang, G., Butt, A. R., Pandey, P., & Gupta, K. (2009). Using realistic simulation for performance analysis of MapReduce setups. In Proc. 1st ACM Workshop on Large-Scale System and Application Performance, LSAP2009, Co-located with the 2009 International Symposium on High Performance Distributed Computing Conference, HPDC’09 (pp. 19–26). https://doi.org/10.1145/1552272.1552278
