Hadoop is a widely used open-source MapReduce framework. Its performance is critical because it affects the usefulness of the products and services of the many companies that have adopted Hadoop for business purposes. One configuration parameter that influences resource allocation, and thus the performance of a Hadoop application, is the map slot value (MSV). The MSV determines the number of map tasks that run concurrently on a node. For a given architecture, each Hadoop application has an MSV at which its performance is best, and no single MSV is best for all applications. An application's performance suffers when it runs with any other MSV, so knowing the best MSV for an application is important. In this work, we present a low-overhead method to predict the best MSV using a new Hadoop counter that measures per-map-task CPU utilization. Our experiments on a variety of Hadoop applications show that using a single MSV for all applications results in performance degradation of up to 132% compared to using the best MSV for each application.
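To make the degradation figure concrete, the following minimal sketch shows one way such a comparison could be computed. It assumes that degradation is the relative increase in job runtime over the runtime at the best MSV; the abstract does not state the exact formula. The function names and the runtimes are hypothetical and are not measurements from the paper.

```python
def best_msv(runtimes):
    """Return the map slot value with the shortest measured runtime."""
    return min(runtimes, key=runtimes.get)


def degradation_pct(runtimes, fixed_msv):
    """Percent slowdown of running at fixed_msv instead of the best MSV.

    Assumed definition: 100 * (t_fixed - t_best) / t_best.
    """
    t_best = runtimes[best_msv(runtimes)]
    return 100.0 * (runtimes[fixed_msv] - t_best) / t_best


# Hypothetical job runtimes in seconds for one application, keyed by MSV.
runtimes = {2: 500.0, 4: 300.0, 8: 250.0, 16: 400.0}

if __name__ == "__main__":
    print("best MSV:", best_msv(runtimes))
    for msv in sorted(runtimes):
        print(f"MSV={msv}: {degradation_pct(runtimes, msv):.1f}% slower than best")
```

Under this assumed definition, a degradation above 100% means that the job takes more than twice as long as it would with its best MSV.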
CITATION STYLE
Kc, K., & Freeh, V. W. (2014). Tuning Hadoop map slot value using CPU metric. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8807, 141–153. https://doi.org/10.1007/978-3-319-13021-7_11