I/O characterization of big data workloads in data centers


Abstract

As the amount of data explodes, more and more organizations rely on data centers to make effective decisions and gain a competitive edge. Big data applications have gradually come to dominate data center workloads, so understanding their behaviour is increasingly important for further improving data center performance. Because of the constantly widening gap between I/O devices and CPUs, I/O performance dominates overall system performance, which makes characterizing the I/O behaviour of big data workloads both important and imperative. In this paper, we select four typical big data workloads from broad application areas out of BigDataBench, a big data benchmark suite built from internet services: Aggregation, TeraSort, Kmeans and PageRank. We conduct a detailed analysis of their I/O characteristics, including disk read/write bandwidth, I/O device utilization, average waiting time of I/O requests, and average I/O request size, which can serve as a guide for designing high-performance, low-power and cost-aware big data storage systems.
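As an illustrative sketch only (not code from the paper), the four metrics named in the abstract can be derived from two snapshots of cumulative per-device counters of the kind Linux exposes in /proc/diskstats, which is also the raw data the `iostat` tool uses; the field names and the `io_metrics` helper below are assumptions for illustration.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors

def io_metrics(prev, curr, interval_s):
    """Derive interval I/O metrics from two snapshots of cumulative counters.

    prev/curr: dicts for one device with keys:
      reads, read_sectors, read_ticks_ms,
      writes, write_sectors, write_ticks_ms, io_ticks_ms
    (hypothetical names; they mirror /proc/diskstats fields).
    """
    d = {k: curr[k] - prev[k] for k in prev}  # deltas over the interval
    ios = d["reads"] + d["writes"]
    sectors = d["read_sectors"] + d["write_sectors"]
    return {
        # disk read/write bandwidth, bytes per second
        "read_bw": d["read_sectors"] * SECTOR_BYTES / interval_s,
        "write_bw": d["write_sectors"] * SECTOR_BYTES / interval_s,
        # fraction of the interval the device was busy (iostat's %util)
        "util": d["io_ticks_ms"] / (interval_s * 1000.0),
        # average time a request spent queued and serviced, ms (iostat's await)
        "avg_wait_ms": (d["read_ticks_ms"] + d["write_ticks_ms"]) / ios if ios else 0.0,
        # average I/O request size, KB
        "avg_req_kb": sectors * SECTOR_BYTES / 1024.0 / ios if ios else 0.0,
    }
```

For example, a one-second interval in which a device completed 200 requests totalling 1600 sectors while busy for 500 ms would yield a utilization of 0.5 and an average request size of 4 KB.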


APA

Pan, F., Yue, Y., Xiong, J., & Hao, D. (2014). I/O characterization of big data workloads in data centers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8807, pp. 85–97). Springer Verlag. https://doi.org/10.1007/978-3-319-13021-7_7
