Responsible, efficient, and well-planned power consumption is becoming a necessity for the monetary returns and scalability of computing infrastructures. While power data can be obtained from numerous sources, analyzing this data is an intrinsically hard task. In this paper, we propose a data analysis pipeline that can handle the large-scale collection of energy consumption logs, apply sophisticated modeling to enable accurate prediction, and evaluate the efficiency of the analysis approach. We present the analysis of a power consumption data set collected over a 6-month period from two clusters of the Grid'5000 experimentation platform used in production. To address the large-data challenge, we used Hadoop with Pig data processing to generate a summary of the data that provides basic statistical aggregations over different time scales. The aggregate data is then analyzed as a time series using sophisticated modeling methods with the R statistical software. Energy models derived from such a large dataset can help in understanding the evolution of consumption patterns, predicting future energy trends, and providing a basis for generalizing the energy models to similar large-scale systems.
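
The summarization step, basic statistical aggregations over different time scales, can be illustrated with a minimal sketch. This is not the Hadoop/Pig code used in the work; it is a small in-memory Python analogue, and the function name `aggregate`, the `(timestamp, watts)` record layout, and the sample readings are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate(readings, bucket_seconds):
    """Group (timestamp_seconds, watts) pairs into fixed-width time buckets.

    Returns {bucket_start: {"count", "min", "max", "mean"}}.
    Hypothetical helper: the record layout is assumed, not taken from the paper.
    """
    buckets = defaultdict(list)
    for ts, watts in readings:
        # Align each timestamp to the start of its bucket.
        bucket_start = (ts // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(watts)
    return {
        start: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for start, values in sorted(buckets.items())
    }


# Made-up sample readings: (seconds since start, power in watts).
readings = [(0, 100.0), (30, 110.0), (60, 120.0), (3599, 130.0), (3600, 200.0)]

per_minute = aggregate(readings, 60)    # fine-grained time scale
per_hour = aggregate(readings, 3600)    # coarse-grained time scale
```

Running the same aggregation with different bucket widths yields one summary per time scale; each summary can then be treated as a time series for modeling.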