This paper presents several scalable methods for forecasting very long time series, such as those recorded at a high sampling frequency. To make the methods scalable, they are built on Apache Spark, a framework for distributed computing; specifically, Spark's existing machine learning library, MLlib, has been used. Since MLlib does not support multivariate (multi-output) regression, the forecasting problem has been split into h single-output forecasting subproblems, where h is the number of future values to predict. Representative forecasting methods of different nature have then been chosen: tree-based models, two ensemble techniques (gradient-boosted trees and random forests), and linear regression as a reference method. Finally, the methodology has been tested on a real-world dataset of Spanish electricity load sampled every ten minutes.
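The decomposition into h subproblems can be illustrated with a minimal pure-Python sketch, written here without Spark. It assumes that each sample uses a sliding window of the w most recent values as features, and that subproblem j predicts the value j+1 steps after that window. The window length w and the helper names (make_horizon_datasets, fit_linear, direct_forecast) are hypothetical and do not come from the paper. Each subproblem gets its own ordinary least-squares model, standing in for the linear-regression reference method.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def make_horizon_datasets(series: List[float], w: int, h: int) -> List[Tuple[Matrix, List[float]]]:
    """Split an h-step forecasting problem into h single-output datasets.

    Sample i uses the window series[i:i+w] as features (assumed layout);
    in the j-th dataset (j = 0..h-1) its target is series[i + w + j].
    """
    if w < 1 or h < 1:
        raise ValueError("w and h must be positive")
    n_samples = max(0, len(series) - w - h + 1)
    windows = [series[i:i + w] for i in range(n_samples)]
    return [(windows, [series[i + w + j] for i in range(n_samples)]) for j in range(h)]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(X: Matrix, y: List[float]) -> List[float]:
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef: List[float], window: List[float]) -> float:
    """Apply intercept + weights to one feature window."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], window))


def direct_forecast(series: List[float], w: int, h: int) -> List[float]:
    """Train one independent model per horizon step, then forecast h values."""
    models = [fit_linear(X, y) for X, y in make_horizon_datasets(series, w, h)]
    last_window = series[-w:]
    return [predict(coef, last_window) for coef in models]
```

The h models are independent of each other, which is what makes the approach usable with a library that only offers single-output regressors. In a Spark setting, each of the h datasets could instead be loaded into a DataFrame and passed to a separate regressor from pyspark.ml.regression (for example LinearRegression, DecisionTreeRegressor, GBTRegressor or RandomForestRegressor); the decomposition itself does not change.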
Galicia, A., Torres, J. F., Martínez-Álvarez, F., & Troncoso, A. (2017). Scalable forecasting techniques applied to big electricity time series. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10306 LNCS, pp. 165–175). Springer Verlag. https://doi.org/10.1007/978-3-319-59147-6_15