TimeSeriesClustering: An extensible framework in Julia

Holger Teichgraeber; Lucas Kuepper; Adam Brandt

Journal of Open Source Software (2019) 4(41) 1573

DOI: 10.21105/joss.01573

Abstract

TimeSeriesClustering is a Julia implementation of unsupervised learning methods for time-series datasets. It provides functionality for clustering and aggregating time series, detecting motifs, and quantifying similarity between time-series datasets. The software provides a type system for temporal data and implements the most commonly used clustering methods and extreme value selection methods for temporal data. TimeSeriesClustering integrates multi-dimensional time-series data (e.g., multiple attributes such as wind availability, solar availability, and electricity demand) into a single aggregation process. The software is applicable to general time-series datasets and lends itself to many application areas within time-series data mining. TimeSeriesClustering was originally developed to perform time-series aggregation for energy systems optimization problems; because of this origin, many of the examples in this work stem from energy systems optimization.

General package features

The design of TimeSeriesClustering allows for scientific comparison of the performance of different time-series aggregation methods, both in terms of a statistical error measure and in terms of their impact on the outcome of the application. The clustering methods implemented in TimeSeriesClustering follow the framework presented by Teichgraeber & Brandt (2019), and the extreme value selection methods follow the framework presented by Lindenmeyer et al. (2020). Building on these frameworks makes TimeSeriesClustering extensible to new aggregation methods in the future. The key features that TimeSeriesClustering provides are listed below; implementation details can be found in the software's documentation.

• The type system: The data type (called a struct in Julia) ClustData stores all time-series data in a common format. Besides the data itself, it automatically processes and stores information that is relevant for later use in the application for which the time-series data will be used. The data type ClustResult additionally stores information relevant for evaluating clustering performance. These data types make TimeSeriesClustering easy to integrate with any analysis that relies on iterative evaluation of the clustering and aggregation methods.
• The aggregation methods: The most commonly used clustering methods and extreme value selection methods are implemented with a common interface, allowing for simple comparison of these methods on a given dataset and optimization problem.
• The generalized import of time series in CSV format: Time series can be loaded from CSV files in a pre-defined format. From these files, variable names (which we call attributes) and node names are automatically loaded and stored. The original time series can be sliced into periods of user-defined length. This information can later be used to define the sets of the optimization problem.
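The type system and CSV import described above can be illustrated with a small, non-authoritative Python sketch. It does not reproduce the package's Julia API or its pre-defined CSV format: the layout assumed here (a `Timestamp` column followed by one numeric column per attribute, no node names) and the names `PeriodData` and `load_periods` are hypothetical. The sketch only shows the idea of loading attributes from a CSV file and slicing each series into periods of user-defined length inside one common container.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class PeriodData:
    """Hypothetical, simplified stand-in for a ClustData-style container.

    data maps each attribute name to a list of periods; each period is a list
    of `period_length` consecutive values. weights[i] is the number of original
    periods that stored period i represents (1.0 for unaggregated input).
    """
    attributes: list
    period_length: int
    data: dict
    weights: list = field(default_factory=list)


def load_periods(csv_text, period_length, time_column="Timestamp"):
    """Read CSV text with one numeric column per attribute (assumed layout)
    and slice every column into consecutive periods of `period_length` steps."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every column except the (assumed) time column is treated as an attribute.
    columns = {name: [] for name in reader.fieldnames if name != time_column}
    for row in reader:
        for name in columns:
            columns[name].append(float(row[name]))
    n_steps = len(next(iter(columns.values())))
    if n_steps % period_length != 0:
        raise ValueError("series length must be a multiple of period_length")
    n_periods = n_steps // period_length
    # Slice each attribute's series into non-overlapping periods.
    data = {
        name: [values[p * period_length:(p + 1) * period_length]
               for p in range(n_periods)]
        for name, values in columns.items()
    }
    return PeriodData(list(columns), period_length, data, [1.0] * n_periods)
```

For hourly data, `period_length=24` would yield daily periods; the attribute names collected in `attributes` play the role of the sets mentioned above.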
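To make the idea of aggregating several attributes in a single clustering run concrete, here is a minimal Python sketch under stated assumptions: each attribute is min-max scaled over all of its values, the scaled periods of all attributes are concatenated into one feature vector per period, and plain k-means (Lloyd's algorithm) groups the periods. The representative of each cluster is the mean of its member periods in original units, and its weight is the number of periods it stands for. k-means is used here only as one common example; the package's actual methods, normalization, and function names are not reproduced, and `normalize`, `feature_vectors`, `kmeans`, and `representatives` are hypothetical helpers.

```python
import random


def normalize(periods):
    """Min-max scale one attribute's periods to [0, 1] using its global min/max."""
    flat = [v for p in periods for v in p]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for constant attributes
    return [[(v - lo) / span for v in p] for p in periods]


def feature_vectors(data):
    """Concatenate the scaled periods of all attributes so that one clustering
    run sees, e.g., solar and demand of the same period together."""
    scaled = [normalize(periods) for periods in data.values()]
    n_periods = len(scaled[0])
    return [[v for attr in scaled for v in attr[i]] for i in range(n_periods)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=100, seed=0):
    """Lloyd's k-means. Returns (centroids, assignments, weights), where
    weights[c] counts how many periods cluster c represents."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid (ties go to the lowest index).
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        # Update step: mean of members; keep the old centroid if a cluster is empty.
        for c in range(k):
            members = [v for v, a in zip(vectors, new_assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignments == assignments:
            break
        assignments = new_assignments
    weights = [assignments.count(c) for c in range(k)]
    return centroids, assignments, weights


def representatives(data, assignments, k):
    """Centroid representation in original units: average each cluster's periods."""
    reps = {}
    for name, periods in data.items():
        reps[name] = []
        for c in range(k):
            members = [p for p, a in zip(periods, assignments) if a == c]
            if not members:
                raise ValueError("empty cluster; try another seed or a smaller k")
            reps[name].append([sum(col) / len(members) for col in zip(*members)])
    return reps
```

Scaling before clustering keeps an attribute with large absolute values (such as demand in MW) from dominating the distance, while the representatives are still reported in the original units.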
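Extreme value selection can be pictured with one simple rule, sketched below in Python: take the period containing the highest (or lowest) value of a chosen attribute, remove it from the data before clustering so it is not averaged away, and afterwards append it as its own representative period with weight 1. This is a generic illustration, not the framework of Lindenmeyer et al. (2020) or the package's implementation, and `split_extreme_period` and `append_extreme` are hypothetical names.

```python
def split_extreme_period(data, attribute, mode="max"):
    """Pick the period holding the max (or min) value of one attribute.

    Returns (index, remaining_data, extreme_data). The chosen period is removed
    from every attribute so that it is not also averaged into a cluster.
    """
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    pick = max if mode == "max" else min
    periods = data[attribute]
    # Period whose peak (or trough) is the most extreme across all periods.
    idx = pick(range(len(periods)), key=lambda i: pick(periods[i]))
    remaining = {a: [p for i, p in enumerate(ps) if i != idx] for a, ps in data.items()}
    extreme = {a: ps[idx] for a, ps in data.items()}
    return idx, remaining, extreme


def append_extreme(reps, weights, extreme):
    """Add the extreme period as an extra representative with weight 1."""
    new_reps = {a: ps + [extreme[a]] for a, ps in reps.items()}
    return new_reps, weights + [1]
```

Because the extreme period is removed before clustering and re-added with weight 1, the weights still sum to the original number of periods.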
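The statistical side of comparing aggregation methods can be illustrated with a single error measure. The abstract does not name the measure, so the Python sketch below assumes a root-mean-square error (RMSE) between each original period and the representative period it was assigned to, for one attribute; `aggregation_rmse` is a hypothetical helper. The second comparison axis, the impact on the optimization outcome, requires the downstream model and is not sketched here.

```python
import math


def aggregation_rmse(periods, assignments, reps):
    """RMSE between each original period and its assigned representative period,
    computed over all time steps of one attribute."""
    sq_err = 0.0
    n = 0
    for period, c in zip(periods, assignments):
        for x, y in zip(period, reps[c]):
            sq_err += (x - y) ** 2
            n += 1
    return math.sqrt(sq_err / n)
```

Running this for several methods and numbers of clusters on the same data gives a simple way to rank the aggregations before testing them in the application.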

Citation (APA)

Teichgraeber, H., Kuepper, L., & Brandt, A. (2019). TimeSeriesClustering: An extensible framework in Julia. Journal of Open Source Software, 4(41), 1573. https://doi.org/10.21105/joss.01573