A multi-variate time series clustering approach based on intermediate fusion: A case study in air pollution data imputation

Abstract

Multivariate Time Series (MVTS) clustering is an essential task, especially for large and complex datasets, but it has received limited attention in the literature. We are motivated by a real-world problem: the need to cluster air pollution data in order to produce plausible imputations for missing measurements of some pollutants. Our main focus is on UK air quality assessment; the study uses data collected from automatic monitoring stations over a four-year period (2015–2018). In this work, we propose an MVTS clustering method followed by an imputation method for the whole Time Series (TS). We compare two approaches to clustering the stations: univariate TS clustering using the Shape-Based Distance (SBD) for individual pollutants, and MVTS clustering using a fused similarity that combines the SBD for all the pollutants. We run a k-means algorithm to produce clusters with each approach on the same dataset. Our analysis shows that MVTS clustering produces the best clusters as measured by various quality indexes, and that the imputations derived from those clusters reduce the average error between imputed and real values, based on the Root Mean Squared Error (RMSE) and its standard deviation (Std).
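
As a rough illustration of the fused similarity idea, the sketch below computes the SBD between two stations separately for each pollutant and then averages the per-pollutant distances into one station-to-station matrix. The abstract does not state how the SBDs are combined, so the plain average is an assumption, as are the function names `sbd` and `fused_sbd` and the array layout (stations × pollutants × time steps). The series are assumed to be complete and already z-normalised; handling of missing values and the clustering step itself are left out.

```python
import numpy as np


def sbd(x, y):
    """Shape-Based Distance: 1 minus the peak normalised cross-correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        # Assumption: an all-zero series is treated as uncorrelated.
        return 1.0
    # Cross-correlation over every possible shift of one series against the other.
    cc = np.correlate(x, y, mode="full")
    return 1.0 - cc.max() / denom


def fused_sbd(stations):
    """Average the per-pollutant SBD into one distance matrix.

    stations: array of shape (n_stations, n_pollutants, n_timesteps).
    Averaging is an illustrative choice, not necessarily the paper's fusion rule.
    """
    data = np.asarray(stations, dtype=float)
    n_stations, n_pollutants, _ = data.shape
    fused = np.zeros((n_stations, n_stations))
    for i in range(n_stations):
        for j in range(i + 1, n_stations):
            per_pollutant = [
                sbd(data[i, k], data[j, k]) for k in range(n_pollutants)
            ]
            fused[i, j] = fused[j, i] = np.mean(per_pollutant)
    return fused
```

The resulting matrix is symmetric with a zero diagonal, so it can be passed to any clustering routine that accepts precomputed distances. Restricting `data` to a single pollutant gives the univariate baseline that the abstract compares against.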

Alahamade, W., Lake, I., Reeves, C. E., & De La Iglesia, B. (2022). A multi-variate time series clustering approach based on intermediate fusion: A case study in air pollution data imputation. Neurocomputing, 490, 229–245. https://doi.org/10.1016/j.neucom.2021.09.079