Temporal structure learning for clustering massive data streams in real-time

Michael Hahsler; Margaret H. Dunham

Conference Proceedings

Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011 (2011) 664-675

DOI: 10.1137/1.9781611972818.57

Abstract

This paper describes one of the first attempts to model the temporal structure of massive data streams in real time using data stream clustering. Recently, many data stream clustering algorithms have been developed that efficiently find a partition of the data points in a data stream. However, these algorithms disregard the information carried by the temporal order of the data points, which for many applications is an important part of the data stream. In this paper we propose a new framework called Temporal Relationships Among Clusters for Data Streams (TRACDS), which allows us to learn the temporal structure while clustering a data stream. We identify, organize, and describe the clustering operations used by state-of-the-art data stream clustering algorithms. We then show that, by defining a set of new operations that dynamically transform Markov chains whose states represent clusters, we can efficiently capture temporal ordering information. This framework allows us to preserve temporal relationships among clusters for any state-of-the-art data stream clustering algorithm with only minimal overhead. To investigate the usefulness of TRACDS, we evaluate its improvement over pure data stream clustering for anomaly detection on several synthetic and real-world data sets. The experiments show that TRACDS considerably improves the results even when we introduce a high rate of incorrect time stamps, which is typical for real-world data streams. Copyright © SIAM.
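The core idea of the abstract, a Markov chain whose states are clusters and which is updated alongside the clusterer, can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the class `ClusterMarkovChain`, its methods, the dict-of-dicts count storage, and the mapping of clustering operations (assign point, create, remove, merge, fade) to chain updates are hypothetical, and a split operation is omitted. The anomaly score is assumed to be the estimated probability of the transition from the previous point's cluster to the current one.

```python
from collections import defaultdict


class ClusterMarkovChain:
    """Hypothetical sketch: a Markov chain whose states are stream clusters.

    Transition counts live in a sparse dict-of-dicts (counts[a][b]) so states
    can be added, removed, and merged as the underlying clusterer changes.
    """

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(float))
        self.states = set()
        self.last = None  # cluster of the previous data point

    def add_state(self, s):
        # Assumed op "create cluster": a new state with no transitions yet.
        self.states.add(s)

    def remove_state(self, s):
        # Assumed op "remove cluster": drop the state's row and column.
        self.states.discard(s)
        self.counts.pop(s, None)
        for row in self.counts.values():
            row.pop(s, None)
        if self.last == s:
            self.last = None

    def merge_states(self, keep, drop):
        # Assumed op "merge clusters": add drop's row and column into keep's.
        for b, c in self.counts.pop(drop, {}).items():
            self.counts[keep][keep if b == drop else b] += c
        for row in self.counts.values():
            if drop in row:
                row[keep] += row.pop(drop)
        self.states.discard(drop)
        if self.last == drop:
            self.last = keep

    def fade(self, factor):
        # Assumed op "fade": decay all counts so older transitions weigh less.
        for row in self.counts.values():
            for b in row:
                row[b] *= factor

    def prob(self, a, b):
        # Estimated transition probability P(b | a) from the counts.
        row = self.counts.get(a, {})
        total = sum(row.values())
        return row.get(b, 0.0) / total if total > 0 else 0.0

    def observe(self, s):
        # Assumed op "assign point": score the transition, then record it.
        score = None if self.last is None else self.prob(self.last, s)
        self.add_state(s)
        if self.last is not None:
            self.counts[self.last][s] += 1.0
        self.last = s
        return score
```

Usage: feed the cluster label that any stream clusterer assigns to each arriving point into `observe`. A low returned score (below some user-chosen threshold, itself an assumption) marks a rare transition. For example, after the label sequence A, B, A, B, A, B, observing C returns 0.0 because B has never been followed by C. Sparse counts keep merge and removal cheap when the clusterer restructures its clusters.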

Cite (APA)

Hahsler, M., & Dunham, M. H. (2011). Temporal structure learning for clustering massive data streams in real-time. In Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011 (pp. 664–675). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611972818.57