MOST: Model-Based Compression with Outlier Storage for Time Series Data

  • Yang Z
  • Chen S

Abstract

Time series data are used in a wide variety of applications. The explosive growth in the volume of time series data poses a significant challenge for efficient data storage and query processing. Unfortunately, existing compression techniques either achieve only low to medium compression ratios on time series data, or incur significant decompression overhead during query processing.

We propose a novel compression technique, MOST (Model-based compression with Outlier STorage), for time series data. As measurement values often change smoothly over a period of time, we divide a time series into segments of smooth changes, then compute a linear model for each segment. Since tiny errors are often acceptable in analysis tasks, we omit data points whose computed values are within a pre-specified error threshold of the actual values, thereby effectively reducing the data size. Outliers are rare but important for many applications, and therefore we store outliers explicitly. Moreover, for processing MOST-compressed data, we propose a segment-outlier dual-mode query engine that computes segments as a whole as much as possible, and we build a prototype, MostDB. Experimental results on real-world data sets show that MOST achieves 9.45-15.04x compression ratios. Compared to existing time series databases, MostDB achieves up to 11.68x speedups for common queries from the IoTDB Benchmark.
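The idea of one linear model per segment plus explicitly stored outliers can be illustrated with a minimal sketch. This is not the MOST algorithm itself: the abstract does not specify how segments are chosen or how models are fitted, so the sketch assumes fixed-length windows and an ordinary least-squares fit. The names `Segment`, `fit_line`, `compress`, `decompress`, and the `window` parameter are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: int        # index of the first point covered by this segment
    length: int       # number of points covered
    slope: float      # linear model: value(i) = slope * i + intercept
    intercept: float  # model value at offset 0 within the segment
    outliers: dict = field(default_factory=dict)  # offset -> actual value


def fit_line(values):
    """Least-squares line over offsets 0..n-1; returns (slope, intercept)."""
    n = len(values)
    if n == 1:
        return 0.0, values[0]
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def compress(values, eps, window=8):
    """Assumption: fixed-length windows stand in for real segmentation."""
    segments = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        slope, intercept = fit_line(chunk)
        seg = Segment(start, len(chunk), slope, intercept)
        for i, v in enumerate(chunk):
            # Points the model reproduces within eps are omitted;
            # everything else is kept verbatim as an outlier.
            if abs(slope * i + intercept - v) > eps:
                seg.outliers[i] = v
        segments.append(seg)
    return segments


def decompress(segments):
    """Rebuild the series: stored outliers win, the model fills the rest."""
    out = []
    for seg in segments:
        for i in range(seg.length):
            out.append(seg.outliers.get(i, seg.slope * i + seg.intercept))
    return out


# A smooth ramp with one spike at index 5.
data = [float(i) for i in range(16)]
data[5] = 100.0
restored = decompress(compress(data, eps=0.5))
```

In this sketch every reconstructed value is within `eps` of the original, and the spike is restored exactly because it is stored as an outlier. A plain least-squares fit is pulled toward the spike, so neighbouring points in the same window may also end up stored as outliers; a fit that is less sensitive to outliers, or data-driven segment boundaries, would likely store fewer points.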

Cite

Yang, Z., & Chen, S. (2023). MOST: Model-Based Compression with Outlier Storage for Time Series Data. Proceedings of the ACM on Management of Data, 1(4), 1–29. https://doi.org/10.1145/3626737
