Monitoring large-scale and complex systems often generates high-dimensional and highly dynamic time series data. In such a scenario, massive metadata has to be maintained to support efficient querying, whose large footprint poses great challenges to in-memory databases. In this paper, we present ByteSeries, an in-memory time series database that is designed specifically for large-scale monitoring systems to manage high-dimensional time series. We start with an analysis of the production data and workload at ByteDance's metric monitoring system, which contains over 10 billion time series dimensions. The observation of high overhead of metadata management in high-dimensional time series data calls for a rethink of time series database systems. ByteSeries's memory structure employs the novel Compressed Inverted Index to effectively compress metadata while maintaining high efficiency for multi-dimensional queries. In addition, an algorithm is proposed to effectively convert data into compressed form without sacrificing the data ingestion throughput. We experimentally evaluate ByteSeries by comparing it with ByteDance's original production system, tsdc, as well as two open-source systems, namely Gorilla and Prometheus. We show that ByteSeries significantly improves over ByteDance's original production system by 1) reducing the memory footprint of metadata by 60% and the whole memory consumption by 50%, and 2) speeding up multi-dimensional queries by 1.8x-10.7x.
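For readers unfamiliar with the general idea, the following is a minimal sketch of a textbook inverted index over time series tags, with delta-encoded posting lists standing in for compression. It is not the Compressed Inverted Index described in the paper, whose design the abstract does not detail. All names here (`TagIndex`, `add_series`, `query`, `delta_encode`, `delta_decode`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def delta_encode(sorted_ids: List[int]) -> List[int]:
    """Store the first ID, then the gaps between consecutive IDs."""
    out, prev = [], 0
    for sid in sorted_ids:
        out.append(sid - prev)
        prev = sid
    return out


def delta_decode(gaps: List[int]) -> List[int]:
    """Invert delta_encode by accumulating the gaps."""
    out, total = [], 0
    for gap in gaps:
        total += gap
        out.append(total)
    return out


class TagIndex:
    """Hypothetical inverted index: (tag key, tag value) -> series IDs."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, str], List[int]] = defaultdict(list)
        self._next_id = 0

    def add_series(self, tags: Dict[str, str]) -> int:
        """Register a series and return its newly assigned ID."""
        sid = self._next_id
        self._next_id += 1
        for pair in tags.items():
            # IDs are assigned in increasing order, so each list stays sorted.
            self._postings[pair].append(sid)
        return sid

    def compressed(self) -> Dict[Tuple[str, str], List[int]]:
        """Return delta-encoded posting lists (illustrative compression only)."""
        return {pair: delta_encode(ids) for pair, ids in self._postings.items()}

    def query(self, **filters: str) -> List[int]:
        """Return IDs of series matching every key=value filter."""
        result: Optional[Set[int]] = None
        for pair in filters.items():
            ids = set(self._postings.get(pair, []))
            result = ids if result is None else result & ids
        return sorted(result or [])
```

A multi-dimensional query is an intersection of posting lists, for example `idx.query(host="h1", metric="cpu")`. Small gaps in the delta-encoded lists are what a variable-length integer encoding would then exploit; that step is omitted here.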
Shi, X., Feng, Z., Li, K., Zhou, Y., Jin, H., Jiang, Y., … Li, X. (2020). ByteSeries: An in-memory time series database for large-scale monitoring systems. In SoCC 2020 - Proceedings of the 2020 ACM Symposium on Cloud Computing (pp. 60–73). Association for Computing Machinery, Inc. https://doi.org/10.1145/3419111.3421289