Good to the Last Bit: Data-Driven Encoding with CodecDB

Hao Jiang; Chunwei Liu; John Paparrizos; Andrew A. Chien; Jihong Ma; Aaron J. Elmore

Conference Proceedings (Open Access)

Proceedings of the ACM SIGMOD International Conference on Management of Data (2021) 843-856

DOI: 10.1145/3448016.3457283

Abstract

Columnar databases rely on specialized encoding schemes to reduce storage requirements. These encodings also enable efficient in-situ data processing. Nevertheless, many existing columnar databases are encoding-oblivious. When storing data, these systems rely on a global understanding of the dataset or the data types to derive simple rules for encoding selection, and such rule-based selection leads to unsatisfactory performance. Moreover, when performing queries, these systems always decode data into memory, ignoring the possibility of optimizing access to encoded data. We develop CodecDB, an encoding-aware columnar database, to demonstrate the benefit of tightly coupling the database design with the data encoding schemes. CodecDB chooses, in a principled manner, the most efficient encoding for a given data column and relies on encoding-aware query operators to optimize access to encoded data. Storage-wise, CodecDB achieves on average 90% accuracy in selecting the best encoding and improves the compression ratio by up to 40% compared to the state-of-the-art encoding selection solution. Query-wise, CodecDB is on average one order of magnitude faster than the latest open-source and commercial columnar databases on the TPC-H benchmark, and on average 3x faster than a recent research project on the Star-Schema Benchmark (SSB).
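The two ideas in the abstract, per-column encoding selection and querying data without decoding it, can be illustrated with a small sketch. This is not CodecDB's method: the abstract describes a principled, data-driven selector, whereas the code below substitutes a brute-force baseline that sizes a column under each candidate encoding and keeps the smallest. The size model (32-bit values and run lengths, bit-packed dictionary codes) and all function names are hypothetical assumptions made for illustration. The second part shows one form of in-situ access: with an order-preserving (sorted) dictionary, a predicate `value < c` becomes a comparison on integer codes, so rows are filtered without decoding.

```python
import bisect
from typing import Callable, Dict, List, Tuple

# Hypothetical helpers for illustration only; not CodecDB's API.

def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Run-length encoding: a list of (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))  # start a new run
    return runs

def dict_encode(values: List[int]) -> Tuple[List[int], List[int]]:
    """Order-preserving dictionary encoding: sorted distinct values plus one code per row."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]

def bits_needed(max_code: int) -> int:
    """Bits required to bit-pack codes in the range [0, max_code]."""
    return max(1, max_code.bit_length())

# Assumed size model, in bits.
def plain_size_bits(values: List[int]) -> int:
    return len(values) * 32

def rle_size_bits(values: List[int]) -> int:
    return len(rle_encode(values)) * 64  # 32-bit value + 32-bit run length

def dict_size_bits(values: List[int]) -> int:
    dictionary, codes = dict_encode(values)
    return len(dictionary) * 32 + len(codes) * bits_needed(len(dictionary) - 1)

def pick_encoding(values: List[int]) -> Tuple[str, Dict[str, int]]:
    """Brute-force baseline: size every candidate and return the smallest."""
    candidates: Dict[str, Callable[[List[int]], int]] = {
        "plain": plain_size_bits,
        "rle": rle_size_bits,
        "dict": dict_size_bits,
    }
    sizes = {name: size_of(values) for name, size_of in candidates.items()}
    return min(sizes, key=sizes.get), sizes

def filter_less_than(dictionary: List[int], codes: List[int], c: int) -> List[int]:
    """Row ids where value < c, computed by scanning codes only (no decoding)."""
    threshold = bisect.bisect_left(dictionary, c)  # count of dictionary entries < c
    return [row for row, code in enumerate(codes) if code < threshold]
```

For example, `pick_encoding([7] * 1000 + [8] * 1000)` selects `"rle"` (two runs), while a column of 100 distinct values such as `list(range(100))` selects `"plain"` under this size model. `filter_less_than(*dict_encode([3, 1, 4, 1, 5, 9, 2, 6]), 4)` returns `[0, 1, 3, 6]`. Brute-force sizing requires encoding the column once per candidate, which is part of why a learned selector that predicts the best encoding from column features is attractive.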

Citation (APA)

Jiang, H., Liu, C., Paparrizos, J., Chien, A. A., Ma, J., & Elmore, A. J. (2021). Good to the Last Bit: Data-Driven Encoding with CodecDB. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 843–856). Association for Computing Machinery. https://doi.org/10.1145/3448016.3457283