SAHA: A string adaptive hash table for analytical databases

Tianqi Zheng; Zhibin Zhang; Xueqi Cheng

Journal article, open access

Applied Sciences (Switzerland) (2020) 10(6)

DOI: 10.3390/app10061915

Abstract

Hash tables are a fundamental data structure for analytical database workloads such as aggregation, joins, set filtering and record deduplication. Their performance varies drastically depending on the kind of data being processed and on the mix of inserts, lookups and deletes. In this paper, we address two common use cases of hash tables: aggregating and joining over arbitrary string data. We design a new hash table, SAHA, which is tightly integrated with modern analytical databases and optimized for string data. It has the following advantages: (1) it inlines short strings and saves hash values only for long strings; (2) it uses special memory loading techniques for fast dispatching and hash computation; and (3) it uses vectorized processing to batch hashing operations. Our evaluation shows that SAHA outperforms state-of-the-art hash tables, including Google's SwissTable and Facebook's F14Table, by one to five times in analytical workloads. It has been merged into the ClickHouse database and shows promising results in production.
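The three ideas listed in the abstract can be illustrated with a small Python sketch of a string-keyed counter, the kind of table a GROUP BY count would use. It is a hypothetical illustration under stated assumptions, not SAHA's implementation: the 8/16/24-byte inline size classes, the FNV-1a hash, the chained long-string buckets, and every name (`StringAdaptiveCounter`, `add_batch`, `fnv1a64`) are our own choices. Short keys are zero-padded and packed into one integer, so neither the string nor a hash is stored for them; keys ending in a zero byte go to the long-string table, because zero padding would make them indistinguishable from shorter keys. Long keys keep their hash next to the entry, so probes compare hashes before bytes and resizing never rehashes. `add_batch` separates classification and hashing from insertion, only as a loose analogue of vectorized batch hashing.

```python
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a64(data: bytes) -> int:
    """64-bit FNV-1a; a stand-in hash (assumption), not the one SAHA uses."""
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


class StringAdaptiveCounter:
    """Hypothetical sketch of a string-adaptive counting table (not SAHA's code)."""

    INLINE_WIDTHS = (8, 16, 24)  # assumed inline size classes, in bytes

    def __init__(self, long_capacity: int = 8):
        self.empty_count = 0  # the empty key needs neither storage nor hashing
        # One table per size class, keyed by the packed integer form of the string.
        self.inline = {w: {} for w in self.INLINE_WIDTHS}
        # Long keys: chained buckets of [saved_hash, key, count] entries.
        self.buckets = [[] for _ in range(long_capacity)]
        self.long_size = 0

    def _inline_width(self, key: bytes):
        """Size class for a non-empty key, or None if it must go to the long table."""
        if key.endswith(b"\x00"):
            return None  # zero padding would make this key ambiguous
        for w in self.INLINE_WIDTHS:
            if len(key) <= w:
                return w
        return None

    def _add_inline(self, key: bytes, w: int, delta: int) -> None:
        # Zero-pad and pack into one integer: no string bytes and no hash are stored.
        packed = int.from_bytes(key.ljust(w, b"\x00"), "little")
        table = self.inline[w]
        table[packed] = table.get(packed, 0) + delta

    def _add_long(self, key: bytes, h: int, delta: int) -> None:
        bucket = self.buckets[h % len(self.buckets)]
        for entry in bucket:
            # Cheap saved-hash check first; full byte comparison only on a hash match.
            if entry[0] == h and entry[1] == key:
                entry[2] += delta
                return
        bucket.append([h, key, delta])
        self.long_size += 1
        if self.long_size > 2 * len(self.buckets):
            self._grow()

    def _grow(self) -> None:
        # Redistribute using the saved hashes, so no long key is hashed a second time.
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for entry in bucket:
                self.buckets[entry[0] % len(self.buckets)].append(entry)

    def add_batch(self, keys, delta: int = 1) -> None:
        # Pass 1: classify every key and hash only the long ones (the step SAHA
        # batches with vectorized code; here it is just a separate loop).
        widths = [self._inline_width(k) if k else 0 for k in keys]
        hashes = [fnv1a64(k) if w is None else 0 for k, w in zip(keys, widths)]
        # Pass 2: insert using the precomputed size classes and hashes.
        for k, w, h in zip(keys, widths, hashes):
            if w == 0:
                self.empty_count += delta
            elif w is None:
                self._add_long(k, h, delta)
            else:
                self._add_inline(k, w, delta)

    def add(self, key: bytes, delta: int = 1) -> None:
        self.add_batch([key], delta)

    def get(self, key: bytes) -> int:
        if not key:
            return self.empty_count
        w = self._inline_width(key)
        if w is not None:
            packed = int.from_bytes(key.ljust(w, b"\x00"), "little")
            return self.inline[w].get(packed, 0)
        h = fnv1a64(key)
        for entry in self.buckets[h % len(self.buckets)]:
            if entry[0] == h and entry[1] == key:
                return entry[2]
        return 0
```

For example, after `t.add_batch([b"a", b"a"])`, `t.get(b"a")` returns 2, and that key never touches the long-string table or the hash function. In a real engine the per-size-class tables would be open-addressing arrays of fixed-width keys rather than Python dicts; the dicts here only keep the sketch short.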

Cite (APA)

Zheng, T., Zhang, Z., & Cheng, X. (2020). SAHA: A string adaptive hash table for analytical databases. Applied Sciences (Switzerland), 10(6). https://doi.org/10.3390/app10061915
