Abstract
Learned index structures have reshaped our perspective on the design of traditional data structures. Using machine learning (ML) techniques, they can achieve better lookup performance than existing indexes. However, current learned indexes primarily focus on integer-key workloads and fail to index variable-length string keys efficiently. We introduce SIndex, a concurrent learned index specialized for variable-length string key workloads. To reduce the cost of model inference and data accesses, SIndex groups keys with shared prefixes and uses each key's unique part for model training. We evaluate SIndex with both real-world and synthesized datasets. The results show that SIndex can achieve up to 91% better performance than other state-of-the-art index structures. We have open-sourced our implementation.
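The prefix-grouping idea can be illustrated with a minimal sketch. This is not the SIndex implementation: the class name `PrefixGroup`, the 8-byte feature width, and the single least-squares linear model are all assumptions made for illustration, and concurrency is ignored entirely. The sketch strips the prefix shared by every key in one group, turns the leading bytes of the remaining suffix into an integer feature, fits a line from feature to sorted position, and searches only within the model's recorded maximum error.

```python
import bisect
import math
import os


class PrefixGroup:
    """Hypothetical single-group learned index over byte-string keys."""

    def __init__(self, keys, width=8):
        # Assumes a non-empty collection of distinct byte-string keys.
        self.keys = sorted(keys)
        self.width = width  # number of suffix bytes used as the model feature
        # Length of the prefix shared by all keys in the group.
        self.prefix_len = len(os.path.commonprefix(self.keys))

        xs = [self._feature(k) for k in self.keys]
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = (n - 1) / 2
        var = sum((x - mean_x) ** 2 for x in xs)
        cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(xs))
        # Least-squares line: position ~ slope * feature + intercept.
        self.slope = cov / var if var else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # Largest prediction error over the training keys bounds the search.
        self.max_err = max(abs(i - self._predict(x)) for i, x in enumerate(xs))

    def _feature(self, key):
        # Only the part after the shared prefix feeds the model.
        suffix = key[self.prefix_len:self.prefix_len + self.width]
        return int.from_bytes(suffix.ljust(self.width, b"\0"), "big")

    def _predict(self, x):
        return self.slope * x + self.intercept

    def lookup(self, key):
        """Return the sorted position of key, or None if it is absent."""
        if not key.startswith(self.keys[0][:self.prefix_len]):
            return None
        pred = self._predict(self._feature(key))
        lo = max(0, math.floor(pred - self.max_err))
        hi = min(len(self.keys), math.ceil(pred + self.max_err) + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < hi and self.keys[i] == key else None


urls = [
    b"https://example.org/user/alice",
    b"https://example.org/user/bob",
    b"https://example.org/user/carol",
    b"https://example.org/user/dave",
    b"https://example.org/user/erin",
]
group = PrefixGroup(urls)
positions = [group.lookup(k) for k in sorted(urls)]
```

In this toy setup the shared prefix `https://example.org/user/` never reaches the model, so the feature is computed from the distinguishing bytes only, which is the cost reduction the abstract describes. A real system would need many groups, a strategy for choosing them, and handling for inserts and concurrent access.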
Cite
Wang, Y., Tang, C., Wang, Z., & Chen, H. (2020). SIndex: A scalable learned index for string keys. In APSys 2020 - Proceedings of the 2020 ACM SIGOPS Asia-Pacific Workshop on Systems (pp. 17–24). Association for Computing Machinery. https://doi.org/10.1145/3409963.3410496