The Case for a Learned Sorting Algorithm

Ani Kristo; Kapil Vaidya; Ugur Çetintemel; Sanchit Misra; Tim Kraska

Conference Proceedings

Proceedings of the ACM SIGMOD International Conference on Management of Data (2020) 1001-1016

DOI: 10.1145/3318464.3389752

Abstract

Sorting is one of the most fundamental algorithms in Computer Science and a common operation in databases, not just for sorting query results but also as part of joins (i.e., sort-merge join) or indexing. In this work, we introduce a new type of distribution sort that leverages a learned model of the empirical CDF of the data. Our algorithm uses a model to efficiently get an approximation of the scaled empirical CDF for each record key and map it to the corresponding position in the output array. We then apply a deterministic sorting algorithm that works well on nearly sorted arrays (e.g., Insertion Sort) to establish a totally sorted order. We compared this algorithm against common sorting approaches and measured its performance for up to 1 billion normally distributed double-precision keys. The results show that our approach yields an average 3.38x performance improvement over C++ STL sort (an optimized Quicksort hybrid), a 1.49x improvement over sequential Radix Sort, and a 5.54x improvement over a C++ implementation of Timsort (the default sorting function for Java and Python).
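The abstract describes the algorithm at a high level: a model approximates the scaled empirical CDF, each key is placed at its predicted position, and Insertion Sort repairs the remaining disorder. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. The CDF model here is a hypothetical stand-in (linear interpolation over a sorted random sample), not the model used in the paper, and keys predicted to land in the same slot are collected in simple per-slot buckets, which is also an assumption of this sketch. The names `fit_sampled_cdf` and `learned_sort` and the `sample_size` parameter are illustrative only.

```python
import bisect
import random


def fit_sampled_cdf(keys, sample_size=1000, seed=0):
    """Hypothetical CDF model (an assumption, not the paper's model):
    a sorted random sample of the keys, queried by linear interpolation."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    m = len(sample)

    def cdf(x):
        # i = number of sampled keys <= x
        i = bisect.bisect_right(sample, x)
        if i == 0:
            return 0.0
        if i == m:
            return 1.0
        # Here sample[i - 1] <= x < sample[i], so the interval is non-empty.
        lo, hi = sample[i - 1], sample[i]
        frac = (x - lo) / (hi - lo)
        return (i - 1 + frac) / (m - 1)

    return cdf


def insertion_sort(a):
    """In-place Insertion Sort; cheap when `a` is already nearly sorted."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def learned_sort(keys, sample_size=1000):
    """Sketch of a CDF-model-based distribution sort; returns a new list."""
    n = len(keys)
    if n < 2:
        return list(keys)
    cdf = fit_sampled_cdf(keys, sample_size)
    # Scale the estimated CDF to a slot index in [0, n - 1]; keys that
    # collide on the same slot share a bucket (assumption of this sketch).
    buckets = [[] for _ in range(n)]
    for k in keys:
        pos = min(int(cdf(k) * n), n - 1)
        buckets[pos].append(k)
    out = [k for bucket in buckets for k in bucket]
    # The model only yields a nearly sorted order; finish with Insertion Sort.
    insertion_sort(out)
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    print(learned_sort(data) == sorted(data))  # True
```

Because the interpolated CDF is monotone, keys that land in different slots are already in the right relative order, so the final Insertion Sort pass only has to fix keys that collided in the same slot. A less accurate model makes that pass slower but never makes the result incorrect.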

Cite

Kristo, A., Vaidya, K., Çetintemel, U., Misra, S., & Kraska, T. (2020). The Case for a Learned Sorting Algorithm. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1001–1016). Association for Computing Machinery. https://doi.org/10.1145/3318464.3389752