The Case for a Learned Sorting Algorithm

Abstract

Sorting is one of the most fundamental problems in Computer Science and a common operation in databases, used not just for ordering query results but also as part of joins (e.g., sort-merge join) and indexing. In this work, we introduce a new type of distribution sort that leverages a learned model of the empirical CDF of the data. Our algorithm uses the model to efficiently approximate the scaled empirical CDF of each record key and maps the key to the corresponding position in the output array. We then apply a deterministic sorting algorithm that works well on nearly sorted arrays (e.g., Insertion Sort) to establish a totally sorted order. We compared this algorithm against common sorting approaches and measured its performance on up to 1 billion normally distributed double-precision keys. The results show that our approach yields an average 3.38x performance improvement over C++ STL sort, which is an optimized Quicksort hybrid, a 1.49x improvement over sequential Radix Sort, and a 5.54x improvement over a C++ implementation of Timsort, which is the default sorting function for Java and Python.
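
The two-phase idea described in the abstract (predict each key's position from an approximate CDF, then repair the nearly sorted result) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the "model" here is just a step function built from a sorted random sample, collisions are handled with per-position lists, and the names `fit_cdf_model`, `insertion_sort`, `learned_sort`, and the `sample_size` parameter are all hypothetical choices made for illustration.

```python
import bisect
import random


def fit_cdf_model(keys, sample_size=1000, seed=0):
    """Approximate the empirical CDF with a sorted random sample.

    Assumption: this stand-in replaces the paper's learned model.
    It is monotone, so a larger key never gets a smaller prediction.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))

    def cdf(x):
        # Fraction of sampled keys that are <= x, a value in [0, 1].
        return bisect.bisect_right(sample, x) / len(sample)

    return cdf


def insertion_sort(a):
    """In-place insertion sort; cheap when the input is nearly sorted."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def learned_sort(keys):
    """Return a sorted copy of `keys` using CDF-based placement."""
    keys = list(keys)
    n = len(keys)
    if n < 2:
        return keys

    cdf = fit_cdf_model(keys)

    # Phase 1: scale the predicted CDF to an output position.
    # Keys that collide on a position share a small list (an
    # illustrative collision strategy, not taken from the paper).
    slots = [[] for _ in range(n)]
    for k in keys:
        pos = min(n - 1, int(cdf(k) * n))
        slots[pos].append(k)

    # Flatten: ordered across slots, possibly unordered within a slot.
    out = [k for slot in slots for k in slot]

    # Phase 2: a deterministic pass establishes the total order.
    insertion_sort(out)
    return out
```

For example, `learned_sort([random.gauss(0, 1) for _ in range(2000)])` returns the same list as `sorted` would on that input. Because the final insertion sort pass is a complete sorting algorithm, correctness does not depend on model quality; a poor model only makes the second phase slower.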

Cite

Kristo, A., Vaidya, K., Çetintemel, U., Misra, S., & Kraska, T. (2020). The Case for a Learned Sorting Algorithm. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1001–1016). Association for Computing Machinery. https://doi.org/10.1145/3318464.3389752
