Partitioning is an important step in several database algorithms, including sorting, aggregation, and joins. Partitioning is also fundamental for dividing work into equal-sized (or balanced) parallel subtasks. In this paper, we aim to find, materialize and maintain a set of partitioning elements (splitters) for a data set. Unlike traditional partitioning elements, our splitters define both inequality and equality partitions, which allows us to bound the size of the inequality partitions. We provide an algorithm for determining an optimal set of splitters from a sorted data set and show that it has time complexity O(klg2 N), where k is the number of splitters requested and N is the size of the data set. We show how the algorithm can be extended to pairs of tables, so that joins can be partitioned into work units that have balanced cost. We demonstrate experimentally (a) that finding the optimal set of splitters can be done efficiently, and (b) that using the precomputed splitters can improve the time to sort a data set by up to 76%, with particular benefits in the presence of a few heavy hitters. Copyright 2009 ACM.
CITATION STYLE
Ross, K. A., & Cieslewicz, J. (2009). Optimal splitters for database partitioning with size bounds. In ACM International Conference Proceeding Series (Vol. 361, pp. 98–110). https://doi.org/10.1145/1514894.1514907
Mendeley helps you to discover research relevant for your work.