Engineering Burstsort: Toward fast in-place string sorting

4Citations
Citations of this article
10Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Burstsort is a trie-based string sorting algorithm that distributes strings into small buckets whose contents are then sorted in cache. This approach has earlier been demonstrated to be efficient on modern cache-based processors [Sinha & Zobel, JEA 2004]. In this article, we introduce improvements that reduce by a significant margin the memory requirement of Burstsort: It is now less than 1% greater than an in-place algorithm. These techniques can be applied to existing variants of Burstsort, as well as other string algorithms such as for string management. We redesigned the buckets, introducing sub-buckets and an index structure for them, which resulted in an order-of-magnitude space reduction. We also show the practicality of moving some fields from the trie nodes to the insertion point (for the next string pointer) in the bucket; this technique reduces memory usage of the trie nodes by one-third. Importantly, the trade-off for the reduction in memory use is only a very slight increase in the running time of Burstsort on realworld string collections. In addition, during the bucket-sorting phase, the string suffixes are copied to a small buffer to improve their spatial locality, lowering the running time of Burstsort by up to 30%. These memory usage enhancements have enabled the copy-based approach [Sinha et al., JEA 2006] to also reduce the memory usage with negligible impact on speed.

Cite

CITATION STYLE

APA

Sinha, R., & Wirth, A. (2010). Engineering Burstsort: Toward fast in-place string sorting. In ACM Journal of Experimental Algorithmics (Vol. 15). Association for Computing Machinery. https://doi.org/10.1145/1671973.1671978

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free