Lightweight BWT construction for very large string collections

Markus J. Bauer; Anthony J. Cox; Giovanna Rosone

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011), Vol. 6661 LNCS, pp. 219–231

DOI: 10.1007/978-3-642-21458-5_20

32 Citations

30 Readers

Abstract

A modern DNA sequencing machine can generate a billion or more sequence fragments in a matter of days. The many uses of the BWT in compression and indexing are well known, but the computational demands of creating the BWT of datasets this large have prevented its applications from being widely explored in this context. We address this obstacle by presenting two algorithms capable of computing the BWT of very large string collections. The algorithms are lightweight in that the first needs O(m log m) bits of memory to process m strings and the memory requirements of the second are constant with respect to m. We evaluate our algorithms on collections of up to 1 billion strings and compare their performance to other approaches on smaller datasets. Although our tests were on collections of DNA sequences of uniform length, the algorithms themselves apply to any string collection over any alphabet. © 2011 Springer-Verlag.
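To make concrete what "the BWT of a string collection" refers to, here is a minimal sketch of a naive construction. It is not either of the lightweight algorithms described in the abstract: it materializes and sorts every suffix in memory, which is exactly the cost those algorithms are designed to avoid. The function name `naive_collection_bwt` and the convention that each string gets its own end marker, ordered by the string's position in the collection and smaller than every ordinary character, are assumptions made for this illustration.

```python
def naive_collection_bwt(strings, end_marker="$"):
    """Naive BWT of a string collection (illustrative sketch only).

    Assumption: string i is terminated by its own end marker, markers are
    ordered by i, and every marker sorts before every ordinary character.
    All markers are printed as the same symbol in the output.
    """
    entries = []
    for i, s in enumerate(strings):
        if end_marker in s:
            raise ValueError("input strings must not contain the end marker")
        t = s + end_marker
        for j in range(len(t)):
            # Sort key for the suffix starting at position j:
            # ordinary characters become (1, code point); the end marker
            # becomes (0, i) so it sorts first and is distinct per string.
            key = [(1, ord(c)) for c in s[j:]] + [(0, i)]
            # The BWT symbol is the character preceding the suffix;
            # for j == 0 this wraps around to the string's end marker.
            preceding = t[j - 1]
            entries.append((key, preceding))
    # Keys are pairwise distinct, so sorting on the key alone is well defined.
    entries.sort(key=lambda entry: entry[0])
    return "".join(preceding for _, preceding in entries)


if __name__ == "__main__":
    print(naive_collection_bwt(["banana"]))    # annb$aa
    print(naive_collection_bwt(["AC", "AG"]))  # CG$$AA
```

This sketch keeps every suffix in memory at once, so its footprint grows quadratically with string length and linearly with the number of strings; it is useful only as a small reference against which an external-memory implementation could be checked.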

Cite

Bauer, M. J., Cox, A. J., & Rosone, G. (2011). Lightweight BWT construction for very large string collections. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6661 LNCS, pp. 219–231). https://doi.org/10.1007/978-3-642-21458-5_20
