Improved search of large transcriptomic sequencing databases using split sequence bloom trees

Abstract

Enormous databases of short-read RNA-seq experiments, such as the NIH Sequence Read Archive (SRA), are now available. These databases could answer many questions about condition-specific expression or population variation, and they will only grow over time. However, these collections remain difficult to use because they cannot be searched for a particular expressed sequence. While some progress has been made on this problem, it is still not feasible to search collections of hundreds of terabytes of short-read sequencing experiments. We introduce an indexing scheme called Split Sequence Bloom Tree (SSBT) to support sequence-based querying of terabyte-scale collections of thousands of short-read sequencing experiments. SSBT improves on the SBT [1] data structure for the same task. We apply SSBT to the problem of finding conditions under which query transcripts are expressed. Our experiments are conducted on a set of 2,652 publicly available RNA-seq experiments from breast, blood, and brain tissues contained in the NIH SRA. We demonstrate that this SSBT index can be queried for a 1000 nt sequence in under 4 min using a single thread and can be stored in just 39 GB, a five-fold improvement in search and storage costs compared to SBT.
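To make the idea of sequence-based querying concrete, here is a minimal sketch of the general Bloom-filter approach: each experiment is summarized by a Bloom filter of its k-mers, and a query sequence is reported as present when enough of its k-mers are found. This is not the SSBT data structure itself. The names `BloomFilter`, `index_experiment`, and `query`, the k-mer length, the filter size, and the threshold `theta` are all assumptions chosen for illustration, and the filters are scanned one by one rather than arranged in a tree.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative parameters only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            salt = seed.to_bytes(8, "little")
            digest = hashlib.blake2b(item.encode(), salt=salt).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def index_experiment(reads, k):
    """Summarize one experiment as a Bloom filter of its k-mers."""
    bf = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            bf.add(kmer)
    return bf


def query(filters, seq, k, theta=0.9):
    """Names of experiments containing at least a fraction theta of seq's k-mers.

    Assumption: a flat scan over every filter; a tree index would prune this.
    """
    query_kmers = kmers(seq, k)
    hits = []
    for name, bf in filters.items():
        found = sum(1 for kmer in query_kmers if kmer in bf)
        if query_kmers and found >= theta * len(query_kmers):
            hits.append(name)
    return hits


# Toy data: hypothetical experiment names and reads, not taken from the paper.
filters = {
    "brain_1": index_experiment(["ACGTACGTTGCA", "TTGCAGGCTA"], k=5),
    "blood_1": index_experiment(["GGGGGCCCCCAAAAA"], k=5),
}
matches = query(filters, "ACGTACGTTGCA", k=5)
```

A Bloom filter never misses a k-mer that was inserted, so an experiment containing the full query is always reported; false positives are possible and depend on the filter size and the number of hash functions.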

Cite

Solomon, B., & Kingsford, C. (2017). Improved search of large transcriptomic sequencing databases using split sequence bloom trees. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10229 LNCS, pp. 257–271). Springer Verlag. https://doi.org/10.1007/978-3-319-56970-3_16
