BAMQL: A query language for extracting reads from BAM files

11Citations
Citations of this article
54Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: It is extremely common to need to select a subset of reads from a BAM file based on their specific properties. Typically, a user unpacks the BAM file to a text stream using SAMtools, parses and filters the lines using AWK, then repacks them using SAMtools. This process is tedious and error-prone. In particular, when working with many columns of data, mix-ups are common and the bit field containing the flags is unintuitive. There are several libraries for reading BAM files, such as Bio-SamTools for Perl and pysam for Python. Both allow access to the BAM's read information and can filter reads, but require substantial boilerplate code; this is high overhead for mostly ad hoc filtering. Results: We have created a query language that gathers reads using a collection of predicates and common logical connectives. Queries run faster than equivalents and can be compiled to native code for embedding in larger programs. Conclusions: BAMQL provides a user-friendly, powerful and performant way to extract subsets of BAM files for ad hoc analyses or integration into applications. The query language provides a collection of predicates beyond those in SAMtools, and more flexible connectives.

Author supplied keywords

Cite

CITATION STYLE

APA

Masella, A. P., Lalansingh, C. M., Sivasundaram, P., Fraser, M., Bristow, R. G., & Boutros, P. C. (2016). BAMQL: A query language for extracting reads from BAM files. BMC Bioinformatics, 17(1). https://doi.org/10.1186/s12859-016-1162-y

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free