Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees

3Citations
Citations of this article
45Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Motivation: Mapping billions of reads from next generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even for the fastest known implementations. Traditional approaches have difficulties dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in personal genomics. Results: For the first time, we adopt the approximate string matching paradigm of geometric embedding to read mapping, thus rephrasing it to nearest neighbor queries in a q-gram frequency vector space. Using the L1 distance between frequency vectors has the benefit of providing lower bounds for an edit distance with affine gap costs. Using a cache-oblivious kd-tree, we realize running times, which match the state-of-the-art. Additionally, running time and memory requirements are about constant for read lengths between 100 and 1000 bp. We provide a first proof-of-concept that geometric embedding is a promising paradigm for read mapping and that L1 distance might serve to detect structural variations. TreQ, our initial implementation of that concept, performs more accurate than many popular read mappers over a wide range of structural variants. © The Author(s) 2012. Published by Oxford University Press.

Cite

CITATION STYLE

APA

Mahmud, M. P., Wiedenhoeft, J., & Schliep, A. (2012). Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees. Bioinformatics, 28(18). https://doi.org/10.1093/bioinformatics/bts380

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free