MicroTaboo: A general and practical solution to the k-disjoint problem

0Citations
Citations of this article
13Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: A common challenge in bioinformatics is to identify short sub-sequences that are unique in a set of genomes or reference sequences, which can efficiently be achieved by k-mer (k consecutive nucleotides) counting. However, there are several areas that would benefit from a more stringent definition of "unique", requiring that these sub-sequences of length W differ by more than k mismatches (i.e. a Hamming distance greater than k) from any other sub-sequence, which we term the k-disjoint problem. Examples include finding sequences unique to a pathogen for probe-based infection diagnostics; reducing off-target hits for re-sequencing or genome editing; detecting sequence (e.g. phage or viral) insertions; and multiple substitution mutations. Since both sensitivity and specificity are critical, an exhaustive, yet efficient solution is desirable. Results: We present microTaboo, a method that allows for efficient and extensive sequence mining of unique (k-disjoint) sequences of up to 100 nucleotides in length. On a number of simulated and real data sets ranging from microbe- to mammalian-size genomes, we show that microTaboo is able to efficiently find all sub-sequences of a specified length W that do not occur within a threshold of k mismatches in any other sub-sequence. We exemplify that microTaboo has many practical applications, including point substitution detection, sequence insertion detection, padlock probe target search, and candidate CRISPR target mining. Conclusions: microTaboo implements a solution to the k-disjoint problem in an alignment- and assembly free manner. microTaboo is available for Windows, Mac OS X, and Linux, running Java 7 and higher, under the GNU GPLv3 license, at: https://MohammedAlJaff.github.io/microTaboo

Cite

CITATION STYLE

APA

Al-Jaff, M., Sandström, E., & Grabherr, M. (2017). MicroTaboo: A general and practical solution to the k-disjoint problem. BMC Bioinformatics, 18(1). https://doi.org/10.1186/s12859-017-1644-6

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free