Simrank: Rapid and sensitive general-purpose k-mer search tool

This article is free to access.

Abstract

Background: Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project (http://nihroadmap.nih.gov/hmp). Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available.

Results: Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language datasets found Simrank to be 10X to 928X faster, depending on the dataset.

Conclusions: Simrank provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity.

© 2011 DeSantis et al; licensee BioMed Central Ltd.
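To make the idea of k-mer matching concrete, the following is a minimal Python sketch of ranking database strings by the k-mers they share with a query. It is not Simrank's implementation or interface: the function names (`kmers`, `rank_by_shared_kmers`), the default `k`, and the scoring rule (fraction of the query's distinct k-mers found in each database string) are all assumptions chosen for illustration.

```python
def kmers(s, k):
    """Return the set of distinct length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def rank_by_shared_kmers(query, database, k=7, top=3):
    """Rank database entries by the fraction of query k-mers they contain.

    `database` maps a name to a string. The score used here is an
    illustrative assumption, not necessarily the measure Simrank uses.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        # Query shorter than k: nothing to compare.
        return []
    scores = []
    for name, seq in database.items():
        shared = len(query_kmers & kmers(seq, k))
        scores.append((name, shared / len(query_kmers)))
    # Highest score first; ties broken by name for a stable order.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top]


# Hypothetical toy data, for illustration only.
toy_db = {"a": "ACGTACGTAA", "b": "TTTTTTTTTT", "c": "ACGTTTGGGG"}
ranking = rank_by_shared_kmers("ACGTACGT", toy_db, k=4)
```

Because the comparison works on plain substrings, the same sketch applies to DNA, RNA, protein or natural-language text. A tool built for speed would presumably precompute and index the database k-mers rather than recompute them for every query, as this sketch does.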

Cite

DeSantis, T. Z., Keller, K., Karaoz, U., Alekseyenko, A. V., Singh, N. N. S., Brodie, E. L., … Larsen, N. (2011). Simrank: Rapid and sensitive general-purpose k-mer search tool. BMC Ecology, 11. https://doi.org/10.1186/1472-6785-11-11
