Simrank: Rapid and sensitive general-purpose k-mer search tool

Todd Z. DeSantis; Keith Keller; Ulas Karaoz; Alexander V. Alekseyenko; Navjeet N.S. Singh; Eoin L. Brodie; Zhiheng Pei; Gary L. Andersen; Niels Larsen

Journal article (open access)

BMC Ecology (2011) 11

DOI: 10.1186/1472-6785-11-11

Abstract

Background: Terabyte-scale collections of string-encoded data are expected from consortium efforts such as the Human Microbiome Project http://nihroadmap.nih.gov/hmp. Rapid k-mer matching strategies make intra- and inter-project similarity searches of such data feasible. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedding k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available.

Results: Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language datasets found Simrank to be 10X to 928X faster, depending on the dataset.

Conclusions: Simrank provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity. © 2011 DeSantis et al; licensee BioMed Central Ltd.
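To make the idea of k-mer matching concrete, here is a minimal sketch of a generic inverted-index k-mer search in Python. It is not Simrank's implementation: the function names (`kmers`, `build_index`, `search`) are hypothetical, and the score used here (the fraction of the query's distinct k-mers that also occur in a database string) is an assumed, illustrative similarity measure that may differ from the one Simrank computes. Because it only slices strings, it applies equally to DNA, RNA, protein or natural-language text.

```python
from collections import Counter, defaultdict


def kmers(s: str, k: int) -> set[str]:
    """Return the set of distinct length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def build_index(database: dict[str, str], k: int) -> dict[str, set[str]]:
    """Hypothetical helper: map each distinct k-mer to the IDs of the database strings containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for seq_id, seq in database.items():
        for kmer in kmers(seq, k):
            index[kmer].add(seq_id)
    return index


def search(query: str, index: dict[str, set[str]], k: int, top: int = 3) -> list[tuple[str, float]]:
    """Hypothetical helper: rank database IDs by an ASSUMED score, the fraction
    of the query's distinct k-mers they share (not necessarily Simrank's score)."""
    q = kmers(query, k)
    if not q:  # query shorter than k: nothing to match
        return []
    hits: Counter[str] = Counter()
    for kmer in q:
        # Only database strings that share this k-mer are ever touched.
        for seq_id in index.get(kmer, ()):
            hits[seq_id] += 1
    # Highest score first; ties broken by ID for a deterministic order.
    ranked = sorted(((seq_id, n / len(q)) for seq_id, n in hits.items()),
                    key=lambda item: (-item[1], item[0]))
    return ranked[:top]
```

For example, with `db = {"a": "ACGTACGTGA", "b": "TTTTGGGGCC", "c": "ACGTTTTAAA"}` and `k = 4`, `search("ACGTACGT", build_index(db, 4), 4)` returns `[("a", 1.0), ("c", 0.25)]`; entry `b` shares no 4-mer with the query and is never scored. The inverted index is what keeps such searches fast: the cost of a query grows with the number of k-mer hits rather than with the full size of the database.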

Citation (APA)

DeSantis, T. Z., Keller, K., Karaoz, U., Alekseyenko, A. V., Singh, N. N. S., Brodie, E. L., … Larsen, N. (2011). Simrank: Rapid and sensitive general-purpose k-mer search tool. BMC Ecology, 11. https://doi.org/10.1186/1472-6785-11-11
