Div-blast: Diversification of sequence search results

Abstract

Sequence similarity tools, such as BLAST, seek the sequences most similar to a query in a database of sequences. They return results that are significantly similar to the query sequence and are typically also highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach, where the initial results guide the user to new searches. However, diversity has not yet been considered an integral component of sequence search tools for this discipline. Some redundancy can be avoided by introducing non-redundancy during database construction, but it is not feasible to dynamically set a level of non-redundancy tailored to a query sequence. We introduce the problem of diverse search and browsing in sequence databases, which produces non-redundant results optimized for any given query. We define diversity measures for sequences and propose methods to obtain diverse results extracted from current sequence similarity search tools. We also propose a new measure to evaluate the diversity of a set of sequences returned as the result of a sequence similarity query. We evaluate the effectiveness of the proposed methods in post-processing BLAST and PSI-BLAST results. We also assess the functional diversity of the returned results based on available Gene Ontology annotations. Additionally, we include a comparison with a current redundancy elimination tool, CD-HIT. Our experiments show that the proposed methods achieve more diverse yet still significant result sets than static non-redundancy approaches. In both sequence-based and functional diversity evaluation, the proposed diversification methods significantly outperform original BLAST results and other baselines. A web-based tool implementing the proposed methods, Div-BLAST, can be accessed at cedar.cs.bilkent.edu.tr/Div-BLAST.
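To make the idea of post-processing a hit list for diversity concrete, here is a minimal Python sketch of one generic approach: greedy re-ranking in the style of maximal marginal relevance. It is not the method of the paper, whose diversity measures and algorithms are not spelled out in the abstract. The function names, the k-mer Jaccard similarity, the `lam` trade-off weight, and the assumption that each hit carries a relevance score normalized to [0, 1] are all illustrative choices.

```python
from typing import Dict, List, Set, Tuple


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diversify(hits: List[Tuple[str, str, float]], n: int,
              lam: float = 0.5, k: int = 3) -> List[str]:
    """Greedily pick up to n hit ids, trading relevance against redundancy.

    hits: (id, sequence, relevance) triples; relevance is assumed to be
          normalized to [0, 1] (an assumption, e.g. a rescaled bit score).
    lam:  1.0 ranks by relevance only; lower values penalize hits that
          resemble ones already selected.
    """
    kmers: Dict[str, Set[str]] = {hid: kmer_set(seq, k) for hid, seq, _ in hits}
    relevance: Dict[str, float] = {hid: rel for hid, _, rel in hits}
    remaining = [hid for hid, _, _ in hits]
    selected: List[str] = []

    def score(hid: str) -> float:
        # Redundancy is the highest similarity to any already selected hit.
        redundancy = max((jaccard(kmers[hid], kmers[s]) for s in selected),
                         default=0.0)
        return lam * relevance[hid] - (1.0 - lam) * redundancy

    while remaining and len(selected) < n:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy input: "b" duplicates "a", while "c" is unrelated but less relevant.
toy_hits = [
    ("a", "MKTAYIAKQR", 1.0),
    ("b", "MKTAYIAKQR", 0.95),
    ("c", "GGSLLPWWNE", 0.6),
]
diverse_ids = diversify(toy_hits, n=2, lam=0.5)
relevance_only_ids = diversify(toy_hits, n=2, lam=1.0)
```

With `lam=0.5` the duplicate hit is skipped in favour of the dissimilar one, whereas `lam=1.0` reduces to plain relevance ranking. Unlike a fixed clustering threshold applied when the database is built, the trade-off here is set per query.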


Eser, E., Can, T., & Ferhatosmanoglu, H. (2014). Div-blast: Diversification of sequence search results. PLoS ONE, 9(12). https://doi.org/10.1371/journal.pone.0115445
