Computing distribution of scale independent motifs in biological sequences

Abstract

The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of the distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem would enable the use of iterative maps as a phase-state representation of sequences in which their statistical properties can be conveniently investigated. In this study, a family of kernel density functions is described that accommodates the fractal nature of iterated function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary length in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special solutions. The fractal kernel described is therefore a generalization that provides a common framework for a diverse suite of sequence analysis techniques. © 2006 Almeida and Vinga; licensee BioMed Central Ltd.
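
To make the "inexact association between distance and sequence dissimilarity" concrete, the following is a minimal sketch of the standard CGR iteration only. It does not reproduce the kernel density proposed in the article. The corner assignment in `CORNERS`, the starting point (0.5, 0.5), and the helper names `cgr_coordinates` and `grid_cell` are assumptions chosen for illustration and are not taken from the abstract.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Assumed corner assignment for the four nucleotides (one common convention;
# the abstract does not specify one).
CORNERS: Dict[str, Point] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_coordinates(sequence: str) -> List[Point]:
    """Return the CGR point reached after each symbol of the sequence.

    Each step moves the current point halfway toward the corner of the
    symbol just read, starting from the centre of the unit square.
    """
    x, y = 0.5, 0.5
    points: List[Point] = []
    for symbol in sequence.upper():
        cx, cy = CORNERS[symbol]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def grid_cell(point: Point, depth: int) -> Tuple[int, int]:
    """Return the (column, row) of the 2^depth by 2^depth grid cell holding point."""
    n = 2 ** depth
    return (min(int(point[0] * n), n - 1), min(int(point[1] * n), n - 1))


# Two sequences that share the length-4 suffix "TGCA" end in the same cell
# of the 16 x 16 grid, whatever precedes the suffix.
shared_a = cgr_coordinates("ACGTTGCA")[-1]
shared_b = cgr_coordinates("GGGGTGCA")[-1]
same_cell = grid_cell(shared_a, 4) == grid_cell(shared_b, 4)

# Two sequences that share no suffix at all can still land close together,
# on opposite sides of a quadrant boundary.
near_a = cgr_coordinates("TTTTA")[-1]
near_b = cgr_coordinates("AAAAT")[-1]
boundary_distance = math.dist(near_a, near_b)
different_quadrants = grid_cell(near_a, 1) != grid_cell(near_b, 1)
```

In this sketch, a shared suffix guarantees proximity, but proximity does not guarantee a shared suffix: `near_a` and `near_b` differ in their last symbol yet lie closer together than the width of a length-4 suffix cell. This is the kind of mismatch that, according to the abstract, a kernel adapted to the fractal structure is meant to resolve.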

Citation (APA)

Almeida, J. S., & Vinga, S. (2006). Computing distribution of scale independent motifs in biological sequences. Algorithms for Molecular Biology, 1(1). https://doi.org/10.1186/1748-7188-1-18
