Pattern discovery allowing wild-cards, substitution matrices, and multiple score functions

Abstract

Pattern discovery has many applications in finding functionally or structurally important regions in biological sequences (binding sites, regulatory sites, protein signatures, etc.). In this paper we present a new pattern discovery algorithm with two main features. First, it finds patterns with fixed-length gaps (i.e. runs of one or several consecutive wild-cards) and contiguous patterns in exactly the same manner and without any prior specification. Second, it accepts any pairwise score function, which offers multiple ways to define or constrain the type of pattern searched for; in particular, one can use substitution matrices (PAM, BLOSUM) to compare amino acids, exact matching to compare nucleotides, or equivalency sets in either case. We describe the algorithm, compare it to other algorithms, and report test results on discovering binding sites for DNA-binding proteins (ArgR, LexA, PurR, and TyrR) in E. coli, and promoter sites in a set of dicot plants. © Springer-Verlag Berlin Heidelberg 2003.
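To make the idea of a pluggable pairwise score function concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it only scores a given gapped pattern against windows of a sequence, and it does not discover patterns. The wild-card symbol `.`, all function names, and the values in the toy substitution table are assumptions made for illustration; real PAM or BLOSUM values would be loaded from a published matrix.

```python
from typing import Callable, Dict, Iterable, List, Tuple

WILDCARD = "."  # assumed wild-card symbol; the abstract does not specify one

Score = Callable[[str, str], float]


def exact_match(a: str, b: str) -> float:
    """Exact matching, e.g. for comparing nucleotides."""
    return 1.0 if a == b else 0.0


def equivalency_score(classes: Iterable[str]) -> Score:
    """Build a score function from equivalency sets such as 'AG' and 'CT'."""
    groups = [set(c) for c in classes]

    def score(a: str, b: str) -> float:
        if a == b:
            return 1.0
        return 1.0 if any(a in g and b in g for g in groups) else 0.0

    return score


def matrix_score(table: Dict[Tuple[str, str], float]) -> Score:
    """Build a symmetric score function from a substitution table."""

    def score(a: str, b: str) -> float:
        # Look up (a, b), then (b, a); unknown pairs score 0 in this sketch.
        return table.get((a, b), table.get((b, a), 0.0))

    return score


def pattern_score(pattern: str, window: str, score: Score) -> float:
    """Sum pairwise scores over non-wild-card positions of the pattern."""
    if len(pattern) != len(window):
        raise ValueError("pattern and window must have the same length")
    return sum(score(p, w) for p, w in zip(pattern, window) if p != WILDCARD)


def occurrences(pattern: str, sequence: str, score: Score,
                threshold: float) -> List[int]:
    """Start positions whose window scores at least `threshold`."""
    k = len(pattern)
    return [i for i in range(len(sequence) - k + 1)
            if pattern_score(pattern, sequence[i:i + k], score) >= threshold]


# Gapped pattern 'A..T' with exact matching on a made-up DNA string.
hits = occurrences("A..T", "ACGTAGGTACCA", exact_match, threshold=2)

# Purine/pyrimidine equivalency sets: 'AC' matches 'GT' class by class.
purine_pyrimidine = equivalency_score(["AG", "CT"])
class_score = pattern_score("AC", "GT", purine_pyrimidine)

# Toy substitution table with made-up values (NOT real BLOSUM/PAM entries).
toy = matrix_score({("L", "L"): 4.0, ("L", "I"): 2.0})
```

Because the score function is passed in as a parameter, the same `pattern_score` routine serves gapped and contiguous patterns alike, and switching between exact matching, equivalency sets, and a substitution matrix requires no change to the scanning code.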

Mancheron, A., & Rusu, I. (2003). Pattern discovery allowing wild-cards, substitution matrices, and multiple score functions. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2812, 124–138. https://doi.org/10.1007/978-3-540-39763-2_10
