Extracting best consensus motifs from positive and negative examples

3Citations
Citations of this article
12Readers
Mendeley users who have this article in their library.
Get full text

Abstract

We define the best consensus motif (BCM) problem motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet Σ is a family Ω of subsets of Σ *. A motif π of type Ω is a string π=π1 ... πn of motif components, each of which stands for an element in Ω. The BCM problem for Ω is, given a yes-no sample S={(α (1),β(1),..., (α(m),β(m))} of pairs of strings in Σ* with α (i) ≠β(i) for 1 ≤ i ≤ m, to find a motif π of type Ω that maximizes the number of good pairs in S, where (α (i), β (i)) is good for π if π accepts α (i) and rejects β (i) We prove that the BCM problem is NP-complete even for a very simple type (Formula presented), which is used, in practice, for describing protein motifs in the PROSITE database. We also show that the NP-completeness of the problem does not change for the type Ω ∞=Ω1∪ {Σ+}∪{Σ[i,j]1≤i≤ j}, where Σ [i,j] is the set of strings over Σ of length between i and j Furthermore, for the BCM problem for Ω 1 we provide a polynomial-time greedy algorithm based on the probabilistic method. Its performance analysis shows an explicit approximation ratio of the algorithm.

Cite

CITATION STYLE

APA

Tateishi, E., Maruyama, O., & Miyano, S. (1996). Extracting best consensus motifs from positive and negative examples. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1046, pp. 219–230). Springer Verlag. https://doi.org/10.1007/3-540-60922-9_19

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free