Measuring over-generalization in the minimal multiple generalizations of biosequences

3Citations
Citations of this article
3Readers
Mendeley users who have this article in their library.
Get full text

Abstract

We consider the problem of finding a set of patterns that best characterizes a set of strings. To this end, Arimura et. al. [3] considered the use of minimal multiple generalizations (mmg) for such characterizations. Given any sample set, the mmgs are, roughly speaking, the most (syntactically) specific set of languages containing the sample within a given class of languages. Takae et. al. [17] found the mmgs of the class of pattern languages [1] which includes so-called sort symbols to be fairly accurate as predictors for signal peptides. We first reproduce their results using updated data. Then, by using a measure for estimating the level of over-generalizations made by the mmgs, we show results that explain the high level of accuracies resulting from the use of sort symbols, and discuss how better results can be obtained. The measure that we suggests here can also be applied to other types of patterns, e.g. the PROSITE patterns [4]. © Springer-Verlag Berlin Heidelberg 2005.

Cite

CITATION STYLE

APA

Ng, Y. K., Ono, H., & Shinohara, T. (2005). Measuring over-generalization in the minimal multiple generalizations of biosequences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3735 LNAI, pp. 176–188). Springer Verlag. https://doi.org/10.1007/11563983_16

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free