Best fitting fixed-length substring patterns for a set of strings

Abstract

Finding a pattern, or a set of patterns, that best characterizes a set of strings is considered important in the context of Knowledge Discovery as applied in Molecular Biology. Our main objective is to address the problem of "over-generalization", the phenomenon in which a characterization is so general that it potentially includes many incorrect examples. To overcome this, we formally define a criterion for a most fitting language for a set of strings, via a natural notion of density. We show how the problem can be solved by solving the membership problem and the counting problem, and we study the runtime complexity of the problem with respect to three solution spaces derived from unions of the languages generated from fixed-length substring patterns. Two of these we show to be solvable in time polynomial in the input size. In the third case, however, the problem turns out to be NP-complete. © Springer-Verlag Berlin Heidelberg 2005.
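
The abstract does not state the formal definitions, so the following is only an illustrative sketch of how a membership test and a counting step might combine into a density score. It assumes that a substring pattern generates the language of all strings containing it as a substring, and that density is the number of covered sample strings divided by the number of strings in the language at the sample's lengths. Both assumptions, and the names `is_member`, `count_members`, and `density`, are hypothetical and not taken from the paper. The counting step is brute force and exponential in the string length, so it is not one of the polynomial-time algorithms the abstract refers to.

```python
from itertools import product


def is_member(s, patterns):
    """Membership (assumed semantics): s belongs to the union language
    if it contains at least one of the patterns as a substring."""
    return any(p in s for p in patterns)


def count_members(alphabet, length, patterns):
    """Counting: number of strings of the given length over the alphabet
    that belong to the union language. Brute-force enumeration."""
    return sum(
        is_member("".join(chars), patterns)
        for chars in product(alphabet, repeat=length)
    )


def density(sample, alphabet, patterns):
    """Assumed density: distinct sample strings covered by the language,
    divided by the language's size restricted to the sample's lengths."""
    distinct = set(sample)
    covered = sum(is_member(s, patterns) for s in distinct)
    lengths = {len(s) for s in distinct}
    total = sum(count_members(alphabet, n, patterns) for n in lengths)
    return covered / total if total else 0.0


if __name__ == "__main__":
    sample = ["aab", "bab"]
    # A narrower pattern set covers the same sample with a smaller language.
    print(density(sample, "ab", ["ab"]))
    print(density(sample, "ab", ["ab", "ba"]))
```

Under these assumptions, a pattern set that covers the sample while generating fewer strings scores higher, which is one way to read the abstract's concern with over-generalization.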

Ono, H., & Ng, Y. K. (2005). Best fitting fixed-length substring patterns for a set of strings. In Lecture Notes in Computer Science (Vol. 3595, pp. 240–250). Springer Verlag. https://doi.org/10.1007/11533719_26
