Mining for common motifs in protein tertiary structures holds the key to understanding protein function. However, because of the formidable problem size, existing techniques for finding common substructures are computationally feasible only under artificially imposed constraints, such as restricting the search to super-secondary structures or using fixed-length segmentation. This paper presents the first purely tertiary-level algorithm that discovers common protein substructures without such limitations. The task is modeled as a maximal common subgraph (MCS) problem, which is in turn mapped into the domain of maximum clique (MC). Coupling an MC solver with a graph coloring (GC) solver, the iterative algorithm CRP-GM narrows down toward the desired solution by feeding results from one solver into the other. The solution quality of CRP-GM demonstrates its potential as a new and practical data-mining tool for molecular biologists, as well as for several similar problems that require identifying common substructures.
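To make the MCS-to-MC mapping concrete, here is a minimal Python sketch of the textbook association-graph (modular product) reduction. It is not the CRP-GM algorithm, whose details the abstract does not give. All names (`distance_matrix`, `association_graph`, `max_clique`, `greedy_coloring_bound`), the tolerance `tol`, and the toy coordinates are assumptions made for illustration. Each structure is treated as a set of 3D points (for example, one point per residue), a node of the association graph pairs a point of A with a point of B, and two nodes are adjacent when the corresponding inter-point distances agree within `tol`. A maximum clique then corresponds to a largest common substructure, and the number of colors used by any proper coloring gives an upper bound on the clique size, which is one plausible way the two solvers could inform each other.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def distance_matrix(points):
    """Pairwise Euclidean distances between 3D points."""
    return [[dist(p, q) for q in points] for p in points]


def association_graph(dist_a, dist_b, tol=0.5):
    """Nodes are pairs (i, j); edges join pairs whose distances agree within tol."""
    nodes = [(i, j) for i in range(len(dist_a)) for j in range(len(dist_b))]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        # Distinct points on both sides, and matching geometry (assumed criterion).
        if i != k and j != l and abs(dist_a[i][k] - dist_b[j][l]) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Exact maximum clique via Bron-Kerbosch with pivoting (small inputs only)."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


def greedy_coloring_bound(adj):
    """Colors used by a greedy proper coloring: an upper bound on clique size."""
    colors = {}
    for v in sorted(adj, key=lambda v: -len(adj[v])):
        used = {colors[u] for u in adj[v] if u in colors}
        colors[v] = next(c for c in range(len(adj)) if c not in used)
    return len(set(colors.values()))


# Toy data (hypothetical): B contains a translated copy of the 3-4-5 triangle in A.
pts_a = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (0, 0, 12)]
pts_b = [(10, 10, 10), (13, 10, 10), (10, 14, 10), (20, 20, 20)]
adj = association_graph(distance_matrix(pts_a), distance_matrix(pts_b), tol=0.1)
clique = sorted(max_clique(adj))
print(clique)  # [(0, 0), (1, 1), (2, 2)]
print(greedy_coloring_bound(adj))
```

The exact clique search here is exponential in the worst case, which is the problem-size obstacle the abstract refers to. Real structures would need a heuristic or bounded solver rather than this brute-force sketch.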
CITATION STYLE
Chen, C. W. K., & Yun, D. Y. Y. (1999). Knowledge discovery for protein tertiary substructures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1711, pp. 433–442). Springer Verlag. https://doi.org/10.1007/978-3-540-48061-7_52