Abstract
Multi-domain proteins result from the duplication and combination of a complex but limited number of domains. The ability to distinguish multi-domain homologs from unrelated pairs that share a domain is essential to genomic analysis. Heuristics based on sequence similarity and alignment coverage have been proposed to screen out domain insertions, but they have met with limited success. In this paper, we propose a unique protein classification schema for multi-domain protein superfamilies. Segmented profiles of physico-chemical properties and amino acid composition are created and reduced in dimensionality by vector quantization, yielding a feature profile for rule discovery and classification. Association rules are mined to identify isomorphic relationships that govern the formation of domains between proteins, so that homologous pairs are correctly predicted and unrelated pairs, including those that share domains, are rejected. Our results demonstrate that conserved domain classes can be classified effectively using these feature profiles and that the classifier is not susceptible to the class imbalances frequently encountered in these databases. © 2009 Springer Berlin Heidelberg.
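The idea of a segmented, quantized property profile can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of the Kyte-Doolittle hydropathy scale as the single property, the number of segments, and the fixed codebook are all assumptions made for illustration, and the function names `segment_profile` and `quantize` are hypothetical.

```python
# Kyte-Doolittle hydropathy scale, used here as one example of a
# physico-chemical property (an assumption; the abstract does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def segment_profile(seq, n_segments):
    """Return the mean hydropathy of n_segments roughly equal slices of seq."""
    if n_segments <= 0 or len(seq) < n_segments:
        raise ValueError("need at least one residue per segment")
    # Integer boundaries guarantee that every segment is non-empty.
    bounds = [i * len(seq) // n_segments for i in range(n_segments + 1)]
    return [
        sum(KD[aa] for aa in seq[start:end]) / (end - start)
        for start, end in zip(bounds, bounds[1:])
    ]


def quantize(profile, codebook):
    """Map each profile value to the index of its nearest codeword."""
    return [
        min(range(len(codebook)), key=lambda j: abs(value - codebook[j]))
        for value in profile
    ]


# Hypothetical usage: a toy sequence, two segments, a three-level codebook.
profile = segment_profile("IIIIDDDD", 2)
symbols = quantize(profile, [-3.0, 0.0, 3.0])
```

In this sketch the codebook is fixed by hand; a vector quantizer would normally learn its codewords from training profiles. The resulting symbol sequence is the kind of discrete feature profile on which association rules could then be mined.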
Citation
Singh, H., Chowriappa, P., & Dua, S. (2009). Multi-domain protein family classification using isomorphic inter-property relationships. In Communications in Computer and Information Science (Vol. 40, pp. 473–484). https://doi.org/10.1007/978-3-642-03547-0_45