Classification of protein quaternary structure with support vector machine

Shao Wu Zhang; Quan Pan; Hong Cai Zhang; Yun Long Zhang; Hai Yu Wang

Journal Article, Open Access

Bioinformatics (2003) 19(18) 2390-2396

DOI: 10.1093/bioinformatics/btg331

81 Citations

37 Readers

Abstract

Motivation: The gap between the rapidly growing number of known sequences and the slowly accumulating number of known structures keeps widening, so an automatic classification process based on primary sequences and known three-dimensional structures has become indispensable. Classifying protein quaternary structure from the primary sequence can provide useful information to biologists, which calls for a fully automatic and reliable classification system. This work looks for effective methods of extracting attributes from primary sequences and for an algorithm that classifies quaternary structure from them.

Results: The support vector machine (SVM) and the covariant discriminant algorithm are introduced here for the first time to predict quaternary structure properties from protein primary sequences. Both algorithms take into account the amino acid composition and the autocorrelation functions based on the amino acid index profile of the primary sequence. We analyzed 472 amino acid indices and selected as examples the four with the best performance. Five attribute parameter data sets (COMP, FASG, NISK, WOLS and KYTJ) were thus established from the protein primary sequences. The COMP data set consists of the amino acid composition alone; the FASG, NISK, WOLS and KYTJ data sets each consist of the amino acid composition plus the autocorrelation functions of the corresponding amino acid residue index. In the jackknife test, the overall accuracies of SVM are 78.5, 87.5, 83.2, 81.7 and 81.9% for the COMP, FASG, NISK, WOLS and KYTJ data sets, respectively, which are 19.6, 7.8, 15.5, 13.1 and 15.8% higher, respectively, than those of the covariant discriminant algorithm in the same test. The results show that SVM can be applied to discriminate between the primary sequences of homodimers and non-homodimers, and that the two protein sequence descriptors reflect quaternary structure information. Compared with Robert Garian's earlier investigation, the performance of SVM is almost equal to that of the decision tree models, and the methods used here to extract feature vectors from primary sequences are superior to Garian's binning function method.
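The two sequence descriptors named in the abstract, amino acid composition and autocorrelation of an amino acid index profile, can be illustrated with a minimal Python sketch. Several things here are assumptions and not taken from the paper: the abstract does not give the autocorrelation formula, so a common averaged lag product is used; KYTJ is assumed to denote the Kyte-Doolittle hydropathy index; and the function names and the `max_lag` parameter are hypothetical.

```python
# Illustrative sketch only: the exact feature definitions used in the paper
# are not stated in the abstract.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumed to be what "KYTJ" refers to).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the fraction of each of the 20 standard amino acids in seq."""
    seq = seq.upper()
    # Count only standard residues so the fractions sum to 1.
    total = sum(1 for aa in seq if aa in KYTE_DOOLITTLE)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [seq.count(aa) / total for aa in AMINO_ACIDS]


def autocorrelation(seq, index, max_lag):
    """Return averaged lag products of the index profile for lags 1..max_lag.

    Assumed form: r_n = (1 / (L - n)) * sum_i h_i * h_(i+n),
    where h_i is the index value of residue i and L is the profile length.
    """
    profile = [index[aa] for aa in seq.upper() if aa in index]
    length = len(profile)
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    return [
        sum(profile[i] * profile[i + lag] for i in range(length - lag))
        / (length - lag)
        for lag in range(1, max_lag + 1)
    ]


def feature_vector(seq, index=KYTE_DOOLITTLE, max_lag=5):
    """Concatenate composition (20 values) with max_lag autocorrelation terms."""
    return composition(seq) + autocorrelation(seq, index, max_lag)
```

Under these assumptions, a COMP-style vector corresponds to `composition(seq)` alone, while a KYTJ-style vector corresponds to `feature_vector(seq)`. Vectors of this kind could then be passed to any SVM implementation and evaluated by leave-one-out (jackknife) testing.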

Cite

Zhang, S. W., Pan, Q., Zhang, H. C., Zhang, Y. L., & Wang, H. Y. (2003). Classification of protein quaternary structure with support vector machine. Bioinformatics, 19(18), 2390–2396. https://doi.org/10.1093/bioinformatics/btg331
