A distributed tree-based ensemble learning approach for efficient structure prediction of protein

Abstract

Knowledge of a protein's secondary structure contributes to our understanding of the protein's function, which is vital to many aspects of living organisms, such as the roles of enzymes, hormones, and structural material. It also helps in designing new drugs for critical diseases. In this paper, we advocate a distributed approach to identifying protein secondary structures using an ensemble method on protein primary sequences. The ensemble-based Random Forest algorithm is adopted to build a three-way predictive model. Based on the amino acid features of each protein and the decision tree parameters, the classification model assigns each structure to one of three classes: α helix, β sheet, or coil. The proposed model is implemented in a distributed computing environment, Spark. Experiments were carried out using cross-validation tests on the RS126 and CB513 benchmark datasets. Our results confirm that the ensemble approach to classifying protein secondary structures achieves better accuracy, with improved performance, when implemented in the distributed environment.
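
The abstract does not specify how amino acid features are derived or how accuracy is scored, so the following is only a minimal sketch under stated assumptions: a one-hot sliding window over the primary sequence (a common encoding in secondary structure prediction, not necessarily the authors') and per-residue three-state accuracy, often called Q3. The names `encode_windows`, `q3_accuracy`, and the window size are hypothetical and chosen for illustration.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
STATES = ("H", "E", "C")  # assumed labels: helix, sheet, coil


def encode_windows(sequence: str, window: int = 5) -> List[List[int]]:
    """One-hot encode a window of residues centred on each position.

    Slots that fall outside the sequence, or that hold a residue not in
    AMINO_ACIDS, are left as all zeros. This encoding is an assumption,
    not the feature set described in the paper.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    width = len(AMINO_ACIDS)
    rows = []
    for centre in range(len(sequence)):
        row = [0] * (window * width)
        for slot, pos in enumerate(range(centre - half, centre + half + 1)):
            if 0 <= pos < len(sequence):
                idx = AMINO_ACIDS.find(sequence[pos])
                if idx >= 0:
                    row[slot * width + idx] = 1
        rows.append(row)
    return rows


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)
```

Each row returned by `encode_windows` would be one training example for a three-class classifier such as a Random Forest, with the label taken from the structure state of the centre residue; `q3_accuracy` could then score each cross-validation fold.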

Xavier, L. D., & Thirunavukarasu, R. (2017). A distributed tree-based ensemble learning approach for efficient structure prediction of protein. International Journal of Intelligent Engineering and Systems, 10(3), 226–234. https://doi.org/10.22266/ijies2017.0630.25
