A distributed tree-based ensemble learning approach for efficient structure prediction of protein

Leo Dencelin Xavier; Ramkumar Thirunavukarasu

Journal Article (Open Access)

International Journal of Intelligent Engineering and Systems (2017) 10(3) 226-234

DOI: 10.22266/ijies2017.0630.25

18 Citations

16 Readers

Abstract

Knowledge of a protein's secondary structure contributes to our understanding of the protein's function, which is vital to many aspects of living organisms, such as the roles of enzymes, hormones, and structural material. It also helps in designing new drugs for critical diseases. In this paper, we advocate a distributed approach to identifying protein secondary structures using an ensemble method on protein primary sequences. The ensemble-based Random Forest algorithm is adopted to build a three-way predictive model. Based on the amino acid features of each protein and the decision tree parameters, the classification model assigns protein structures to 'α helix', 'β sheet', or coil. The proposed model is implemented in a distributed computing environment, Spark. Experiments were carried out using cross-validation tests on the RS126 and CB513 benchmark datasets. Our results confirm that the ensemble approach to classifying protein secondary structures achieves better accuracy, with improved performance when implemented in the distributed environment.
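
As a rough illustration of the three-way ensemble idea described in the abstract, the sketch below shows two pieces in plain Python: a per-residue feature encoding and the vote aggregation step that a Random Forest uses to turn individual tree outputs into one class. The sliding-window one-hot encoding, the window size, the single-letter labels H/E/C, and the names `window_features` and `majority_vote` are all assumptions made for illustration; the abstract does not specify the features or the implementation, and this is not the authors' Spark code.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
CLASSES = ("H", "E", "C")  # assumed labels: alpha helix, beta sheet, coil


def window_features(sequence, center, half_width=2):
    """One-hot encode the residues in a window around `center`.

    Hypothetical encoding: positions outside the sequence, and residues
    that are not among the 20 standard ones, are encoded as all zeros.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                one_hot[idx] = 1
        features.extend(one_hot)
    return features


def majority_vote(tree_votes):
    """Combine per-tree class labels into one label by plurality.

    Ties are broken deterministically by the order of CLASSES.
    """
    counts = Counter(tree_votes)
    return max(CLASSES, key=lambda label: (counts[label], -CLASSES.index(label)))
```

In this sketch each tree in the ensemble would receive the output of `window_features` for a residue and emit one label, and `majority_vote` would pick the final class for that residue.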

Citation (APA)

Xavier, L. D., & Thirunavukarasu, R. (2017). A distributed tree-based ensemble learning approach for efficient structure prediction of protein. International Journal of Intelligent Engineering and Systems, 10(3), 226–234. https://doi.org/10.22266/ijies2017.0630.25
