Application of an interpretable classification model on Early Folding Residues during protein folding
(Field of research: 08 Information and Computing Sciences; 0801 Artificial Intelligence and Image Processing)

Sebastian Bittrich; Marika Kaden; Christoph Leberecht; Florian Kaiser; Thomas Villmann; Dirk Labudde

Journal Article, Open Access

BioData Mining (2019) 12(1)

DOI: 10.1186/s13040-018-0188-2

Abstract

Background: Machine learning strategies are prominent tools for data analysis. Especially in the life sciences, they have become increasingly important for handling the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance but also gain complexity, and they tend to neglect the interpretability and comprehensibility of the resulting models.

Results: Generalized Matrix Learning Vector Quantization (GMLVQ) is a supervised, prototype-based machine learning method. It provides comprehensive visualization capabilities, not present in other classifiers, that allow for a fine-grained interpretation of the data. In contrast to commonly used machine learning strategies, GMLVQ is well suited for imbalanced classification problems, which are frequent in the life sciences. We present a Weka plug-in implementing GMLVQ. The feasibility of GMLVQ is demonstrated on a dataset of Early Folding Residues (EFR), which have been shown to initiate and guide the protein folding process. Using 27 features, an area under the receiver operating characteristic curve (AUC) of 76.6% was achieved, which is comparable to the performance of other state-of-the-art classifiers. The obtained model is accessible at https://biosciences.hs-mittweida.de/efpred/.

Conclusions: The application to EFR prediction demonstrates how easily interpretable classification models can promote the comprehension of biological mechanisms. The results highlight the features that were most influential for classifying EFR: EFR are embedded in ordered secondary structure elements, and they participate in networks of hydrophobic residues. We present the visualization capabilities of GMLVQ and demonstrate how to interpret the results.
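To make the GMLVQ idea above more concrete, here is a minimal NumPy sketch under stated assumptions: one prototype per class, plain stochastic gradient descent on the relative-distance cost mu = (dJ - dK) / (dJ + dK), and a matrix Omega renormalised so that trace(Omega^T Omega) = 1. It is not the authors' Weka plug-in. The function names (gmlvq_distance, train_gmlvq, gmlvq_scores, roc_auc), the learning rates and the small synthetic imbalanced dataset are all hypothetical and chosen only for illustration. The diagonal of Lambda = Omega^T Omega is the per-feature relevance profile that GMLVQ visualizations are commonly built from, and the AUC is computed as the fraction of correctly ordered positive/negative pairs.

```python
import numpy as np


def gmlvq_distance(x, w, omega):
    """Adaptive squared distance d(x, w) = (x - w)^T Omega^T Omega (x - w)."""
    diff = omega @ (x - w)
    return float(diff @ diff)


def train_gmlvq(X, y, n_epochs=30, lr_w=0.05, lr_omega=0.01, seed=0):
    """Hypothetical minimal GMLVQ trainer: one prototype per class,
    SGD on the relative-distance cost mu = (dJ - dK) / (dJ + dK)."""
    rng = np.random.default_rng(seed)
    labels = np.unique(y)
    # Assumption: initialise each prototype at its class mean.
    W = np.array([X[y == c].mean(axis=0) for c in labels], dtype=float)
    n = X.shape[1]
    omega = np.eye(n) / np.sqrt(n)  # starts with trace(Omega^T Omega) = 1
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            x = X[i]
            d = np.array([gmlvq_distance(x, w, omega) for w in W])
            same = labels == y[i]
            J = np.flatnonzero(same)[np.argmin(d[same])]    # closest correct prototype
            K = np.flatnonzero(~same)[np.argmin(d[~same])]  # closest wrong prototype
            dJ, dK = d[J], d[K]
            if dJ + dK == 0:
                continue
            gJ = 2 * dK / (dJ + dK) ** 2  # d mu / d dJ
            gK = 2 * dJ / (dJ + dK) ** 2  # -(d mu / d dK)
            lam = omega.T @ omega
            diffJ, diffK = x - W[J], x - W[K]
            # Gradient steps: attract the correct prototype, repel the wrong one.
            W[J] += lr_w * gJ * 2 * (lam @ diffJ)
            W[K] -= lr_w * gK * 2 * (lam @ diffK)
            # d d(x, w) / d Omega = 2 Omega (x - w)(x - w)^T
            grad = 2 * omega @ (gJ * np.outer(diffJ, diffJ) - gK * np.outer(diffK, diffK))
            omega = omega - lr_omega * grad
            omega /= np.sqrt(np.trace(omega.T @ omega))  # renormalise the metric
    return W, labels, omega


def gmlvq_scores(X, W, labels, omega, positive=1):
    """Score in [-1, 1]; larger means closer to the positive-class prototype."""
    pos = labels == positive
    scores = []
    for x in X:
        d = np.array([gmlvq_distance(x, w, omega) for w in W])
        d_pos, d_neg = d[pos].min(), d[~pos].min()
        scores.append((d_neg - d_pos) / max(d_neg + d_pos, 1e-12))
    return np.array(scores)


def roc_auc(scores, y_true):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Hypothetical imbalanced toy data: feature 0 is informative, feature 1 is noise.
rng = np.random.default_rng(42)
X_neg = rng.normal(0.0, 1.0, size=(200, 2))              # majority class 0
X_pos = rng.normal(0.0, 1.0, size=(20, 2)) + [3.0, 0.0]  # minority class 1
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 200 + [1] * 20)

W, labels, omega = train_gmlvq(X, y)
auc = roc_auc(gmlvq_scores(X, W, labels, omega), y)  # in-sample, illustration only
relevances = np.diag(omega.T @ omega)  # per-feature relevance profile
print(f"training AUC = {auc:.3f}, relevances = {relevances.round(3)}")
```

Because the relevance diagonal sums to one after renormalisation, the values can be compared directly across features; in a real evaluation the AUC would of course be measured on held-out data rather than on the training set.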

Cite (APA)

Bittrich, S., Kaden, M., Leberecht, C., Kaiser, F., Villmann, T., & Labudde, D. (2019). Application of an interpretable classification model on Early Folding Residues during protein folding. BioData Mining, 12(1). https://doi.org/10.1186/s13040-018-0188-2