Data integration strategy for robust classification of biomedical data

Aneta Polewko-Klim; Witold R. Rudnicki

Conference Proceedings

Advances in Intelligent Systems and Computing (2020) 1160 AISC 596-606

DOI: 10.1007/978-3-030-45691-7_56

Abstract

This paper presents a protocol for integrating the two most common types of biological data (clinical and molecular) for more effective classification of patients with cancer. In this protocol, the most informative features are identified using statistical and information-theory-based selection methods for molecular data and the Boruta algorithm for clinical data. Predictive models are built with the Random Forest classification algorithm. Data integration consists of combining the most informative clinical features with synthetic features obtained from genetic marker models as input variables for the classifier. We applied this classification protocol to METABRIC breast cancer samples. Clinical data, gene expression data and somatic copy number aberration data were used for clinical endpoint prediction. We tested various methods for combining information from the different data sets. Our research shows that both types of molecular data contain features that are relevant for clinical endpoint prediction. The best model was obtained by using ten clinical features and two synthetic features obtained from biomarker models. In the examined cases, the method used to filter molecular markers had a small impact on the predictive power of the models, even though the lists of top informative biomarkers diverged.
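The integration step described in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. Several things here are assumptions made for illustration: the toy data, all function names, the univariate filter score, and the out-of-fold scheme used to build each synthetic feature (the abstract does not say how the synthetic features are computed). A simple nearest-centroid scorer stands in for the biomarker models so that the example needs only the Python standard library; the Boruta selection of clinical features and the final Random Forest classifier are omitted.

```python
import random
import statistics

def filter_score(column, labels):
    """Univariate filter: absolute class-mean difference scaled by overall spread."""
    pos = [v for v, y in zip(column, labels) if y == 1]
    neg = [v for v, y in zip(column, labels) if y == 0]
    spread = statistics.pstdev(column) or 1.0  # guard against constant columns
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread

def top_k_columns(X, y, k):
    """Indices of the k columns with the highest filter score."""
    scores = [filter_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]

def fit_centroids(X, y, cols):
    """Toy stand-in for a biomarker model: per-class means over the selected columns."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, t in zip(X, y) if t == label]
        centroids[label] = [statistics.fmean(r[j] for r in rows) for j in cols]
    return centroids

def centroid_score(centroids, cols, row):
    """Positive when the sample lies closer to the class-1 centroid."""
    d0 = sum((row[j] - c) ** 2 for j, c in zip(cols, centroids[0]))
    d1 = sum((row[j] - c) ** 2 for j, c in zip(cols, centroids[1]))
    return d0 - d1

def synthetic_feature(X, y, k=5, n_folds=5):
    """One out-of-fold score per sample (assumed scheme, to avoid label leakage)."""
    out = [0.0] * len(X)
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        cols = top_k_columns(X_tr, y_tr, k)  # feature selection inside the fold
        centroids = fit_centroids(X_tr, y_tr, cols)
        for i in test_idx:
            out[i] = centroid_score(centroids, cols, X[i])
    return out

def integrate(clinical, *synthetic):
    """Append one synthetic column per molecular data set to the clinical features."""
    return [list(row) + [s[i] for s in synthetic] for i, row in enumerate(clinical)]

# Hypothetical toy data: 60 samples, 3 clinical features, two molecular blocks.
rng = random.Random(0)
y = [i % 2 for i in range(60)]
clinical = [[rng.gauss(0.5 * t, 1.0) for _ in range(3)] for t in y]
expression = [[rng.gauss(0.8 * t if j < 4 else 0.0, 1.0) for j in range(40)] for t in y]
copy_number = [[rng.gauss(0.6 * t if j < 3 else 0.0, 1.0) for j in range(30)] for t in y]

integrated = integrate(clinical,
                       synthetic_feature(expression, y),
                       synthetic_feature(copy_number, y))
print(len(integrated), len(integrated[0]))  # 60 rows, 3 clinical + 2 synthetic columns
```

In this sketch each molecular data set contributes a single column, which mirrors the abstract's best model of clinical features plus two synthetic features; the resulting `integrated` table would then be passed to the final classifier.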

Cite

Polewko-Klim, A., & Rudnicki, W. R. (2020). Data integration strategy for robust classification of biomedical data. In Advances in Intelligent Systems and Computing (Vol. 1160 AISC, pp. 596–606). Springer. https://doi.org/10.1007/978-3-030-45691-7_56
