Prognostically Relevant Subtypes and Survival Prediction for Breast Cancer Based on Multimodal Genomics Data

Abstract

Cancer is one of the deadliest diseases and is caused by abnormal behavior of the genes that control cell division and growth. Genomics data and clinical outcomes from multiplatform, heterogeneous sources are used to make clinical decisions for cancer patients, and both the multimodality and the heterogeneity of these data pose significant challenges for bioinformatics tools and algorithms. Numerous approaches have been proposed to overcome these challenges, using sophisticated bioinformatics and machine learning algorithms as either primary or supporting tools. In this paper, we propose a new approach that analyzes genomics data from The Cancer Genome Atlas (TCGA) to classify breast cancer patients by subtype and to predict their survival rates. Because multiple factors, such as estrogen receptor (ER), progesterone receptor (PGR), and human epidermal growth factor receptor 2 (HER2) status, are involved in breast cancer diagnosis, we use DNA methylation, gene expression (GE), and miRNA expression data and build a multiplatform network, called the Multimodal Autoencoders (MAE) classifier, to support each data type. Experimental results show that our approach is promising for predicting both breast cancer subtypes and survival rates with high confidence. In particular, we achieved state-of-the-art accuracies of 91% and 86% for ER- and PGR-based subtype prediction, respectively, and moderately low accuracy for HER2-based subtype prediction. For survival prediction, we observed reasonably low mean squared error (MSE) and positive coefficient of determination ($R^{2}$) scores. Additionally, we created unimodal and multimodal features from each input type and trained decision tree (DT), Naive Bayes (NB), K-nearest neighbors (KNN), logistic regression (LR), support vector machine (SVM), random forest (RF), and gradient boosting trees (GBT) models as ML baselines. Finally, we use a model averaging ensemble of the top-3 models to report the final prediction.
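The final step of the abstract, a model averaging ensemble of the top-3 models, can be illustrated with a minimal sketch. The abstract does not say how the top-3 models are selected or what exactly is averaged, so the sketch assumes that models are ranked by a validation score and that their predicted class probabilities are averaged with equal weight. The function names, model names, and all numbers are hypothetical toy values, not results from the paper.

```python
from statistics import fmean


def top_k_average(probas_by_model, val_scores, k=3):
    """Average the class-probability vectors of the k best-scoring models.

    Assumption: models are ranked by a validation score (higher is better)
    and averaged with equal weights; the paper may do this differently.
    """
    # Keep the names of the k models with the highest validation score.
    ranked = sorted(val_scores, key=val_scores.get, reverse=True)[:k]
    n_samples = len(probas_by_model[ranked[0]])
    n_classes = len(probas_by_model[ranked[0]][0])
    # For every sample and class, take the mean probability over the kept models.
    averaged = [
        [fmean(probas_by_model[m][i][c] for m in ranked) for c in range(n_classes)]
        for i in range(n_samples)
    ]
    return ranked, averaged


def predict_labels(averaged):
    """Pick the class index with the highest averaged probability per sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]


# Hypothetical toy inputs: four models, two samples, two classes.
toy_scores = {"model_a": 0.90, "model_b": 0.85, "model_c": 0.80, "model_d": 0.60}
toy_probas = {
    "model_a": [[0.9, 0.1], [0.2, 0.8]],
    "model_b": [[0.6, 0.4], [0.4, 0.6]],
    "model_c": [[0.6, 0.4], [0.3, 0.7]],
    "model_d": [[0.1, 0.9], [0.9, 0.1]],  # lowest score, excluded when k=3
}

toy_ranked, toy_averaged = top_k_average(toy_probas, toy_scores, k=3)
toy_labels = predict_labels(toy_averaged)
```

Averaging probabilities rather than hard votes lets a confident model outweigh two uncertain ones, which is one common reason to prefer this form of ensemble; whether the authors made the same choice is not stated in the abstract.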

Cite

Karim, M. R., Wicaksono, G., Costa, I. G., Decker, S., & Beyan, O. (2019). Prognostically Relevant Subtypes and Survival Prediction for Breast Cancer Based on Multimodal Genomics Data. IEEE Access, 7, 133850–133864. https://doi.org/10.1109/ACCESS.2019.2941796
