Flexible Data Trimming for Different Machine Learning Methods in Omics-Based Personalized Oncology

Abstract

Machine learning (ML) methods are still rarely used for gene expression- or mutation-based prediction of individual tumor responses to anticancer chemotherapy, because clinical case histories supplemented with high-throughput molecular data are relatively rare. This scarcity makes most ML methods highly vulnerable to overtraining. Recently, we proposed a novel hybrid global-local approach to ML termed FLOating Window Projective Separator (FloWPS), which avoids extrapolation in the feature space and may improve the robustness of classifiers even for datasets with a limited number of preceding cases. FloWPS has been validated for the support vector machine (SVM) method, where it significantly improved the quality of classifiers. The core property of FloWPS is data trimming, i.e. sample-specific removal of features. Features of a sample that do not have a significant number of neighboring hits in the training dataset are considered irrelevant and are removed from further analysis. In addition, for each point of a validation dataset, only the proximal points of the training dataset are taken into account. Thus, for every point of a validation dataset, the training dataset is adjusted to form a floating window. Here, we applied this approach to seven popular ML methods: SVM, k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). We performed computational experiments on 21 high-throughput, clinically annotated gene expression datasets comprising a total of 1778 cancer patients who either responded or did not respond to chemotherapy. The largest dataset contained 235 individual cases and the smallest 41. For global ML methods, such as SVM, RF, BNB, ADA and MLP, FloWPS substantially improved classifier quality: the area under the receiver operating characteristic curve (ROC AUC) for the responder vs non-responder classifier increased from a typical range of 0.65–0.85 to 0.80–0.95. On the other hand, FloWPS gave no benefit for purely local ML techniques such as kNN or RR. However, both of these local methods exhibited low sensitivity or specificity in cases when false positive or false negative errors, respectively, should be avoided. According to the sensitivity-specificity criterion, across all datasets tested, the best performance in combination with FloWPS data trimming was shown by the binomial naïve Bayes method, which can be valuable for further development of predictors in personalized oncology.
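The data trimming step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state. The function name `floating_window` and the parameters `m` and `k` are hypothetical. A "neighboring hit" is interpreted here as a training value lying on either side of the sample's value, so a feature is kept only if at least `m` training values lie at or below it and at least `m` lie at or above it. Proximity is measured by Euclidean distance over the kept features.

```python
def floating_window(train_X, sample, m, k):
    """Return (kept_features, neighbor_rows) for one validation sample.

    Assumed trimming rule: feature j is kept only if at least m training
    values lie at or below sample[j] AND at least m lie at or above it,
    so the sample is never extrapolated along feature j.
    Assumed proximity rule: the k training rows closest to the sample
    (Euclidean distance over the kept features) form the window.
    """
    kept = []
    for j in range(len(sample)):
        below = sum(1 for row in train_X if row[j] <= sample[j])
        above = sum(1 for row in train_X if row[j] >= sample[j])
        if below >= m and above >= m:
            kept.append(j)
    if not kept:
        return [], []  # every feature was trimmed; nothing to train on

    def sq_dist(row):
        # Squared Euclidean distance restricted to the kept features.
        return sum((row[j] - sample[j]) ** 2 for j in kept)

    # sorted() is stable, so ties keep the original row order.
    order = sorted(range(len(train_X)), key=lambda i: sq_dist(train_X[i]))
    return kept, order[:k]


# Toy data (made up for illustration): 4 training rows, 2 features.
train_X = [[1, 10], [2, 20], [3, 30], [4, 40]]
sample = [2.5, 100]  # the second feature lies outside the training range
print(floating_window(train_X, sample, m=2, k=2))  # → ([0], [1, 2])
```

Under these assumptions, any of the seven classifiers would then be trained on the returned rows restricted to the returned features, and applied to that single sample.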

Cite

Tkachev, V., Buzdin, A., & Borisov, N. (2019). Flexible Data Trimming for Different Machine Learning Methods in Omics-Based Personalized Oncology. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11826 LNCS, pp. 62–71). Springer. https://doi.org/10.1007/978-3-030-35210-3_5
