Selecting Relevant Genes from Microarray Datasets Using a Random Forest Model


Abstract

Recent studies have demonstrated that microarray expression data can be used to identify gene regulatory pathways. However, a major challenge is building an efficient computational model from large microarray datasets (genes and micro-RNAs). There is therefore an urgent need to reduce the dimensionality of these datasets with machine learning methods without compromising accuracy, which requires an algorithm that can select the significant features. In this study, we use a supervised method based on a Random Forest to identify significant features from three microarray datasets, drawn from prenatal nicotine, alcohol, and combined nicotine and alcohol exposure groups, in two cell types (dopamine and non-dopamine neurons). Our approach reduced the dimensionality of extremely large microarray datasets in a computationally efficient manner. Furthermore, our results indicated that using only the top 20% of features was sufficient to confirm the genetic pathways previously identified when all of the features were used in the model.
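The general idea of ranking features with a Random Forest and keeping the top 20% can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn's impurity-based `feature_importances_` as the ranking criterion, uses a synthetic matrix in place of real expression data, and the helper name `select_top_features` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def select_top_features(X, y, fraction=0.2, n_trees=200, seed=0):
    """Rank features by Random Forest importance and keep the top fraction.

    Hypothetical helper for illustration; the ranking criterion
    (impurity-based importance) is an assumption, not the paper's method.
    """
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    # One impurity-based importance value per feature.
    importances = forest.feature_importances_
    n_keep = max(1, int(round(fraction * X.shape[1])))
    # Indices of the n_keep highest-ranked features, most important first.
    top_idx = np.argsort(importances)[::-1][:n_keep]
    return top_idx, importances


# Synthetic stand-in for an expression matrix: 60 samples x 200 "genes",
# 10 of which carry class signal. This is NOT the paper's data.
X, y = make_classification(
    n_samples=60,
    n_features=200,
    n_informative=10,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)

top_idx, importances = select_top_features(X, y, fraction=0.2)
X_reduced = X[:, top_idx]
print(X_reduced.shape)  # (60, 40)
```

A downstream model would then be trained on `X_reduced` and compared against one trained on all features. Impurity-based importances can be biased toward high-cardinality or correlated features, so permutation importance is a reasonable alternative ranking to try.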

Cite

Xia, H., Akay, Y. M., & Akay, M. (2021). Selecting Relevant Genes from Microarray Datasets Using a Random Forest Model. IEEE Access, 9, 97813–97821. https://doi.org/10.1109/ACCESS.2021.3092368
