Selecting Relevant Genes from Microarray Datasets Using a Random Forest Model

Hui Xia; Yasemin M. Akay; Metin Akay

Journal article, open access

IEEE Access (2021) 9 97813-97821

DOI: 10.1109/ACCESS.2021.3092368

Abstract

Recent studies have demonstrated that microarray expression data can be used to identify gene regulatory pathways. However, a major challenge is building an efficient computational model from large microarray datasets of genes and microRNAs. There is therefore an urgent need to reduce the dimensionality of these large datasets with machine learning methods without compromising accuracy, which requires an algorithm that can select the significant features. In this study, we use a supervised method based on a Random Forest to identify significant features from three microarray datasets from prenatal nicotine, alcohol, and combined nicotine and alcohol exposure groups in two cell types (dopamine and non-dopamine neurons). Our approach efficiently reduced the dimensionality of these extremely large microarray datasets. Furthermore, our results indicated that using only the top 20% of features was sufficient to confirm the genetic pathways previously identified when all of the features were used in the model.
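The abstract does not describe implementation details, so the following is only a minimal sketch of the general idea: rank features by Random Forest importance and keep the top 20%. It assumes impurity-based importance (mean decrease in Gini impurity) as the ranking criterion, which may differ from the authors' choice. The function names, the tree depth, the number of trees, the `mtry = sqrt(p)` feature subsampling, and the synthetic toy data are all illustrative assumptions, not the paper's settings. It is written in pure Python so it runs without dependencies; in practice a library implementation such as scikit-learn's `RandomForestClassifier` (via its `feature_importances_` attribute) would normally be used.

```python
import math
import random
from collections import Counter


def gini(counts, n):
    """Gini impurity of a node, given its class counts and its size n."""
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def best_split(X, y, idx, features):
    """Return (decrease, feature, left_idx, right_idx) with the largest Gini decrease."""
    n = len(idx)
    total = Counter(y[i] for i in idx)
    parent = gini(total, n)
    best = None
    for f in features:
        order = sorted(idx, key=lambda i: X[i][f])  # sweep thresholds in value order
        left = Counter()
        for pos in range(1, n):
            left[y[order[pos - 1]]] += 1
            if X[order[pos - 1]][f] == X[order[pos]][f]:
                continue  # no threshold separates equal values
            right = total - left
            child = (pos * gini(left, pos) + (n - pos) * gini(right, n - pos)) / n
            decrease = parent - child
            if best is None or decrease > best[0]:
                best = (decrease, f, order[:pos], order[pos:])
    return best


def grow_tree(X, y, idx, depth, params, rng, importance, n_root):
    """Grow one tree, adding each split's size-weighted Gini decrease to `importance`."""
    if depth >= params["max_depth"] or len(idx) < 2 or len(set(y[i] for i in idx)) == 1:
        return
    features = rng.sample(range(len(X[0])), params["mtry"])  # random feature subset
    split = best_split(X, y, idx, features)
    if split is None or split[0] <= 0:
        return
    decrease, f, left, right = split
    importance[f] += len(idx) / n_root * decrease
    grow_tree(X, y, left, depth + 1, params, rng, importance, n_root)
    grow_tree(X, y, right, depth + 1, params, rng, importance, n_root)


def forest_importance(X, y, n_trees=100, max_depth=4, seed=0):
    """Mean-decrease-in-impurity importance, normalized per tree and averaged."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    params = {"max_depth": max_depth, "mtry": max(1, int(math.sqrt(p)))}  # assumed defaults
    totals = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        imp = [0.0] * p
        grow_tree(X, y, boot, 0, params, rng, imp, n)
        s = sum(imp)
        for j in range(p):
            totals[j] += imp[j] / s if s else 0.0
    return [t / n_trees for t in totals]


def select_top_fraction(importance, fraction=0.2):
    """Indices of the top `fraction` of features, ranked by importance."""
    k = max(1, round(fraction * len(importance)))
    ranked = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 80 samples x 20 "genes"; only genes 0 and 1 drive the label.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(20)] for _ in range(80)]
y = [int(row[0] + row[1] > 0) for row in X]
imp = forest_importance(X, y, n_trees=50, max_depth=4, seed=0)
top = select_top_fraction(imp, 0.2)  # keeps 4 of 20 features
print(top)
```

Impurity-based importance is cheap because it falls out of tree construction, but it can favor features with many distinct values; permutation importance on held-out data is a common, slower alternative for the ranking step.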

Citation (APA)

Xia, H., Akay, Y. M., & Akay, M. (2021). Selecting Relevant Genes from Microarray Datasets Using a Random Forest Model. IEEE Access, 9, 97813–97821. https://doi.org/10.1109/ACCESS.2021.3092368