Recent studies have demonstrated that microarray expression data can be used to identify gene regulatory pathways. However, a major challenge is building an efficient computational model from large microarray datasets (genes and micro-RNAs). There is therefore an urgent need to reduce the dimensionality of these datasets with machine learning methods without compromising accuracy, which requires an appropriate algorithm for selecting the significant features. In this study, we use a supervised method based on a Random Forest to identify significant features from three microarray datasets, from prenatal nicotine, alcohol, and combined nicotine and alcohol exposure groups, in two different cell types (dopamine and non-dopamine neurons). Our approach reduced the dimensionality of extremely large microarray datasets in a computationally efficient manner. Furthermore, our results indicated that the top 20% of features alone were sufficient to confirm the genetic pathways previously identified using all of the features in the model.
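The selection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the implementation, so the use of scikit-learn, the impurity-based importance ranking, the hyperparameters, and the synthetic data standing in for a microarray expression matrix are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# ASSUMPTION: synthetic stand-in for an expression matrix
# (rows = samples, columns = genes/micro-RNAs); not the study's data.
X, y = make_classification(
    n_samples=60, n_features=500, n_informative=20,
    n_redundant=0, random_state=0,
)

# Fit a Random Forest on all features (hyperparameters are illustrative).
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank features by impurity-based importance, highest first.
importances = forest.feature_importances_
ranked = np.argsort(importances)[::-1]

# Keep the top 20% of features, mirroring the threshold in the abstract.
n_keep = max(1, round(0.20 * X.shape[1]))
top_idx = ranked[:n_keep]
X_reduced = X[:, top_idx]

print(f"kept {n_keep} of {X.shape[1]} features")
print(f"share of total importance retained: {importances[top_idx].sum():.2f}")
```

In a real analysis the ranking would be computed inside a cross-validation loop, so that feature selection does not see the held-out samples, and the retained features would then be passed to the downstream pathway analysis.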
Xia, H., Akay, Y. M., & Akay, M. (2021). Selecting Relevant Genes from Microarray Datasets Using a Random Forest Model. IEEE Access, 9, 97813–97821. https://doi.org/10.1109/ACCESS.2021.3092368