Minimum redundancy maximum relevance feature selection approach for temporal gene expression data

314Citations
Citations of this article
344Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: Feature selection, aiming to identify a subset of features among a possibly large set of features that are relevant for predicting a response, is an important preprocessing step in machine learning. In gene expression studies this is not a trivial task for several reasons, including potential temporal character of data. However, most feature selection approaches developed for microarray data cannot handle multivariate temporal data without previous data flattening, which results in loss of temporal information. We propose a temporal minimum redundancy - maximum relevance (TMRMR) feature selection approach, which is able to handle multivariate temporal data without previous data flattening. In the proposed approach we compute relevance of a gene by averaging F-statistic values calculated across individual time steps, and we compute redundancy between genes by using a dynamical time warping approach. Results: The proposed method is evaluated on three temporal gene expression datasets from human viral challenge studies. Obtained results show that the proposed method outperforms alternatives widely used in gene expression studies. In particular, the proposed method achieved improvement in accuracy in 34 out of 54 experiments, while the other methods outperformed it in no more than 4 experiments. Conclusion: We developed a filter-based feature selection method for temporal gene expression data based on maximum relevance and minimum redundancy criteria. The proposed method incorporates temporal information by combining relevance, which is calculated as an average F-statistic value across different time steps, with redundancy, which is calculated by employing dynamical time warping approach. As evident in our experiments, incorporating the temporal information into the feature selection process leads to selection of more discriminative features.

Cite

CITATION STYLE

APA

Radovic, M., Ghalwash, M., Filipovic, N., & Obradovic, Z. (2017). Minimum redundancy maximum relevance feature selection approach for temporal gene expression data. BMC Bioinformatics, 18(1). https://doi.org/10.1186/s12859-016-1423-9

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free