A new test system for stability measurement of marker gene selection in DNA microarray data analysis

Abstract

Microarray gene expression data contain informative features that reflect the critical processes controlling prominent biological functions. Feature selection algorithms have been used in previous biomedical research to find the "marker" genes whose changes in expression value correspond to the most prominent differences between specimen classes. One problem encountered in such analysis is the imbalance between the very large number of genes and the relatively small number of specimen samples. A common concern, therefore, is "overfitting" the data and deriving a set of marker genes with low stability over the entire set of possible specimens. To address this problem, we propose a new test environment in which synthetic data is perturbed to simulate possible variations in gene expression values. The goal is for the generated data to match the properties of natural data and to be suitable for testing the sensitivity of feature selection algorithms and validating the robustness of selected marker genes. In this paper, we evaluate a statistically based resampling approach and a Principal Components Analysis (PCA)-based linear noise distribution approach. Our results show that both methods generate reasonable synthetic data and that the signal-to-noise ratio (with variation weights of 5%, 10%, 20%, and 30%) measurably impacts both the classification accuracy and the marker genes selected. Based on these results, we identify the most appropriate marker gene selection and classification techniques for each type and level of noise we modeled. © Springer-Verlag Berlin Heidelberg 2005.
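
The general idea of measuring marker gene stability under perturbation can be illustrated with a minimal sketch. It is not the authors' test system: the abstract does not specify the resampling or PCA-based noise models, so this example assumes plain per-gene Gaussian noise scaled by a variation weight, a simple class-mean-difference score for gene ranking, and Jaccard overlap as the stability measure. All function names (`top_k_genes`, `perturb`, `stability`, `make_data`) and the toy data are hypothetical.

```python
import numpy as np

def top_k_genes(X, y, k):
    """Select k genes by a simple t-like score (assumed ranking, not the paper's)."""
    a, b = X[y == 0], X[y == 1]
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)
    return set(np.argsort(score)[::-1][:k].tolist())

def perturb(X, weight, rng):
    """Add Gaussian noise with std equal to `weight` times each gene's std (assumed noise model)."""
    return X + rng.normal(0.0, 1.0, X.shape) * weight * X.std(axis=0)

def stability(X, y, k, weight, n_trials, seed=0):
    """Mean Jaccard overlap between genes selected on clean and on perturbed data."""
    rng = np.random.default_rng(seed)
    base = top_k_genes(X, y, k)
    overlaps = []
    for _ in range(n_trials):
        sel = top_k_genes(perturb(X, weight, rng), y, k)
        overlaps.append(len(base & sel) / len(base | sel))
    return float(np.mean(overlaps))

def make_data(n_samples=40, n_genes=200, n_markers=10, seed=1):
    """Toy two-class data: the first n_markers genes are shifted in class 1."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_samples // 2)
    X = rng.normal(0.0, 1.0, (n_samples, n_genes))
    X[y == 1, :n_markers] += 2.0
    return X, y

X, y = make_data()
# Variation weights taken from the abstract: 5%, 10%, 20%, 30%.
results = {w: stability(X, y, k=10, weight=w, n_trials=20) for w in (0.05, 0.10, 0.20, 0.30)}
for w, s in results.items():
    print(f"weight={w:.2f}  mean Jaccard stability={s:.3f}")
```

A stability value of 1.0 means the same genes are selected on every perturbed copy; lower values indicate that the selection is sensitive to noise at that weight.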

Cite

Xiong, F., Huang, H., Ford, J., Makedon, F. S., & Pearlman, J. D. (2005). A new test system for stability measurement of marker gene selection in DNA microarray data analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3746 LNCS, pp. 437–447). https://doi.org/10.1007/11573036_41