Feature selection in omics prediction problems using cat scores and false nondiscovery rate control

Abstract

We revisit the problem of feature selection in linear discriminant analysis (LDA), that is, when features are correlated. First, we introduce a pooled centroids formulation of the multiclass LDA predictor function, in which the relative weights of Mahalanobis-transformed predictors are given by correlation-adjusted t-scores (cat scores). Second, for feature selection we propose thresholding cat scores by controlling false nondiscovery rates (FNDR). Third, training of the classifier is based on James-Stein shrinkage estimates of correlations and variances, where regularization parameters are chosen analytically without resampling. Overall, this results in an effective and computationally inexpensive framework for high-dimensional prediction with natural feature selection. The proposed shrinkage discriminant procedures are implemented in the R package "sda" available from the R repository CRAN. © 2010 Institute of Mathematical Statistics.
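As a rough illustration of the first point, the sketch below computes cat scores for a toy problem with exactly two features, so that the inverse square root of the 2×2 correlation matrix has a closed form. It assumes the cat score of class k is the decorrelated t-score of the class centroid against the pooled centroid, P^(-1/2) t_k with t_k = (1/n_k - 1/n)^(-1/2) V^(-1/2) (mean_k - mean_pooled). It uses plain empirical variances and correlations rather than the James-Stein shrinkage estimates described in the abstract, and the function name `cat_scores_2d` is hypothetical, not part of the "sda" package.

```python
import math

def cat_scores_2d(X, labels):
    """Cat scores of each class centroid vs. the pooled centroid (two features).

    Illustrative only: empirical estimates, no shrinkage. Requires at least
    two classes and a within-class correlation strictly between -1 and 1.
    """
    n = len(X)
    classes = sorted(set(labels))
    groups = {k: [x for x, l in zip(X, labels) if l == k] for k in classes}

    # Pooled centroid and per-class centroids
    pooled = [sum(x[j] for x in X) / n for j in range(2)]
    cent = {k: [sum(x[j] for x in g) / len(g) for j in range(2)]
            for k, g in groups.items()}

    # Pooled within-class covariance matrix (divisor n - K)
    S = [[0.0, 0.0], [0.0, 0.0]]
    for k, g in groups.items():
        for x in g:
            d = [x[0] - cent[k][0], x[1] - cent[k][1]]
            for a in range(2):
                for b in range(2):
                    S[a][b] += d[a] * d[b]
    S = [[s / (n - len(classes)) for s in row] for row in S]

    sd = [math.sqrt(S[0][0]), math.sqrt(S[1][1])]
    r = S[0][1] / (sd[0] * sd[1])  # within-class correlation

    # Closed form of P^(-1/2) for P = [[1, r], [r, 1]]:
    # eigenvalues 1 + r and 1 - r, eigenvectors (1, 1) and (1, -1)
    lam_plus, lam_minus = 1 / math.sqrt(1 + r), 1 / math.sqrt(1 - r)
    a, b = (lam_plus + lam_minus) / 2, (lam_plus - lam_minus) / 2

    scores = {}
    for k, g in groups.items():
        scale = 1 / math.sqrt(1 / len(g) - 1 / n)
        # Ordinary t-scores of the centroid against the pooled centroid
        t = [scale * (cent[k][j] - pooled[j]) / sd[j] for j in range(2)]
        # Decorrelate: cat = P^(-1/2) t
        scores[k] = (a * t[0] + b * t[1], b * t[0] + a * t[1])
    return scores

# Toy data: two classes, four samples each
X = [(0, 0), (2, 0), (0, 2), (2, 2), (3, 1), (5, 1), (3, 3), (5, 3)]
labels = ["A"] * 4 + ["B"] * 4
print(cat_scores_2d(X, labels))
```

When the within-class correlation is zero, as in this toy data, the cat scores reduce to the ordinary t-scores; with correlated features the two differ, which is the case the abstract is concerned with.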

Cite

Ahdesmäki, M., & Strimmer, K. (2010). Feature selection in omics prediction problems using cat scores and false nondiscovery rate control. Annals of Applied Statistics, 4(1), 503–519. https://doi.org/10.1214/09-AOAS277