On testing the significance of sets of genes

  • Efron B
  • Tibshirani R

Abstract

This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways derived from biological databases. Our starting point is the interesting Gene Set Enrichment Analysis (GSEA) procedure of Subramanian et al. [Proc. Natl. Acad. Sci. USA 102 (2005) 15545–15550]. We study the problem in some generality and propose two potential improvements to GSEA: the maxmean statistic for summarizing gene-sets, and restandardization for more accurate inferences. We discuss a variety of examples and extensions, including the use of gene-set scores for class predictions. We also describe a new R language package GSA that implements our ideas.

1. Introduction. We discuss the problem of identifying differentially expressed groups of genes from a set of microarray experiments. In the usual situation we have N genes measured on n microarrays, under two different experimental conditions, such as control and treatment. The number of genes N is usually large, say, at least a few thousand, while the number of samples n is smaller, say, a hundred or fewer. This problem is an example of multiple hypothesis testing with a large number of tests, one that often arises in genomic and proteomic applications, and also in signal processing. We focus mostly on the gene expression problem, but our proposed methods are more widely applicable.

Most approaches start by computing a two-sample t-statistic z_j for each gene. Genes having t-statistics larger than a pre-defined cutoff (in absolute value) are declared significant, and then the family-wise error rate or false discovery rate of the resulting gene list is assessed by comparing the tail area from a null distribution of the statistic. This null distribution is derived from data permutations, or from asymptotic theory.

In an interesting and useful paper, Subramanian et al. (2005) proposed a method called Gene Set Enrichment Analysis (GSEA) for assessing the significance of pre-defined gene-sets, rather than individual genes. The gene-sets can be derived from different sources, for example, the sets of genes representing biological pathways
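
The per-gene starting point described in the introduction (a two-sample t-statistic z_j for each gene, with a null distribution obtained from data permutations) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the GSA package or the authors' code: the function names, the pooled-variance form of the t-statistic, the 0/1 label coding and the per-gene permutation p-value are all choices made here for the example.

```python
import math
import random
from statistics import mean, variance


def t_statistic(control, treated):
    """Pooled-variance two-sample t-statistic (treated minus control)."""
    na, nb = len(control), len(treated)
    pooled = ((na - 1) * variance(control) + (nb - 1) * variance(treated)) / (na + nb - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled * (1 / na + 1 / nb))


def gene_t_statistics(expr, labels):
    """expr: N rows (genes) of n values; labels: n entries, 0 = control, 1 = treatment."""
    stats = []
    for row in expr:
        control = [v for v, g in zip(row, labels) if g == 0]
        treated = [v for v, g in zip(row, labels) if g == 1]
        stats.append(t_statistic(control, treated))
    return stats


def permutation_p_values(expr, labels, n_perm=200, seed=0):
    """Return (z, p): observed t-statistics and two-sided permutation p-values.

    The null distribution is built by shuffling the sample labels, which
    keeps each gene's values fixed and breaks the link to the condition.
    """
    rng = random.Random(seed)
    observed = gene_t_statistics(expr, labels)
    exceed = [0] * len(expr)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for j, z in enumerate(gene_t_statistics(expr, shuffled)):
            if abs(z) >= abs(observed[j]):
                exceed[j] += 1
    # Add-one correction so that no p-value is exactly zero.
    return observed, [(c + 1) / (n_perm + 1) for c in exceed]
```

Calling `permutation_p_values(expr, labels)` on an N-by-n list of expression values returns one statistic and one p-value per gene. A gene-set method such as GSEA would then summarize the z_j within each pre-defined set rather than thresholding genes one at a time.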

Efron, B., & Tibshirani, R. (2007). On testing the significance of sets of genes. The Annals of Applied Statistics, 1(1). https://doi.org/10.1214/07-aoas101
