Statistical inference for simultaneous clustering of gene expression data

Abstract

Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function θ = Φ(P) of the true data generating distribution P, and an estimate is obtained by applying this function to the empirical distribution P_n. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as parameters which are compositions of individual mappings for clustering patients and genes. This framework allows one to assess classical properties of clustering methods, such as consistency, and to formally study statistical inference regarding the clustering parameter. We present results of simulations designed to assess the asymptotic validity of different bootstrap methods for estimating the distribution of Φ(P_n). The method is illustrated on a publicly available data set. © 2002 Published by Elsevier Science Inc.
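
The plug-in estimate and its bootstrap distribution can be illustrated with a small sketch. It is not the authors' procedure: the clustering rule (a deterministic 2-means applied first to samples, then to genes within each sample cluster), the simulated data, and the names `two_means`, `phi` and `bootstrap_phi` are all assumptions made for illustration. The sketch treats Φ as a composition of a sample-clustering map and a gene-clustering map, applies it to the empirical distribution, and then uses a nonparametric bootstrap over samples to approximate how stable the resulting gene labels are.

```python
import numpy as np


def two_means(points, n_iter=20):
    """Deterministic 2-means (Lloyd's algorithm) on the rows of `points`.

    Cluster 0 is seeded with the row of lowest mean and cluster 1 with the
    row of highest mean, so labels are comparable across resamples.
    """
    score = points.mean(axis=1)
    centers = np.stack([points[score.argmin()], points[score.argmax()]]).astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to both centers.
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels


def phi(X):
    """Illustrative simultaneous clustering parameter for an (n samples x p genes) matrix.

    Step 1 clusters the samples; step 2 clusters the genes separately within
    each sample cluster, using the genes' mean expression in that cluster.
    Returns a (2, p) array of gene labels, one row per sample cluster.
    """
    sample_labels = two_means(X)
    gene_labels = []
    for g in (0, 1):
        # Fall back to all samples if a sample cluster happens to be empty.
        members = X[sample_labels == g] if np.any(sample_labels == g) else X
        profile = members.mean(axis=0)
        gene_labels.append(two_means(profile[:, None]))
    return np.stack(gene_labels)


def bootstrap_phi(X, n_boot=200, seed=0):
    """Nonparametric bootstrap: resample samples (rows) with replacement.

    Returns, for each sample cluster and gene, the proportion of bootstrap
    replicates in which the gene was assigned to gene cluster 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    reps = np.stack([phi(X[rng.integers(0, n, size=n)]) for _ in range(n_boot)])
    return reps.mean(axis=0)


# Simulated data (an assumption, not the data set analysed in the paper):
# 20 samples per group, 6 genes; gene 2 is "low" in group A but "high" in group B.
rng = np.random.default_rng(1)
profile_a = np.array([0.0, 0.0, 0.0, 3.0, 3.0, 3.0])
profile_b = np.array([5.0, 5.0, 8.0, 8.0, 8.0, 8.0])
X = np.vstack([
    profile_a + rng.normal(scale=0.5, size=(20, 6)),
    profile_b + rng.normal(scale=0.5, size=(20, 6)),
])

theta_hat = phi(X)          # plug-in estimate Phi(P_n)
freq = bootstrap_phi(X)     # bootstrap assignment proportions
print(theta_hat)
print(freq)
```

Resampling whole samples mirrors the view of the data as n independent draws from P; other bootstrap schemes, such as a parametric one, would replace only the resampling line in `bootstrap_phi`.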

Cite

Pollard, K. S., & Van der Laan, M. J. (2002). Statistical inference for simultaneous clustering of gene expression data. Mathematical Biosciences, 176(1), 99–121. https://doi.org/10.1016/S0025-5564(01)00116-X
