Statistical estimation of cluster boundaries in gene expression profile data

Abstract

Motivation: Gene expression profile data are accumulating rapidly owing to advances in microarray techniques. These abundant data are analyzed with clustering procedures to extract useful information about the genes. In such analyses, determining the boundaries of gene clusters systematically, rather than by visual inspection and biological knowledge, remains challenging.

Results: We propose a statistical procedure to estimate the number of clusters in the hierarchical clustering of expression profiles. After hierarchical clustering, the statistical property of the profiles at each node of the dendrogram is evaluated by a statistic: the variance inflation factor from multiple regression analysis. This evaluation determines the cluster boundaries automatically, without any additional analysis or any biological knowledge of the measured genes. The performance of the procedure is demonstrated on the profiles of 2467 yeast genes, with very promising results.
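The variance inflation factor (VIF) named in the abstract is the standard multicollinearity diagnostic VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination obtained by regressing variable j on all the other variables. The Python sketch below computes it with NumPy and applies it to the profiles grouped under one dendrogram node. It assumes that each gene profile in the node acts as one regressor and each experimental condition as one observation; that orientation, the helper name `node_max_vif`, and the use of the maximum VIF as a node summary are illustrative assumptions. The paper's exact threshold and decision rule are not reproduced here.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X.

    X has shape (n_observations, n_variables). R_j^2 comes from an ordinary
    least-squares regression of column j on all other columns plus an intercept.
    A constant column has zero total variance and yields NaN.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Design matrix: intercept column followed by the remaining variables.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        # A perfect fit (r2 == 1) means exact collinearity: infinite VIF.
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def node_max_vif(profiles: np.ndarray) -> float:
    """Hypothetical node summary: the largest VIF among the genes under a node.

    profiles has shape (n_genes, n_conditions); it is transposed so that genes
    become the regression variables and conditions the observations (assumption).
    """
    return float(variance_inflation_factors(np.asarray(profiles, dtype=float).T).max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=20)  # one shared toy expression pattern
    tight = np.vstack([base + 0.05 * rng.normal(size=20) for _ in range(4)])
    loose = rng.normal(size=(4, 20))  # four unrelated toy profiles
    # Near-identical profiles inflate the VIF; unrelated ones stay close to 1.
    print(node_max_vif(tight), node_max_vif(loose))
```

Because the conditions serve as observations, a node containing as many genes as there are conditions (or more) admits a perfect fit and an infinite VIF, so any rule built on this sketch has to account for cluster size relative to the number of measurements. The explicit loop is kept for readability; when the correlation matrix of the variables is invertible, the same values equal the diagonal of its inverse.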

Citation (APA)

Horimoto, K., & Toh, H. (2001). Statistical estimation of cluster boundaries in gene expression profile data. Bioinformatics, 17(12), 1143–1151. https://doi.org/10.1093/bioinformatics/17.12.1143
