This paper presents a method for effectively detecting unknown patterns or clusters in high-dimensional functional data. Examples of such data include gene expression levels measured over time in microarray experiments, functional magnetic resonance imaging (fMRI), and mass spectrometry data from proteomics and lipidomics. We define clusters through the unknown high-dimensional multivariate distributions of all observations along each curve. Kullback-Leibler information and Mahalanobis generalized squared distance can fail to provide a meaningful measure of distance between distributions in such high-dimensional settings. We propose a new similarity measure and an agglomerative clustering algorithm, called PCLUST, to effectively differentiate among high-dimensional populations. The algorithm produces results that are invariant under monotone transformations of the data and does not require users to specify the number of clusters. Simulations show that PCLUST significantly outperforms nine other popular algorithms in both clustering accuracy and robustness. An application to identifying biomarkers from time-course gene expression data from Arabidopsis in response to environmental stresses is illustrated.
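One well-known reason the Mahalanobis distance can break down in this setting is that, when each curve has more time points than there are curves, the sample covariance matrix is singular and cannot be inverted. The sketch below illustrates only that general fact on simulated Gaussian data; it is not the PCLUST algorithm or its similarity measure, which the abstract does not specify. The function name `sample_covariance_rank` and its parameters are hypothetical and chosen for illustration.

```python
import numpy as np


def sample_covariance_rank(n_curves, n_timepoints, seed=0):
    """Return the rank of the sample covariance of simulated curves.

    Assumption for illustration only: each curve is a vector of
    independent standard normal values, one per time point.
    """
    rng = np.random.default_rng(seed)
    # Rows are curves (observations), columns are time points (dimensions).
    curves = rng.standard_normal((n_curves, n_timepoints))
    # rowvar=False treats each column as one variable.
    covariance = np.cov(curves, rowvar=False)
    return int(np.linalg.matrix_rank(covariance))


if __name__ == "__main__":
    # 10 curves observed at 50 time points: the rank cannot exceed 9,
    # so the 50 x 50 covariance matrix has no inverse.
    print(sample_covariance_rank(n_curves=10, n_timepoints=50))
    # 200 curves observed at 5 time points: the matrix is full rank.
    print(sample_covariance_rank(n_curves=200, n_timepoints=5))
```

With fewer curves than time points, the rank is bounded by the number of curves minus one, so the inverse covariance required by the Mahalanobis distance does not exist without regularization or dimension reduction.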
Miller, F., Neill, J., & Wang, H. (2008). Nonparametric clustering of functional data. Statistics and Its Interface, 1(1), 47–62. https://doi.org/10.4310/sii.2008.v1.n1.a5