A hierarchical clustering approach based on a set of PLS models is presented. Called PLS-Trees®, this approach is analogous to classification and regression trees (CART), but uses the scores of PLS regression models, instead of the individual X-variables, as the basis for splitting the clusters. The split of one cluster into two is made along the sorted first X-score (t1) of a PLS model of the cluster, but may potentially be made along a direction corresponding to a combination of scores. The position of the split is selected according to the improvement of a weighted combination of (a) the variance of the X-score, (b) the variance of Y, and (c) a penalty function discouraging an unbalanced split with very different numbers of observations. Cross-validation is used to terminate the branches of the tree and to determine the number of components of each cluster PLS model. Some obvious extensions of the approach are also mentioned: OPLS-Trees, and trees based on hierarchical PLS or OPLS models with the variables divided into blocks depending on their type. The possibility of greatly reducing the number of variables in each PLS model on the basis of their PLS w-coefficients is also pointed out. The approach is illustrated by means of three examples. The first two examples are quantitative structure-activity relationship (QSAR) data sets, while the third is based on hyper-spectral images of liver tissue and is used to identify different sources of variability in the liver samples. © 2009 John Wiley & Sons, Ltd.
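The split rule described above can be illustrated with a minimal sketch for a single response variable. The abstract does not give the exact objective, so the cost function here is an assumption: the pooled within-group variances of t1 and Y (each relative to the parent cluster) are mixed with a hypothetical weight `a`, and a squared size-imbalance penalty is mixed in with a hypothetical weight `b`. The names `first_pls_score`, `split_cost`, `best_split` and `min_size` are illustrative only, and the cross-validated stopping rule is not shown.

```python
import numpy as np


def first_pls_score(X, y):
    """First PLS X-score t1 for a single centred response (one component)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc                 # PLS1 weight vector is proportional to X'y
    w = w / np.linalg.norm(w)
    return Xc @ w


def split_cost(t_sorted, y_sorted, k, a=0.5, b=0.3):
    """Assumed cost of cutting the sorted observations after position k.

    a weights the t1 variance against the Y variance; b weights the
    imbalance penalty. Both weights and the functional form are assumptions.
    """
    n = len(t_sorted)
    n1, n2 = k, n - k
    # Pooled within-group variance, relative to the unsplit cluster.
    vt = (n1 * t_sorted[:k].var() + n2 * t_sorted[k:].var()) / (n * t_sorted.var())
    vy = (n1 * y_sorted[:k].var() + n2 * y_sorted[k:].var()) / (n * y_sorted.var())
    penalty = ((n1 - n2) / n) ** 2   # 0 for an even split, near 1 when lopsided
    return (1 - b) * (a * vt + (1 - a) * vy) + b * penalty


def best_split(X, y, a=0.5, b=0.3, min_size=5):
    """Split one cluster in two along its sorted t1; returns two index arrays."""
    t = first_pls_score(X, y)
    order = np.argsort(t)
    ts, ys = t[order], y[order]
    candidates = range(min_size, len(t) - min_size + 1)
    k = min(candidates, key=lambda k: split_cost(ts, ys, k, a, b))
    return order[:k], order[k:]


# Synthetic example: two groups of 20 observations separated along variable 0.
rng = np.random.default_rng(0)
X = rng.normal(scale=0.5, size=(40, 3))
X[:20, 0] -= 3.0
X[20:, 0] += 3.0
y = X[:, 0] + rng.normal(scale=0.1, size=40)
left, right = best_split(X, y)
print(len(left), len(right))
```

Applying `best_split` recursively to each resulting index set would grow the tree; in the approach described above, cross-validation decides when a branch stops.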
Eriksson, L., Trygg, J., & Wold, S. (2009). PLS-Trees®, a top-down clustering approach. Journal of Chemometrics, 23(11), 569–580. https://doi.org/10.1002/cem.1254