Coding of data, usually upstream of data analysis, has crucial impli- cations for the data analysis results. By modifying the data coding – through use of less than full precision in data values – we can aid appre- ciably the effectiveness and efficiency of the hierarchical clustering. In our first application, this is used to lessen the quantity of data to be hierar- chically clustered. The approach is a hybrid one, based on hashing and on the Ward minimum variance agglomerative criterion. In our second appli- cation, we derive a hierarchical clustering from relationships between sets of observations, rather than the traditional use of relationships between the observations themselves. This second application uses embedding in a Baire space, or longest common prefix ultrametric space. We compare this second approach, which is of O(n log n) complexity, to k-means.
Mendeley saves you time finding and organizing research
Choose a citation style from the tabs below