Comparison of Distance Measures in Cluster Analysis with Dichotomous Data

  • Finch H

Abstract

The current study examines the performance of cluster analysis with dichotomous data using distance measures based on response pattern similarity. In many contexts, such as educational and psychological testing, cluster analysis is a useful means for exploring datasets and identifying underlying groups among individuals. However, standard approaches to cluster analysis assume that the variables used to group observations are continuous in nature. This paper focuses on four methods for calculating distance between individuals using dichotomous data, and the subsequent introduction of these distances to a clustering algorithm such as Ward’s. The four methods in question are potentially useful for practitioners because they are relatively easy to carry out using standard statistical software such as SAS and SPSS, and have been shown to have potential for correctly grouping observations based on dichotomous data. Results of both a simulation study and an application to a set of binary survey responses show that three of the four measures behave similarly and can yield correct cluster recovery rates of between 60% and 90%. Furthermore, these methods were found to work better, in nearly all cases, than using the raw data with Ward’s clustering algorithm.
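
The two-step workflow described in the abstract (compute pairwise distances between binary response patterns, then pass them to a Ward-type clustering step) can be sketched roughly as follows. The abstract does not name the four measures, so the simple matching and Jaccard distances used here are illustrative assumptions, not necessarily those studied in the paper. The function names, the toy data, and the use of the Lance-Williams update on squared dissimilarities as a stand-in for Ward's method are likewise assumptions made for this sketch.

```python
from itertools import combinations


def simple_matching_distance(a, b):
    """Proportion of items on which two 0/1 response patterns disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard_distance(a, b):
    """1 minus (items both endorse) / (items at least one endorses)."""
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    return 0.0 if either == 0 else 1.0 - both / either


def ward_clusters(rows, distance, k):
    """Agglomerative clustering down to k >= 1 clusters.

    Uses the Lance-Williams form of Ward's update on squared
    dissimilarities. With non-Euclidean measures this is a heuristic,
    not an exact minimum-variance solution.
    """
    members = {i: [i] for i in range(len(rows))}
    d = {
        frozenset((i, j)): distance(rows[i], rows[j]) ** 2
        for i, j in combinations(members, 2)
    }
    next_id = len(rows)
    while len(members) > k:
        pair = min(d, key=d.get)          # closest pair of clusters
        i, j = tuple(pair)
        ni, nj, dij = len(members[i]), len(members[j]), d[pair]
        updated = {}
        for h in members:
            if h in pair:
                continue
            nh = len(members[h])
            dih = d[frozenset((i, h))]
            djh = d[frozenset((j, h))]
            updated[h] = ((ni + nh) * dih + (nj + nh) * djh - nh * dij) / (ni + nj + nh)
        # Drop every entry involving the two merged clusters.
        d = {key: val for key, val in d.items() if not (key & pair)}
        members[next_id] = members.pop(i) + members.pop(j)
        for h, val in updated.items():
            d[frozenset((next_id, h))] = val
        next_id += 1
    return sorted(sorted(m) for m in members.values())


# Hypothetical binary responses: six respondents, six items.
responses = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
groups_sm = ward_clusters(responses, simple_matching_distance, k=2)
groups_jc = ward_clusters(responses, jaccard_distance, k=2)
print(groups_sm, groups_jc)
```

On this toy data both measures separate the first three respondents from the last three. Real response data would not be this clean, and the choice of measure may matter more there.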

Cite

Finch, H. (2021). Comparison of Distance Measures in Cluster Analysis with Dichotomous Data. Journal of Data Science, 3(1), 85–100. https://doi.org/10.6339/jds.2005.03(1).192
