Comparison of Distance Measures in Cluster Analysis with Dichotomous Data

Holmes Finch

Journal Article (Open Access)

Journal of Data Science (2021) 3(1) 85-100

DOI: 10.6339/jds.2005.03(1).192

Abstract

The current study examines the performance of cluster analysis with dichotomous data using distance measures based on response pattern similarity. In many contexts, such as educational and psychological testing, cluster analysis is a useful means for exploring datasets and identifying underlying groups among individuals. However, standard approaches to cluster analysis assume that the variables used to group observations are continuous in nature. This paper focuses on four methods for calculating distance between individuals using dichotomous data, and the subsequent introduction of these distances to a clustering algorithm such as Ward’s. The four methods in question are potentially useful for practitioners because they are relatively easy to carry out using standard statistical software such as SAS and SPSS, and have been shown to have potential for correctly grouping observations based on dichotomous data. Results of both a simulation study and an application to a set of binary survey responses show that three of the four measures behave similarly, and can yield correct cluster recovery rates of between 60% and 90%. Furthermore, these methods were found to work better, in nearly all cases, than using the raw data with Ward’s clustering algorithm.
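
As a rough illustration of what a distance based on response pattern similarity can look like, the sketch below computes pairwise distances between binary response vectors. The abstract does not name the four measures studied, so the four coefficients used here (simple matching, Russell-Rao, Jaccard, Dice) are an assumption chosen because they are common for dichotomous data. The function names and the toy `responses` data are hypothetical.

```python
from itertools import combinations


def binary_counts(x, y):
    """Return (a, b, c, d): both 1, only x is 1, only y is 1, both 0."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def similarity(x, y, measure="jaccard"):
    """Similarity in [0, 1] between two equal-length, non-empty 0/1 vectors."""
    a, b, c, d = binary_counts(x, y)
    n = a + b + c + d
    if measure == "matching":      # agreements (1-1 and 0-0) over all items
        return (a + d) / n
    if measure == "russell_rao":   # joint 1s over all items
        return a / n
    if measure == "jaccard":       # joint 1s, ignoring joint 0s
        # Convention assumed here: two all-zero vectors get similarity 0.
        return a / (a + b + c) if (a + b + c) else 0.0
    if measure == "dice":          # like Jaccard, joint 1s weighted double
        return 2 * a / (2 * a + b + c) if (a + b + c) else 0.0
    raise ValueError(f"unknown measure: {measure}")


def distance_matrix(rows, measure="jaccard"):
    """Symmetric matrix of 1 - similarity, with zeros on the diagonal."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = 1.0 - similarity(rows[i], rows[j], measure)
    return dist


# Hypothetical toy data: four respondents, six yes/no items.
responses = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
D = distance_matrix(responses, "jaccard")
```

A matrix such as `D` could then be handed to a hierarchical clustering routine in place of the raw 0/1 data. Whether Ward's method is strictly appropriate for a non-Euclidean distance is a separate question that this sketch does not address.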

Cite

Finch, H. (2021). Comparison of Distance Measures in Cluster Analysis with Dichotomous Data. Journal of Data Science, 3(1), 85–100. https://doi.org/10.6339/jds.2005.03(1).192
