Approximate hierarchical clustering via sparsest cut and spreading metrics


Abstract

Dasgupta recently introduced a cost function for the hierarchical clustering of a set of points given pairwise similarities between them. He showed that this function is NP-hard to optimize, but a top-down recursive partitioning heuristic based on an α_n-approximation algorithm for uniform sparsest cut gives an approximation of O(α_n log n) (the current best algorithm has α_n = O(√log n)). We show that the aforementioned sparsest cut heuristic in fact obtains an O(α_n)-approximation. The algorithm also applies to a generalized cost function studied by Dasgupta. Moreover, we obtain a strong inapproximability result, showing that the Hierarchical Clustering objective is hard to approximate to within any constant factor assuming the Small-Set Expansion (SSE) Hypothesis. Finally, we discuss approximation algorithms based on convex relaxations. We present a spreading metric SDP relaxation for the problem and show that it has an integrality gap of at most O(√log n). The advantage of the SDP relative to the sparsest cut heuristic is that it provides an explicit lower bound on the optimal solution and could potentially yield an even better approximation for hierarchical clustering. In fact, our analysis of this SDP served as the inspiration for our improved analysis of the sparsest cut heuristic. We also show that a spreading metric LP relaxation gives an O(log n)-approximation.
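For readers unfamiliar with the objects in the abstract, the following is a minimal Python sketch of Dasgupta's cost function and the top-down recursive partitioning heuristic. All names, the toy similarity graph, and the brute-force sparsest cut solver are illustrative assumptions for tiny inputs; the paper's results concern approximation algorithms for sparsest cut, not this exhaustive search.

```python
# Illustrative sketch (not the paper's algorithm): Dasgupta's cost and the
# top-down recursive partitioning heuristic, with the uniform sparsest cut
# step solved by brute force on a tiny example graph.
from itertools import combinations

def cut_weight(w, S, T):
    # Total similarity weight crossing between vertex sets S and T.
    return sum(w.get((min(i, j), max(i, j)), 0) for i in S for j in T)

def sparsest_cut(nodes, w):
    # Brute-force uniform sparsest cut: minimize w(S, V\S) / (|S| * |V\S|).
    best, best_S = float('inf'), None
    for r in range(1, len(nodes)):
        for S in combinations(nodes, r):
            T = [v for v in nodes if v not in S]
            val = cut_weight(w, S, T) / (len(S) * len(T))
            if val < best:
                best, best_S = val, list(S)
    return best_S

def build_tree(nodes, w):
    # Top-down heuristic: recursively split by a (here: exact) sparsest cut.
    if len(nodes) == 1:
        return nodes[0]
    S = sparsest_cut(nodes, w)
    T = [v for v in nodes if v not in S]
    return (build_tree(S, w), build_tree(T, w))

def leaves(t):
    return [t] if not isinstance(t, tuple) else leaves(t[0]) + leaves(t[1])

def dasgupta_cost(t, w):
    # cost(T) = sum over pairs (i, j) of w(i, j) * |leaves under lca(i, j)|.
    if not isinstance(t, tuple):
        return 0
    L, R = leaves(t[0]), leaves(t[1])
    n = len(L) + len(R)
    split = sum(w.get((min(i, j), max(i, j)), 0) * n for i in L for j in R)
    return split + dasgupta_cost(t[0], w) + dasgupta_cost(t[1], w)

# Toy input: two triangles joined by one weak edge. The first split found
# by the heuristic separates the two triangles.
w = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
     (3, 4): 1, (4, 5): 1, (3, 5): 1,
     (2, 3): 0.1}
tree = build_tree(list(range(6)), w)
cost = dasgupta_cost(tree, w)
```

The heuristic here uses an exact sparsest cut, so on this toy graph it recovers the intuitive hierarchy; the paper's approximation guarantees quantify how much worse the tree can be when each cut is only α_n-approximate.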

Citation (APA)

Charikar, M., & Chatziafratis, V. (2017). Approximate hierarchical clustering via sparsest cut and spreading metrics. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (Vol. 0, pp. 841–854). Association for Computing Machinery. https://doi.org/10.1137/1.9781611974782.53
