KwikBucks: Correlation Clustering with Cheap-Weak and Expensive-Strong Signals


Abstract

For text clustering, there is often a dilemma: one can either first embed each example independently and then compute pairwise similarities based on the embeddings, or use a cross-attention model that takes a pair of examples as input and produces a similarity. The former is more scalable, but its similarities are often of lower quality; the latter does not scale well but produces higher-quality similarities. We address this dilemma by developing a clustering algorithm that leverages the best of both worlds: the scalability of the former and the quality of the latter. We formulate the problem of text clustering with embedding-based and cross-attention models as a novel version of the Budgeted Correlation Clustering problem (BCC) where, along with a limited number of queries to an expensive oracle (a cross-attention model in our case), we have unlimited access to a cheaper but less accurate second oracle (embedding similarities in our case). We develop a theoretically motivated algorithm that leverages the cheap oracle to judiciously query the strong oracle while maintaining high clustering quality. We empirically demonstrate gains in query minimization and clustering metrics on a variety of datasets with diverse strong and cheap oracles.
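To make the cheap-oracle/strong-oracle setup concrete, here is a minimal sketch of the general idea. It is not the KwikBucks algorithm from the paper; it is a simplified, deterministic variant of the classic KwikCluster pivot method, written under stated assumptions. `cheap_sim` stands in for embedding similarity (unlimited calls), `strong_same` stands in for the cross-attention model (counted against `budget`), and the function name `budgeted_pivot_clustering` and the `candidates_per_pivot` parameter are hypothetical. For each pivot, the cheap oracle ranks the remaining items, and only the top few candidates are sent to the strong oracle.

```python
from typing import Callable, Hashable, List, Sequence


def budgeted_pivot_clustering(
    items: Sequence[Hashable],
    cheap_sim: Callable[[Hashable, Hashable], float],   # assumed: cheap, weak signal
    strong_same: Callable[[Hashable, Hashable], bool],  # assumed: expensive, strong signal
    budget: int,                                        # max number of strong-oracle queries
    candidates_per_pivot: int = 10,                     # hypothetical knob
) -> List[List[Hashable]]:
    """Pivot-based clustering that spends strong queries only on cheap-ranked candidates.

    Simplified sketch, not the paper's method: pivots are taken in input order
    (KwikCluster uses random pivots), and once the budget is spent every
    remaining item becomes a singleton cluster.
    """
    unclustered = list(items)
    clusters: List[List[Hashable]] = []
    queries = 0
    while unclustered:
        pivot = unclustered.pop(0)
        # Rank the remaining items by cheap similarity to the pivot, most similar first.
        ranked = sorted(unclustered, key=lambda x: cheap_sim(pivot, x), reverse=True)
        joined = []
        # Query the strong oracle only for the top-ranked candidates, within budget.
        for x in ranked[:candidates_per_pivot]:
            if queries >= budget:
                break
            queries += 1
            if strong_same(pivot, x):
                joined.append(x)
        joined_set = set(joined)
        unclustered = [x for x in unclustered if x not in joined_set]
        clusters.append([pivot] + joined)
    return clusters


if __name__ == "__main__":
    # Toy data: two true groups {0, 1, 2} and {3, 4, 5}.
    cheap = lambda a, b: -abs(a - b)   # nearby integers look "similar"
    strong = lambda a, b: a // 3 == b // 3
    print(budgeted_pivot_clustering(range(6), cheap, strong, budget=100, candidates_per_pivot=3))
    # prints [[0, 1, 2], [3, 4, 5]]
```

With `budget=2`, only the first pivot's two queries are answered, so the output degrades to `[[0, 1, 2], [3], [4], [5]]`. This illustrates the trade-off the abstract describes: the cheap signal decides *where* to spend strong queries, and the budget limits *how many* can be spent.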

Citation (APA)

Silwal, S., Ahmadian, S., Nystrom, A., McCallum, A., Ramachandran, D., & Kazemi, M. (2023). KwikBucks: Correlation Clustering with Cheap-Weak and Expensive-Strong Signals. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 1–31). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.sustainlp-1.1
