MixkMeans: Clustering question-answer archives

Abstract

Community-driven Question Answering (CQA) systems crowdsource experiential information in the form of questions and answers and have accumulated valuable reusable knowledge. Clustering QA datasets from CQA systems provides a means of organizing the content to ease tasks such as manual curation and tagging. In this paper, we present a clustering method that exploits the two-part question-answer structure of QA datasets to improve clustering quality. Our method, MixKMeans, composes question-space and answer-space similarities so that the space in which the match is higher is allowed to dominate. This construction is motivated by our observation that the semantic similarity between question-answer pairs (QAs) can be localized in either space. We empirically evaluate our method on a variety of real-world labeled datasets. Our results indicate that our method significantly outperforms state-of-the-art clustering methods on the task of clustering question-answer archives.
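
The abstract does not give the composition formula, so the following is only an illustrative sketch of the idea that the better-matching space should dominate, not the paper's actual definition. It assumes a power mean with a hypothetical exponent `p` as the composition: `p = 1` gives a plain average, and larger `p` moves the result toward the larger of the two similarities. The names `mixed_similarity`, `q_sim`, `a_sim`, and `p` are all assumptions introduced here.

```python
def mixed_similarity(q_sim: float, a_sim: float, p: float = 4.0) -> float:
    """Compose question-space and answer-space similarities (illustrative only).

    Assumes both similarities are non-negative, e.g. cosine similarity of
    TF-IDF vectors. Uses a power mean: p = 1 is the arithmetic mean, and as
    p grows the result approaches max(q_sim, a_sim), so the space with the
    higher match dominates.
    """
    if q_sim < 0 or a_sim < 0:
        raise ValueError("similarities must be non-negative")
    if p <= 0:
        raise ValueError("p must be positive")
    return ((q_sim ** p + a_sim ** p) / 2.0) ** (1.0 / p)


# Two QAs that match strongly on the question side but weakly on the answer side.
q_sim, a_sim = 0.9, 0.1
plain_average = mixed_similarity(q_sim, a_sim, p=1.0)
dominated = mixed_similarity(q_sim, a_sim, p=4.0)
print(plain_average, dominated)
```

Under this assumed composition, a pair that matches strongly in only one space scores higher than it would under a plain average, which is the behavior the abstract motivates. A clustering loop could then use such a score in place of a single-space similarity.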

Cite

Deepak, P. (2016). MixkMeans: Clustering question-answer archives. In EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 1576–1585). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d16-1164
