Community-driven Question Answering (CQA) systems crowdsource experiential information in the form of questions and answers, and have accumulated valuable reusable knowledge. Clustering QA datasets from CQA systems provides a means of organizing the content to ease tasks such as manual curation and tagging. In this paper, we present a clustering method that exploits the two-part question-answer structure of QA datasets to improve clustering quality. Our method, MixKMeans, composes question-space and answer-space similarities so that the space in which the match is higher is allowed to dominate. This construction is motivated by our observation that semantic similarity between question-answer pairs (QAs) could get localized in either space. We empirically evaluate our method on a variety of real-world labeled datasets. Our results indicate that our method significantly outperforms state-of-the-art clustering methods for the task of clustering question-answer archives.
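The idea of letting the better-matching space dominate can be illustrated with a small sketch. The abstract does not give the composition formula, so the power mean used below, the exponent `p`, and the function names `cosine` and `mixed_similarity` are all assumptions chosen for illustration, not necessarily what MixKMeans does. The sketch also assumes non-negative similarities, as with cosine similarity over term-frequency vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mixed_similarity(sim_q, sim_a, p=4.0):
    """Hypothetical composition: power mean of the two similarities.

    With p = 1 this is the plain average. As p grows, the result moves
    toward max(sim_q, sim_a), so the higher-matching space dominates.
    Negative inputs are clamped to 0 (an assumption of this sketch).
    """
    sim_q = max(sim_q, 0.0)
    sim_a = max(sim_a, 0.0)
    return ((sim_q ** p + sim_a ** p) / 2.0) ** (1.0 / p)


# Two hypothetical QA pairs as (question vector, answer vector):
# the questions are worded differently, but the answers nearly coincide.
qa1 = ([1.0, 0.0, 1.0, 0.0], [2.0, 1.0, 0.0])
qa2 = ([0.0, 1.0, 0.0, 1.0], [2.0, 1.0, 0.1])

sim_q = cosine(qa1[0], qa2[0])  # no shared question terms
sim_a = cosine(qa1[1], qa2[1])  # answers are close

print("average   :", mixed_similarity(sim_q, sim_a, p=1.0))
print("power mean:", mixed_similarity(sim_q, sim_a, p=4.0))
```

Under plain averaging, the mismatch in the question space pulls the combined score down to about half of the answer-space similarity. With a larger `p`, the strong answer-space match carries most of the score, which is the kind of localized similarity the abstract describes.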
Deepak, P. (2016). MixKMeans: Clustering question-answer archives. In EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 1576–1585). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d16-1164