A fast and effective method for clustering large-scale Chinese question dataset

Xiaodong Zhang; Houfeng Wang

Conference Proceedings

Communications in Computer and Information Science (2014) 496 345-356

DOI: 10.1007/978-3-662-45924-9_31

0 Citations

6 Readers

Abstract

Question clustering plays an important role in QA systems. Because questions suffer from data sparseness and lexical gaps, they carry too little information to guarantee good clustering results. In addition, previous work pays little attention to algorithmic complexity, which makes existing methods infeasible on large-scale datasets. In this paper, we propose a novel similarity measure that employs word relatedness as additional information to help calculate the similarity between questions. Based on this similarity measure and the k-means algorithm, we propose a semantic k-means algorithm and an extended version of it. Experimental results show that the proposed methods achieve performance comparable to state-of-the-art methods while taking less time.
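The abstract does not give the similarity formula or the clustering procedure, so the following is only a rough sketch of the general idea, not the authors' method. It swaps in a soft-cosine similarity, in which related but non-identical words contribute to the score, and plugs it into a simple k-means-style loop. The relatedness table, the function names, the naive initialization, and the assumption that questions are already segmented into whitespace-separated words are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical word-relatedness scores (illustrative values, not from the paper).
RELATEDNESS = {
    ("buy", "purchase"): 0.9,
    ("phone", "mobile"): 0.8,
}

def relatedness(w1, w2):
    """Return 1.0 for identical words, a table score for related words, else 0.0."""
    if w1 == w2:
        return 1.0
    return RELATEDNESS.get((w1, w2), RELATEDNESS.get((w2, w1), 0.0))

def soft_dot(a, b):
    """Dot product of two bags of words, weighted by word relatedness."""
    return sum(ca * cb * relatedness(wa, wb)
               for wa, ca in a.items() for wb, cb in b.items())

def similarity(a, b):
    """Soft-cosine similarity (an assumed stand-in for the paper's measure)."""
    denom = math.sqrt(soft_dot(a, a) * soft_dot(b, b))
    return soft_dot(a, b) / denom if denom else 0.0

def cluster(questions, k, iterations=10):
    """k-means-style clustering: assign to the most similar centroid, then update."""
    bags = [Counter(q.split()) for q in questions]
    centroids = [Counter(b) for b in bags[:k]]  # naive init: first k questions
    labels = [0] * len(bags)
    for _ in range(iterations):
        # Assignment step: pick the centroid with the highest similarity.
        labels = [max(range(k), key=lambda j: similarity(b, centroids[j]))
                  for b in bags]
        # Update step: each centroid becomes the summed bag of its members.
        merged = [Counter() for _ in range(k)]
        for bag, label in zip(bags, labels):
            merged[label].update(bag)
        # Keep the previous centroid if a cluster ended up empty.
        centroids = [m if m else c for m, c in zip(merged, centroids)]
    return labels

questions = [
    "where buy phone",
    "weather today rain",
    "how purchase mobile",
    "rain weather tomorrow",
]
labels = cluster(questions, k=2)
```

In this toy input, "where buy phone" and "how purchase mobile" share no words, so the relatedness table is what gives them a nonzero similarity; this is the kind of lexical gap the abstract refers to. Each iteration costs one similarity computation per question per centroid, which is the usual k-means cost profile.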

Zhang, X., & Wang, H. (2014). A fast and effective method for clustering large-scale Chinese question dataset. In Communications in Computer and Information Science (Vol. 496, pp. 345–356). Springer Verlag. https://doi.org/10.1007/978-3-662-45924-9_31
