We propose an algorithm for clustering a specific type of text document: textual responses gathered through a survey system. Because such responses have peculiar features (e.g., they are short, rich in outliers, and diverse in categories), traditional unsupervised and semi-supervised clustering techniques struggle to achieve the performance a survey task demands. We address this issue with a semi-supervised, topic-driven approach. It first employs an unsupervised algorithm to generate a preliminary clustering schema for all the answers to a question. A human expert then uses this schema to identify the major topics in these answers. Finally, a topic-driven clustering algorithm is applied to obtain an improved clustering schema. We evaluated this approach using five questions from a survey we recently conducted in the U.S. The results demonstrate that this approach can lead to significant improvement in clustering quality. © 2009 Springer.
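The final, topic-driven step can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes the expert's topics arrive as lists of seed keywords, that responses are compared to topics by cosine similarity over raw term counts, and that responses below a similarity threshold are set aside as outliers. The names `topic_driven_clusters`, `topics`, and `min_sim` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a response and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_driven_clusters(responses, topics, min_sim=0.1):
    """Assign each response to its most similar topic, or to "outlier".

    topics:  dict mapping a topic label to expert-chosen seed keywords
             (an assumed input format, not taken from the paper).
    min_sim: assumed threshold; responses whose best similarity falls
             below it are treated as outliers.
    """
    seeds = {label: Counter(words) for label, words in topics.items()}
    clusters = {label: [] for label in topics}
    clusters["outlier"] = []
    for response in responses:
        vec = Counter(tokenize(response))
        best_label, best_sim = "outlier", 0.0
        for label, seed in seeds.items():
            sim = cosine(vec, seed)
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_sim < min_sim:
            best_label = "outlier"
        clusters[best_label].append(response)
    return clusters


# Toy survey answers and expert topics, invented for illustration only.
responses = [
    "The price is too high",
    "Customer service was slow",
    "Too expensive for me",
    "asdf",
]
topics = {
    "cost": ["price", "expensive", "cost"],
    "service": ["service", "support", "staff"],
}
clusters = topic_driven_clusters(responses, topics)
```

In this sketch the preliminary unsupervised step and the expert's review are assumed to have already produced `topics`; any standard clustering method could fill that role.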
Yang, H., Mysore, A., & Wallace, S. (2009). A semi-supervised topic-driven approach for clustering textual answers to survey questions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5678 LNAI, pp. 374–385). https://doi.org/10.1007/978-3-642-03348-3_36