Finding Similar Documents Using Different Clustering Techniques

Sumayia Al-Anazi; Hind Almahmoud; Isra Al-Turaiki

Conference Proceedings (Open Access)

Procedia Computer Science (2016) 82 28-34

DOI: 10.1016/j.procs.2016.04.005

50 Citations

124 Readers

Abstract

Text clustering is an important application of data mining. It is concerned with grouping similar text documents together. In this paper, several models are built to cluster capstone project documents using three clustering techniques: k-means, k-means fast, and k-medoids. Our dataset was obtained from the library of the College of Computer and Information Sciences, King Saud University, Riyadh. Three similarity measures are tested: cosine similarity, Jaccard similarity, and the correlation coefficient. The quality of the obtained models is evaluated and compared. The results indicate that the best performance is achieved using k-means and k-medoids combined with cosine similarity. We observe that the measured clustering quality varies depending on the evaluation measure used. In addition, as the value of k increases, the quality of the resulting clusters improves. Finally, we reveal the categories of graduation projects offered in the Information Technology department for female students.
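The abstract names three similarity measures and two of the clustering algorithms without defining them. The following is a minimal Python sketch, assuming documents are represented as raw term-frequency vectors over a shared vocabulary; it is not the authors' pipeline, since the abstract does not state the tool, preprocessing, or term weighting they used. The helper names (`tf_vector`, `assign`, `k_medoids`) and the toy documents are hypothetical. Jaccard is computed here on the sets of terms present in each document, and the k-medoids variant is a simple alternating scheme with a first-k initialization rather than the full PAM swap search.

```python
import math
from collections import Counter


def tf_vector(text, vocab):
    """Hypothetical helper: raw term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def jaccard(x, y):
    """Set-based Jaccard |A & B| / |A | B| over the terms present in each document."""
    a = {i for i, v in enumerate(x) if v}
    b = {i for i, v in enumerate(y) if v}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant vector (a convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def assign(vectors, medoids, sim):
    """Label each vector with its most similar medoid (ties go to the lower index)."""
    return [max(range(len(medoids)), key=lambda c: sim(v, vectors[medoids[c]]))
            for v in vectors]


def k_medoids(vectors, k, sim=cosine, max_iter=100):
    """Alternating k-medoids: assign points, then re-pick each cluster's most central member."""
    medoids = list(range(k))  # simplistic deterministic init: the first k documents
    for _ in range(max_iter):
        labels = assign(vectors, medoids, sim)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster ends up empty
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the highest total similarity to its cluster.
            new_medoids.append(max(members, key=lambda i: sum(
                sim(vectors[i], vectors[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(vectors, medoids, sim), medoids


docs = ["data mining text clustering", "text clustering with k means",
        "mobile app for students", "android mobile app"]
vocab = sorted({w for d in docs for w in d.split()})
vectors = [tf_vector(d, vocab) for d in docs]
labels, medoids = k_medoids(vectors, k=2)
print(labels)  # → [1, 1, 0, 0]
```

Passing `sim=jaccard` or `sim=pearson` to `k_medoids` compares the measures on the same vectors. k-means differs in that each cluster center is recomputed as the mean vector of its members, so its centers need not be actual documents, whereas a medoid is always one of the input documents.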

Cite (APA)

Al-Anazi, S., Almahmoud, H., & Al-Turaiki, I. (2016). Finding Similar Documents Using Different Clustering Techniques. In Procedia Computer Science (Vol. 82, pp. 28–34). Elsevier B.V. https://doi.org/10.1016/j.procs.2016.04.005