Clustering based on frequent term sets has been proposed to overcome the difficulties of high dimensionality and of finding meaningful labels for clusters. Although this method yields meaningful cluster labels, its accuracy is low. In this research, candidate clusters are extracted by mining frequent term sets from the document dataset, and each document is assigned to these clusters according to their support values. A new similarity measure for clusters, based on the similarity and weight of the clusters, is proposed to remove unwanted clusters in a noise reduction step. The proposed method thus relies on term sets, support values, and the weight of each cluster. Experimental results on the "Re0" and "Hitech" datasets show that the proposed method produces more accurate clusters than previous approaches. © 2014 Springer-Verlag Berlin Heidelberg.
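The first two steps described above (mining frequent term sets as candidate clusters, then assigning documents by support) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `frequent_term_sets` and `assign_documents`, the `min_support` and `max_size` parameters, the brute-force enumeration, and the tie-breaking rule are all assumptions made for illustration. The weighted exponential measure used in the noise reduction step is not defined in the abstract, so it is left out.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Return {term_set: support} for term sets found in enough documents.

    Support is the fraction of documents containing every term in the set.
    Brute-force enumeration up to max_size terms (an assumption; fine for
    small examples, too slow for real corpora).
    """
    n = len(docs)
    counts = {}
    for terms in docs:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(terms), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {ts: c / n for ts, c in counts.items() if c / n >= min_support}


def assign_documents(docs, term_sets):
    """Assign each document to one candidate cluster (one per term set).

    Assumed rule: among the term sets a document fully contains, pick the
    one with the highest support, breaking ties by larger term set and
    then alphabetically. Documents covered by no term set are skipped.
    """
    clusters = {ts: [] for ts in term_sets}
    for doc_id, terms in enumerate(docs):
        covering = [ts for ts in term_sets if ts <= terms]
        if not covering:
            continue
        best = max(covering, key=lambda ts: (term_sets[ts], len(ts), sorted(ts)))
        clusters[best].append(doc_id)
    return clusters


# Toy corpus: each document is represented as a set of terms.
docs = [
    {"oil", "price", "market"},
    {"oil", "price"},
    {"chip", "market"},
    {"chip", "design"},
]
supports = frequent_term_sets(docs, min_support=0.5)
clusters = assign_documents(docs, supports)
```

In this toy corpus the term set {"oil", "price"} serves both as a candidate cluster and as its label, which is the property that makes frequent term set clustering attractive for labelling.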
Taheri, S., Sim, A. T. H., & Ghorashi, S. H. (2014). Document clustering based on a weighted exponential measurement. In Lecture Notes in Electrical Engineering (Vol. 279 LNEE, pp. 65–70). Springer Verlag. https://doi.org/10.1007/978-3-642-41674-3_10