Document clustering based on a weighted exponential measurement

Shahrooz Taheri; Alex Tze Hiang Sim; Seyed Hamid Ghorashi

Conference Proceedings

Lecture Notes in Electrical Engineering (2014) 279 LNEE 65-70

DOI: 10.1007/978-3-642-41674-3_10

Abstract

The frequent term set clustering method was proposed to overcome the difficulties of high dimensionality and of finding meaningful labels for clusters. Although this method provides meaningful cluster labels, it has low accuracy. In this research, candidate clusters are extracted by mining frequent term sets within a document dataset. Each document is assigned to these clusters by considering the support values. A new similarity measurement function for clusters, based on the similarity and weight of clusters, is designed and proposed to remove unwanted clusters in a noise reduction step. The proposed method operates on the concepts of term sets, support values, and the weight of each cluster. Experimental results on the "Re0" and "Hitech" datasets show that our proposed method provides more accurate clusters than previous efforts. © 2014 Springer-Verlag Berlin Heidelberg.
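
The first two steps named in the abstract (mining frequent term sets as candidate clusters, then assigning documents using support values) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the brute-force enumeration, the `max_size` limit, and the tie-breaking rule are all assumptions made for illustration. The weighted exponential similarity measure and the noise reduction step are not specified in the abstract, so they are left out.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Return {term_set: support} for term sets with support >= min_support.

    Support is the fraction of documents that contain every term in the set.
    Brute-force enumeration up to max_size terms (an illustrative assumption;
    a real miner such as Apriori would prune the search).
    """
    n = len(docs)
    vocab = sorted(set().union(*docs))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            cover = sum(1 for d in docs if set(combo) <= d)
            support = cover / n
            if support >= min_support:
                result[frozenset(combo)] = support
    return result


def assign_documents(docs, term_sets):
    """Assign each document to one candidate cluster.

    Hypothetical rule: among the frequent term sets fully contained in the
    document, pick the one with the highest support; ties go to the larger
    set, then to the alphabetically later one.
    """
    clusters = {ts: [] for ts in term_sets}
    for i, d in enumerate(docs):
        covering = [ts for ts in term_sets if ts <= d]
        if covering:
            best = max(covering, key=lambda ts: (term_sets[ts], len(ts), sorted(ts)))
            clusters[best].append(i)
    return clusters
```

Each document is represented here as a set of terms, and each frequent term set doubles as the label of its candidate cluster, which is the property the abstract credits for producing meaningful labels.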

Taheri, S., Sim, A. T. H., & Ghorashi, S. H. (2014). Document clustering based on a weighted exponential measurement. In Lecture Notes in Electrical Engineering (Vol. 279 LNEE, pp. 65–70). Springer Verlag. https://doi.org/10.1007/978-3-642-41674-3_10
