A novel weighting scheme applied to improve the text document clustering techniques

Abstract

Text clustering is an efficient analysis technique used in text mining to arrange a large collection of unorganized text documents into a set of coherent clusters, where similar documents are placed in the same cluster. In this paper, we propose a novel term weighting scheme, namely length feature weight (LFW), to improve text document clustering algorithms based on new factors. The proposed scheme assigns a favorable term weight according to information obtained from the document collection. It identifies the terms that are particular to each cluster and enhances their weights based on the proposed factors at the document level. The β-hill climbing technique is used to validate the proposed scheme in text clustering. The proposed weighting scheme is compared with the existing weighting scheme (TF-IDF) to validate its results in that domain. Experiments are conducted on eight standard benchmark text datasets taken from the Laboratory of Computational Intelligence (LABIC). The results show that the proposed weighting scheme LFW outperforms the existing weighting scheme and improves the results of the text document clustering technique in terms of F-measure, precision, and recall.
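
To make the baseline and the evaluation measures concrete, the following is a minimal sketch of TF-IDF weighting and of precision, recall, and F-measure for a single cluster. It assumes the common tf × log(N/df) form of TF-IDF, which may differ from the exact variant used in the paper, and the function names `tf_idf` and `cluster_scores` are hypothetical. The LFW scheme itself is not reproduced here because the abstract does not define its factors.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumed weighting: tf * log(N / df),
    one common TF-IDF variant (not necessarily the paper's).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


def cluster_scores(cluster, gold_class):
    """Precision, recall, and F-measure of one cluster against one gold class.

    Both arguments are sets of document ids.
    """
    overlap = len(cluster & gold_class)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(gold_class)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy corpus, invented purely for illustration.
docs = [["text", "clustering", "text"], ["term", "weighting"], ["text", "mining"]]
weights = tf_idf(docs)
scores = cluster_scores({0, 2, 3}, {0, 2})
```

In this toy corpus, "text" appears in two of the three documents, so it receives a lower inverse document frequency than "clustering", which appears in only one. A cluster-aware scheme such as LFW would replace the weight computed inside `tf_idf` while the evaluation step stays the same.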

Cite

Abualigah, L. M., Khader, A. T., & Hanandeh, E. S. (2018). A novel weighting scheme applied to improve the text document clustering techniques. In Studies in Computational Intelligence (Vol. 741, pp. 305–320). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-66984-7_18
