A WordNet-based semantic model for enhancing text clustering

Abstract

Most text mining techniques are based on word and/or phrase analysis of the text. Statistical analysis of the frequency of a term (word or phrase) captures the importance of that term within a document. However, to achieve a more accurate analysis, the underlying mining technique should identify terms that capture the semantics of the text, from which the importance of a term in a sentence and in the document can be derived. Incorporating semantic features from the WordNet lexical database is one of many approaches that have been tried to improve the accuracy of text clustering techniques. This paper introduces a new semantic-based model that analyzes documents based on their meaning. The proposed model analyzes terms and their corresponding synonyms and/or hypernyms at the sentence and document levels. If two documents contain different words that are semantically related, the model can still measure the semantic similarity between the two documents. This similarity relies on a new semantic-based similarity measure that is applied to the concepts matched between documents. Experiments applying the proposed model to text clustering show that it substantially improves the clustering quality of document sets. © 2009 IEEE.
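
The idea of matching documents on shared concepts rather than shared words can be illustrated with a minimal sketch. This is not the paper's similarity measure, which the abstract does not specify. It assumes a tiny hand-written lexicon (`SYNONYMS`, `HYPERNYMS`) as a stand-in for WordNet, a hypothetical `hypernym_weight` parameter, and plain cosine similarity over the resulting concept vectors.

```python
import math
from collections import Counter

# Toy stand-in for WordNet (assumption): each word maps to a canonical
# concept, and each concept may have one more general parent (hypernym).
SYNONYMS = {"car": "automobile", "auto": "automobile", "lorry": "truck"}
HYPERNYMS = {"automobile": "vehicle", "truck": "vehicle"}


def term_vector(text):
    """Plain bag-of-words vector: one count per surface token."""
    return Counter(text.lower().split())


def concept_vector(text, hypernym_weight=0.5):
    """Bag-of-concepts vector: synonyms are merged into one concept and
    each concept also contributes a down-weighted hypernym."""
    vec = Counter()
    for token in text.lower().split():
        concept = SYNONYMS.get(token, token)  # unknown words stay as-is
        vec[concept] += 1.0
        parent = HYPERNYMS.get(concept)
        if parent is not None:
            vec[parent] += hypernym_weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def term_similarity(doc1, doc2):
    return cosine(term_vector(doc1), term_vector(doc2))


def concept_similarity(doc1, doc2):
    return cosine(concept_vector(doc1), concept_vector(doc2))


if __name__ == "__main__":
    # Different surface words, related meaning.
    print(term_similarity("car", "lorry"))
    print(concept_similarity("car", "lorry"))
```

Under these assumptions, "car" and "lorry" share no terms, so a word-based measure scores them as unrelated, while the concept-based vectors overlap on the shared hypernym. A real implementation would draw synonyms and hypernyms from WordNet and would need to handle word-sense ambiguity, which this sketch ignores.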

Cite

Shehata, S. (2009). A WordNet-based semantic model for enhancing text clustering. In ICDM Workshops 2009 - IEEE International Conference on Data Mining (pp. 477–482). https://doi.org/10.1109/ICDMW.2009.86
