Concept based clustering of documents with missing semantic information

Abstract

Today, every new document added to the Web is augmented with semantic information (i.e., information about the content) that identifies the class of the document. This information is added as keywords, is implicitly known from structural information such as the title and body text, or is added as objects and their relationships (rich data formats). However, documents added to the Web five or ten years ago do not contain semantic information. The objective of this paper is to cluster documents with missing semantic information. This is done by adopting a frequent term-based method that exploits the lexical and structural relations between keywords in the document. The similarity histogram clustering algorithm is used to cluster the documents after deriving semantic information on the concepts that identify the class of each document. The results illustrate that concept-based clustering performs well compared with the statistical clustering method k-means, but suffers from the difficulty of selecting a proper subset of frequent terms.
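
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind similarity histogram clustering, not the authors' method. It assumes plain term-frequency vectors with cosine similarity as a stand-in for the derived concepts, and the names `shc_cluster`, `sim_threshold`, `min_ratio`, and `eps` are hypothetical. A document joins a cluster only if the share of pairwise similarities above a threshold (the histogram ratio) does not degrade too much.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def histogram_ratio(vectors, sim_threshold):
    """Fraction of document pairs whose similarity reaches sim_threshold."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # a single document is trivially coherent
    good = sum(1 for a, b in pairs if cosine(a, b) >= sim_threshold)
    return good / len(pairs)


def shc_cluster(docs, sim_threshold=0.3, min_ratio=0.5, eps=0.1):
    """Incrementally assign each document to the first cluster it does not degrade.

    Assumption: raw term counts replace the concept derivation step,
    which the abstract does not specify.
    """
    clusters = []  # each cluster is a list of (doc_index, vector)
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())
        for cluster in clusters:
            members = [v for _, v in cluster]
            old = histogram_ratio(members, sim_threshold)
            new = histogram_ratio(members + [vec], sim_threshold)
            # Accept if coherence improves, or drops only slightly
            # while staying above the minimum ratio.
            if new >= old or (new >= min_ratio and old - new <= eps):
                cluster.append((i, vec))
                break
        else:
            clusters.append([(i, vec)])  # no cluster fits: start a new one
    return [[i for i, _ in c] for c in clusters]
```

Because assignment is incremental, the result can depend on document order and on the chosen thresholds; the values above are illustrative only.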

Cite

Anupriya, E., & Iyengar, N. C. S. N. (2014). Concept based clustering of documents with missing semantic information. In Advances in Intelligent Systems and Computing (Vol. 243, pp. 579–589). Springer Verlag. https://doi.org/10.1007/978-81-322-1665-0_57
