Concept based clustering of documents with missing semantic information

E. Anupriya; N. Ch S.N. Iyengar

Conference Proceedings

Advances in Intelligent Systems and Computing (2014) 243 579-589

DOI: 10.1007/978-81-322-1665-0_57

Abstract

Today, every new document added to the Web is augmented with semantic information (i.e., information about its content) that identifies the class of the document. This information is added explicitly as keywords, implied by structural elements such as the title and body text, or encoded as objects and their relationships (rich data formats). However, documents that were added to the Web five or ten years ago do not contain such semantic information. The objective of this paper is to cluster documents with missing semantic information. This is done with a frequent-term-based method that exploits the lexical and structural relations between keywords in a document. After semantic information about the concepts that identify each document's class has been derived, the documents are clustered with the similarity histogram clustering algorithm. The results show that concept-based clustering performs well compared with statistical k-means clustering, but it is sensitive to the selection of a proper subset of frequent terms.
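The two stages named in the abstract (select frequent terms, then cluster with a similarity histogram) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes whitespace-tokenized documents, treats a term as "frequent" when its document frequency reaches a hypothetical `min_df` threshold, represents documents as term-count vectors compared by cosine similarity, and omits the lexical and structural concept derivation the paper describes. The clustering step follows the similarity histogram idea, where a document joins a cluster only if it does not noticeably lower the fraction of intra-cluster pairwise similarities above a threshold; the names `s_t`, `hr_min` and `eps` and their defaults are illustrative assumptions.

```python
import math
from collections import Counter


def frequent_terms(docs, min_df):
    """Terms that occur in at least `min_df` documents (hypothetical threshold)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term for term, count in df.items() if count >= min_df}


def vectorize(tokens, vocab):
    """Term-count vector restricted to the frequent-term vocabulary."""
    return Counter(t for t in tokens if t in vocab)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def histogram_ratio(sims, s_t):
    """Fraction of pairwise similarities >= s_t (1.0 for a singleton cluster, by assumption)."""
    if not sims:
        return 1.0
    return sum(1 for s in sims if s >= s_t) / len(sims)


def shc(vectors, s_t=0.3, hr_min=0.5, eps=0.05):
    """Simplified similarity-histogram clustering; returns lists of document indices."""
    clusters = []  # each cluster: {"members": [...], "sims": [pairwise similarities]}
    for i, v in enumerate(vectors):
        placed = False
        for c in clusters:
            new_sims = [cosine(v, vectors[j]) for j in c["members"]]
            hr_old = histogram_ratio(c["sims"], s_t)
            hr_new = histogram_ratio(c["sims"] + new_sims, s_t)
            # Accept if cohesion does not drop, or drops only slightly while staying above a floor.
            if hr_new >= hr_old or (hr_new > hr_min and hr_old - hr_new < eps):
                c["members"].append(i)
                c["sims"].extend(new_sims)
                placed = True
                break
        if not placed:
            clusters.append({"members": [i], "sims": []})
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    docs = [s.split() for s in [
        "web page semantic web",
        "semantic web ontology",
        "football match score",
        "football league match",
    ]]
    vocab = frequent_terms(docs, min_df=2)
    vectors = [vectorize(d, vocab) for d in docs]
    print(shc(vectors))  # [[0, 1], [2, 3]]
```

The histogram ratio is computed directly as the fraction of pairwise similarities at or above `s_t`, which is what summing the histogram bins above the threshold amounts to when `s_t` falls on a bin boundary. This sketch places each document in the first cluster that accepts it; published formulations of similarity histogram clustering may differ in details such as cluster ordering or overlapping assignment.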

Citation (APA)

Anupriya, E., & Iyengar, N. C. S. N. (2014). Concept based clustering of documents with missing semantic information. In Advances in Intelligent Systems and Computing (Vol. 243, pp. 579–589). Springer Verlag. https://doi.org/10.1007/978-81-322-1665-0_57
