Document mining based on semantic understanding of text

Khaled Shaban; Otman Basir; Mohamed Kamel

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006), vol. 4225 LNCS, pp. 834-843

DOI: 10.1007/11892755_86

5 Citations

14 Readers

Abstract

This paper presents a new paradigm for mining documents by exploiting the semantic information of their texts. A formal semantic representation of linguistic inputs is introduced and utilized to build a semantic representation for documents. The representation is constructed through accumulation of syntactic and semantic analysis outputs. A new distance measure is developed to determine the similarities between contents of documents. The measure is based on inexact matching of attributed trees. It involves the computation of all distinct similarity common sub-trees, and can be computed efficiently. It is believed that the proposed representation along with the proposed similarity measure will enable more effective document mining processes. The proposed techniques to mine documents were implemented as components in a mining system. A case study of semantic document clustering is presented to demonstrate the working and the efficacy of the framework. Experimental work is reported, and its results are presented and analyzed. © Springer-Verlag Berlin Heidelberg 2006.
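The abstract does not give the formulation of the distance measure, so the following is only an illustrative sketch of the general idea of comparing attributed trees by their best common sub-tree. Everything in it is an assumption made for illustration: the names `Node`, `node_sim`, `rooted_sim`, and `distance`, the attribute-overlap scoring, and the greedy child matching are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical attributed tree node: a label plus attribute key/value pairs."""
    label: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def node_sim(a, b):
    """Inexact node match (assumed scoring): 0 if labels differ,
    otherwise 0.5 plus 0.5 times the Jaccard overlap of the attributes."""
    if a.label != b.label:
        return 0.0
    pa, pb = set(a.attrs.items()), set(b.attrs.items())
    union = pa | pb
    if not union:
        return 1.0
    return 0.5 + 0.5 * len(pa & pb) / len(union)


def rooted_sim(a, b):
    """Weight of a common sub-tree rooted at the pair (a, b).
    Children are paired greedily, which is a simplification."""
    score = node_sim(a, b)
    if score == 0.0:
        return 0.0
    unused = list(range(len(b.children)))
    for child_a in a.children:
        best, best_idx = 0.0, None
        for idx in unused:
            value = rooted_sim(child_a, b.children[idx])
            if value > best:
                best, best_idx = value, idx
        if best_idx is not None:
            score += best
            unused.remove(best_idx)
    return score


def walk(node):
    """Yield every node of the tree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def distance(t1, t2):
    """1 minus the best common sub-tree weight, normalised by the larger tree size."""
    nodes1, nodes2 = list(walk(t1)), list(walk(t2))
    best = max(rooted_sim(a, b) for a in nodes1 for b in nodes2)
    return 1.0 - best / max(len(nodes1), len(nodes2))


# Two small parse-like trees that differ in one attribute value.
t = Node("S", children=[Node("NP", {"head": "dog"}), Node("VP", {"head": "bark"})])
u = Node("S", children=[Node("NP", {"head": "cat"}), Node("VP", {"head": "bark"})])
print(distance(t, t), distance(t, u))
```

In this sketch, identical trees are at distance 0, trees that share no labels are at distance 1, and partially matching attributes yield values in between. A pairwise distance of this kind could then feed a standard clustering routine, in the spirit of the case study the abstract mentions.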

Citation (APA)

Shaban, K., Basir, O., & Kamel, M. (2006). Document mining based on semantic understanding of text. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4225 LNCS, pp. 834–843). Springer Verlag. https://doi.org/10.1007/11892755_86
