Web trace duplication detection based on context

Chang Gao; Xiaoguang Hong; Zhaohui Peng; Hongda Chen

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 6988 LNCS(PART 2) 292-301

DOI: 10.1007/978-3-642-23982-3_36

2 Citations

2 Readers

Abstract

Data integration has become increasingly important with the rapid spread of the Internet, and the study of entity traces has grown in importance as part of it. Entity traces are mainly extracted from text fragments. Because web sources are large in scale, strongly autonomous, and highly redundant, the extracted records contain many duplicates. Handling this duplication often involves semantic features, so traditional integration methods cannot be applied to it directly. In this paper, we propose a web trace duplication detection method based on unsupervised learning and context. The method computes a comparison vector between two records based on their context, acquires sample data automatically, trains classifiers on that sample data, and finally classifies the records. © 2011 Springer-Verlag.
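The four steps named in the abstract (comparison vector, automatic sample acquisition, classifier training, classification) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the sample selection rule, or the classifier, so everything below is an assumption made for illustration: the field names (`entity`, `place`, `context`), token-set Jaccard similarity, the seed thresholds `hi` and `lo`, and a nearest-centroid classifier standing in for the unspecified classifiers. It is not the authors' implementation.

```python
from itertools import combinations

# Hypothetical record fields; the paper's actual schema is not given in the abstract.
FIELDS = ("entity", "place", "context")


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two strings (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def comparison_vector(r1: dict, r2: dict) -> tuple:
    """Step 1: one similarity score per field, including the context field."""
    return tuple(jaccard(r1[f], r2[f]) for f in FIELDS)


def select_seeds(vectors, hi=0.8, lo=0.2):
    """Step 2: pick training samples without labels (assumed rule).

    Pairs that agree strongly on every field become duplicate seeds;
    pairs that disagree on every field become non-duplicate seeds.
    """
    pos = [v for v in vectors if min(v) >= hi]
    neg = [v for v in vectors if max(v) <= lo]
    return pos, neg


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def detect_duplicates(records, hi=0.8, lo=0.2):
    """Steps 3 and 4: fit a nearest-centroid classifier on the seeds,
    then label every record pair. Returns the pairs judged duplicates."""
    pairs = list(combinations(range(len(records)), 2))
    vectors = {p: comparison_vector(records[p[0]], records[p[1]]) for p in pairs}
    pos, neg = select_seeds(list(vectors.values()), hi, lo)
    if not pos or not neg:
        raise ValueError("not enough seed pairs; adjust the hi/lo thresholds")
    c_pos, c_neg = centroid(pos), centroid(neg)
    return [p for p, v in vectors.items() if sq_dist(v, c_pos) < sq_dist(v, c_neg)]


# Made-up records for illustration only.
records = [
    {"entity": "alice smith", "place": "jinan", "context": "attended database conference talk"},
    {"entity": "alice smith", "place": "jinan", "context": "attended database conference talk"},
    {"entity": "alice smith", "place": "jinan city", "context": "gave talk at database conference"},
    {"entity": "bob jones", "place": "paris", "context": "won tennis final"},
]

duplicates = detect_duplicates(records)
print(duplicates)
```

In this toy data, the exact copy supplies the duplicate seed and the unrelated record supplies the non-duplicate seeds; the partially matching third record is then assigned by the classifier rather than by a fixed threshold, which is the point of training on automatically acquired samples.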

Cite (APA)

Gao, C., Hong, X., Peng, Z., & Chen, H. (2011). Web trace duplication detection based on context. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6988 LNCS, pp. 292–301). https://doi.org/10.1007/978-3-642-23982-3_36
