Data extraction for search engine using safe matching

Jer Lang Hong; Ee Xion Tan; Fariza Fauzi

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 7106 LNAI 759-768

DOI: 10.1007/978-3-642-25832-9_77

Abstract

Our study shows that the algorithms used to check the similarity of data records affect the efficiency of a wrapper. A closer examination indicates that the accuracy of a wrapper can be improved if the DOM tree and visual properties of data records are fully utilized. In this paper, we develop algorithms that check the similarity of data records based on the distinct tags and visual cues of their tree structure. We also develop a voting algorithm that detects the similarity of data records in a relevant data region, which may contain irrelevant information such as search identifiers. Together, these algorithms distinguish potential data regions more correctly and eliminate a data region only when necessary. Experimental results show that our wrapper performs better than the state-of-the-art wrapper WISH and is highly effective in data extraction. This wrapper will be useful for meta search engine applications, which need an accurate tool to locate their sources of information. © 2011 Springer-Verlag.

Hong, J. L., Tan, E. X., & Fauzi, F. (2011). Data extraction for search engine using safe matching. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7106 LNAI, pp. 759–768). Springer Verlag. https://doi.org/10.1007/978-3-642-25832-9_77
