Data extraction from web data sources

Jerome Robinson

Conference Proceedings

International Conference on Database and Expert Systems Applications - DEXA (2004) 15 282-288

DOI: 10.1109/dexa.2004.1333487

Abstract

This paper explains the basic data structures used in a new page analysis technique for creating wrappers (data extractors) for the result pages that web sites produce in response to user queries submitted through web page forms. The key structure, called a tpGrid, is a representation of the web page that is easier to analyse than the raw HTML code. The analysis looks for repeating patterns of sets of tagSets, which are defined in the paper.
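
The abstract does not reproduce the paper's definitions of tpGrid or tagSets, so the following is only a rough sketch of the general idea of finding repeated tag structure in a result page, not the author's method. It assumes, purely for illustration, that each text node can be described by the path of open tags above it, and that a run of repeating path sequences marks the result records. All names here (`TagPathCollector`, `find_repeating_block`, `extract_records`, `SAMPLE_HTML`) are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagPathCollector(HTMLParser):
    """Record, for each non-blank text node, the path of open tags above it.

    Assumption: this tag path is a stand-in for the paper's tagSet,
    whose actual definition is not given in the abstract.
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags
        self.rows = []   # (tag_path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.rows.append((tuple(self.stack), text))


def find_repeating_block(paths, min_repeats=2):
    """Return (start, period, repeats) for the consecutive repetition of a
    path sequence that covers the most text nodes; (0, 0, 0) if none."""
    best = (0, 0, 0)
    n = len(paths)
    for period in range(1, n // min_repeats + 1):
        for start in range(n - period * min_repeats + 1):
            block = paths[start:start + period]
            repeats = 1
            while paths[start + repeats * period:
                        start + (repeats + 1) * period] == block:
                repeats += 1
            if repeats >= min_repeats and repeats * period > best[1] * best[2]:
                best = (start, period, repeats)
    return best


def extract_records(html):
    """Split the page's text nodes into records using the repeated block."""
    parser = TagPathCollector()
    parser.feed(html)
    parser.close()
    paths = [path for path, _ in parser.rows]
    texts = [text for _, text in parser.rows]
    start, period, repeats = find_repeating_block(paths)
    return [texts[start + i * period: start + (i + 1) * period]
            for i in range(repeats)]


# Hypothetical result page with three records of two fields each.
SAMPLE_HTML = """
<html><body><h1>Results</h1><ul>
<li><b>Widget</b><span>4.50</span></li>
<li><b>Gadget</b><span>7.25</span></li>
<li><b>Gizmo</b><span>1.99</span></li>
</ul></body></html>
"""

records = extract_records(SAMPLE_HTML)
print(records)
```

In this sketch the page heading is skipped because its tag path does not take part in any repetition, while the three list items share the same two-path sequence and are returned as three records. A real wrapper generator would need to handle optional fields and irregular records, which this illustration does not attempt.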

Citation (APA)

Robinson, J. (2004). Data extraction from web data sources. In International Conference on Database and Expert Systems Applications - DEXA (Vol. 15, pp. 282–288). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/dexa.2004.1333487
