A deep web data extraction model for web mining: A review

Ily Amalina Ahmad Sabri; Mustafa Man

Article, Open Access

Indonesian Journal of Electrical Engineering and Computer Science

DOI: 10.11591/ijeecs.v23.i1.pp519-528

3 Citations

10 Readers

Abstract

The World Wide Web has become a large pool of information, and extracting structured data from published web pages has drawn attention over the last decade. Web data extraction (WDE) faces many challenges because web data are varied and the data in hypertext markup language (HTML) files are unstructured. The aim of this paper is to provide a comprehensive overview of current web data extraction techniques in terms of the quality of the extracted data. The paper focuses on data extraction using wrapper approaches and compares them with one another to identify the best approach for extracting data from online sites. To observe the efficiency of the proposed model, we compare the performance of single web page extraction across different models: document object model (DOM), wrapper using hybrid DOM and JSON (WHDJ), wrapper extraction of image using DOM and JSON (WEIDJ), and WEIDJ (no-rules). The experiments showed that WEIDJ extracts data fastest, taking less time than the other methods compared.
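As a rough illustration of the kind of task the abstract describes (pulling image data out of a single HTML page, emitting it as JSON, and timing the extraction), the following minimal sketch may help. It is not the WEIDJ, WHDJ, or DOM model from the paper: the class `ImageCollector`, the function `extract_images_as_json`, and the sample page are all hypothetical names introduced here, and the parser from Python's standard library processes tags as a stream rather than building a full DOM tree.

```python
import json
import time
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Hypothetical helper: records src/alt for every <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            attr_map = dict(attrs)
            self.images.append(
                {"src": attr_map.get("src"), "alt": attr_map.get("alt", "")}
            )


def extract_images_as_json(html):
    """Return (json_text, elapsed_seconds) for one page of HTML."""
    start = time.perf_counter()
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    json_text = json.dumps(collector.images)
    return json_text, time.perf_counter() - start


# Assumed sample input, not taken from the paper's data set.
SAMPLE_PAGE = """
<html><body>
  <h1>Catalogue</h1>
  <img src="/img/a.png" alt="Item A">
  <p>Some text</p>
  <img src="/img/b.jpg">
</body></html>
"""

if __name__ == "__main__":
    records, seconds = extract_images_as_json(SAMPLE_PAGE)
    print(records)
    print(f"extracted in {seconds:.6f} s")
```

Timing a single page this way only gives a coarse number; a fair comparison between extraction models would repeat the measurement over many pages and runs.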

Citation (APA)

Sabri, I. A. A., & Man, M. (2021, July 1). A deep web data extraction model for web mining: A review. Indonesian Journal of Electrical Engineering and Computer Science. Institute of Advanced Engineering and Science. https://doi.org/10.11591/ijeecs.v23.i1.pp519-528
