A Systematic Review of Current Trends in Web Content Mining

Abstract

Knowledge in web documents, relevance ranking of webpages, and related topics are among the under-researched areas in web content mining (WCM). Apart from work on the general data mining tools used for knowledge discovery on the web, there have been few attempts to review WCM, and these examined the methods used and the problems solved, but not in sufficient depth. These existing reviews also do not reveal which problems have been under-researched or which application area has received the most attention in WCM. The goal of this systematic review is to provide a comprehensive, semi-structured overview of WCM methods, problems, and proposed solutions. To this end, 57 publications from journals, conference proceedings, and workshops, published between 1999 and 2018, were considered. The findings reveal that updating dynamic content, extracting content efficiently, and eliminating noise blocks, among others, remain the most prominent challenges in WCM, and that considerable attention is devoted to solving them more efficiently. Most of the proposed solutions still have limitations, which leaves this area open for future research. Caching dynamic web data. With regard to content, the techniques used for content extraction in WCM include Data Update Propagation (DUP), association rules, Object Dependence Graphs, classification techniques, the Document Object Model, vision-based segmentation, and Hyperlink-Induced Topic Search, among others. Finally, the study reveals that WCM has mostly been applied to general websites, including random webpages, with the aim of extracting specific parameters. The review identifies the limitations of current research on the subject and points to future research opportunities in WCM.

Cite

Samuel, M. O., Tolulope, A. I., & Oyejoke, O. O. (2019). A Systematic Review of Current Trends in Web Content Mining. In Journal of Physics: Conference Series (Vol. 1299). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1299/1/012040
