High-Quality Train Data Generation for Deep Learning-Based Web Page Classification Models


This article is free to access.

Abstract

Current deep learning models for detecting relevant web pages show low accuracy because their training data are of poor quality. In this paper, we propose a novel algorithm that automatically generates high-quality training data based on the frequency of documents that include the entity of interest. Experimental results on movie and cellphone data sets show that deep learning models (FNN, CNN, Bi-LSTM, and SeqGAN) trained on data generated by the proposed algorithm achieve an average F1-score of up to 0.9992.
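For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of the standard definition only; the function name `f1_score` and the counts in the usage line are hypothetical and are not taken from the paper's experiments.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # With no predicted or no actual positives, precision or recall is
    # undefined; this sketch returns 0.0 by convention in that case.
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a relevant-page classifier (illustration only):
# precision = 90 / 100 = 0.9, recall = 90 / 120 = 0.75
score = f1_score(true_positives=90, false_positives=10, false_negatives=30)
print(round(score, 4))
```

A score close to 1, such as the 0.9992 reported in the abstract, requires both precision and recall to be close to 1.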

Cite

Kim, J. J., On, B. W., & Lee, I. (2021). High-Quality Train Data Generation for Deep Learning-Based Web Page Classification Models. IEEE Access, 9, 85240–85254. https://doi.org/10.1109/ACCESS.2021.3086586