Abstract
Current deep learning models for detecting relevant web pages show low accuracy because of the poor quality of their training data. In this paper, we propose a novel algorithm that automatically generates high-quality training data based on the frequency of documents containing the entity of interest. Our experimental results on movie and cellphone data sets show that deep learning models (FNN, CNN, Bi-LSTM, and SeqGAN) trained with data generated by our algorithm achieve an average F1-score of up to 0.9992.
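The reported results use the F1-score, the harmonic mean of precision and recall. The paper's evaluation code is not given here, so the following is only a minimal sketch of the standard binary F1 computation. It assumes label 1 means "relevant web page" and 0 means "irrelevant", and the labels in the example are hypothetical.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1-score for the positive class (label 1, assumed to mean 'relevant')."""
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No correctly detected relevant pages: precision and recall are both 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = relevant web page, 0 = irrelevant web page.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
print(f1_score(y_true, y_pred))  # prints 0.75
```

Here tp = 3, fp = 1, and fn = 1, so precision and recall are both 0.75, and F1 = 0.75. An average F1-score such as the one in the abstract would then be the mean of per-model (or per-data-set) scores computed this way.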
Cite
Kim, J. J., On, B. W., & Lee, I. (2021). High-Quality Train Data Generation for Deep Learning-Based Web Page Classification Models. IEEE Access, 9, 85240–85254. https://doi.org/10.1109/ACCESS.2021.3086586