Given the tremendous growth in the number of web pages and in Internet usage, methods for real-time web page classification are needed. To address this need, URL-based methods have been proposed in the literature; they offer advantages in classification speed and computational efficiency over content-based approaches. This work proposes a CNN-based method that uses only URLs as input. We extract word-level tokens from the URLs alone, feed them into a word embedding layer, and then pass the result through CNN layers with tuned hyperparameters. Our experiments show that this method achieves an F1-score of 0.9759 and outperforms many existing methods on a new large dataset.
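The abstract does not specify the tokenization rule, embedding size, filter widths, or the tuned hyperparameters, so the following is only a minimal sketch of the general pipeline under assumed choices: URLs are lowercased and split on non-alphanumeric characters, a single convolutional layer with one filter width is used, and the weights are random rather than trained. All names (`tokenize_url`, `build_vocab`, `encode`, `TinyUrlCnn`) are hypothetical, and the model is written in plain Python purely to show the data flow; a practical implementation would use a deep learning framework and train the weights.

```python
import random
import re
from typing import Dict, List

PAD, UNK = 0, 1  # reserved vocabulary ids (assumed convention, not from the paper)


def tokenize_url(url: str) -> List[str]:
    """Assumed rule: lowercase the URL and split it on any run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def build_vocab(token_lists: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every token seen in the training URLs."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids (unknown -> UNK), truncating or right-padding to max_len."""
    ids = [vocab.get(tok, UNK) for tok in tokens[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def _rand_matrix(rng: random.Random, rows: int, cols: int) -> List[List[float]]:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyUrlCnn:
    """Embedding -> Conv1D (ReLU) -> global max pooling -> linear layer, with untrained random weights."""

    def __init__(self, vocab_size: int, emb_dim: int, n_filters: int,
                 width: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        self.embedding = _rand_matrix(rng, vocab_size, emb_dim)
        self.filters = [_rand_matrix(rng, width, emb_dim) for _ in range(n_filters)]
        self.conv_bias = [0.0] * n_filters
        self.out_w = _rand_matrix(rng, n_classes, n_filters)
        self.out_b = [0.0] * n_classes

    def logits(self, ids: List[int]) -> List[float]:
        x = [self.embedding[i] for i in ids]  # word embeddings, shape (seq_len, emb_dim)
        pooled = []
        for filt, b in zip(self.filters, self.conv_bias):
            # Slide the filter over every window of `width` consecutive tokens, apply ReLU.
            acts = [
                max(0.0, b + sum(w * v
                                 for row_w, row_x in zip(filt, x[s:s + self.width])
                                 for w, v in zip(row_w, row_x)))
                for s in range(len(x) - self.width + 1)
            ]
            pooled.append(max(acts))  # global max pooling over token positions
        return [b + sum(w * p for w, p in zip(ws, pooled))
                for ws, b in zip(self.out_w, self.out_b)]

    def predict(self, ids: List[int]) -> int:
        scores = self.logits(ids)
        return max(range(len(scores)), key=scores.__getitem__)


# Usage with two made-up URLs.
urls = ["https://www.example.com/sports/football-news",
        "http://shop.example.org/cart?item=42"]
token_lists = [tokenize_url(u) for u in urls]
vocab = build_vocab(token_lists)
model = TinyUrlCnn(len(vocab), emb_dim=8, n_filters=4, width=3, n_classes=2)
print(token_lists[0])  # ['https', 'www', 'example', 'com', 'sports', 'football', 'news']
print(model.predict(encode(token_lists[0], vocab, max_len=16)))  # class index 0 or 1
```

Max pooling over positions makes the output independent of where in the URL a discriminative token (such as `football`) appears, which is one common reason CNNs are used over word sequences of variable length.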
Hung, P. D., Hung, N. D., & Diep, V. T. (2022). URL Classification Using Convolutional Neural Network for a New Large Dataset. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13492 LNCS, pp. 103–114). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16538-2_11