Abstract
Machine learning (ML) has long been a staple of academic research into pattern recognition in many fields, including cybersecurity. Its momentum continues to build alongside advances in hardware capabilities and the methods they unlock, primarily (deep) neural networks. However, this article aims to demonstrate that the injudicious use of ML in two prominent domains of data-based cybersecurity consistently misleads researchers into believing that their proposed methods constitute actual improvements. Armed with 17 state-of-the-art datasets in traffic and malware classification and the simplest possible machine learning model, this article shows that the lack of variability in most of these datasets immediately leads to excellent models, even when the model consists of only one comparison per feature.
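To make "one comparison per feature" concrete, the following is a minimal sketch of a single-threshold classifier (often called a decision stump) fitted on one feature. It is not the authors' implementation: the function name `fit_stump`, the synthetic data, and the class means are all assumptions chosen for illustration, and the data is deliberately generated with little overlap between classes to mimic a low-variability dataset.

```python
import random


def fit_stump(values, labels):
    """Find the single threshold comparison on one feature with the best accuracy.

    Returns (threshold, polarity, accuracy). Polarity 1 predicts class 1 when
    value > threshold; polarity -1 predicts class 1 when value <= threshold.
    """
    n = len(values)
    best = (None, 1, 0.0)
    for t in sorted(set(values)):
        # Count samples classified correctly by the rule "value > t means class 1".
        hits = sum((v > t) == bool(y) for v, y in zip(values, labels))
        # The inverted rule is correct exactly where the first rule is wrong.
        for polarity, correct in ((1, hits), (-1, n - hits)):
            acc = correct / n
            if acc > best[2]:
                best = (t, polarity, acc)
    return best


# Hypothetical synthetic feature: the two classes barely overlap, which is the
# kind of low variability the abstract describes (values are illustrative only).
rng = random.Random(0)
benign = [rng.gauss(0.0, 1.0) for _ in range(200)]
malicious = [rng.gauss(10.0, 1.0) for _ in range(200)]
values = benign + malicious
labels = [0] * len(benign) + [1] * len(malicious)

threshold, polarity, accuracy = fit_stump(values, labels)
print(f"threshold={threshold:.2f} polarity={polarity} accuracy={accuracy:.3f}")
```

On data this well separated, a single comparison is enough to classify almost every sample, which is why a strong score on such a dataset says little about the sophistication of the model that produced it.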
D’hooge, L., Verkerken, M., Wauters, T., De Turck, F., & Volckaert, B. (2023). Castles Built on Sand: Observations from Classifying Academic Cybersecurity Datasets with Minimalist Methods. In International Conference on Internet of Things, Big Data and Security, IoTBDS - Proceedings (Vol. 2023-April, pp. 61–72). Science and Technology Publications, Lda. https://doi.org/10.5220/0011853300003482