As connected objects become central to everyday life, network intrusion detection is more critical than ever. Over the past decades, various datasets have been developed to address this security challenge. Analysis of earlier datasets, such as KDD-Cup99 and NSL-KDD, revealed a number of issues and paved the way for newer datasets that correct the identified problems. CIC-IDS2017, one of the newest network intrusion detection datasets, has become a popular choice. Its advantage is that it provides both the raw data in PCAP files and flow-based features in CSV files. In this paper, we perform a detailed analysis of this dataset and report several problems discovered in the flows extracted from the network packets. To overcome these problems, we propose a new feature extraction tool named LycoSTand. In addition, we propose a feature selection based on correlations and feature importance. A performance comparison between the original and the new dataset shows significant improvements for all evaluated machine learning algorithms. Building on these improvements to CIC-IDS2017, we also examine other datasets affected by the same issues, for which LycoSTand can be used to produce improved datasets for network intrusion detection.
Rosay, A., Carlier, F., Cheval, E., & Leroux, P. (2021). From CIC-IDS2017 to LYCOS-IDS2017: A corrected dataset for better performance. In ACM International Conference Proceeding Series (pp. 570–575). Association for Computing Machinery. https://doi.org/10.1145/3486622.3493973