From CIC-IDS2017 to LYCOS-IDS2017: A corrected dataset for better performance

Arnaud Rosay; Florent Carlier; Eloïse Cheval; Pascal Leroux

Conference Proceedings, Open Access

ACM International Conference Proceeding Series (2021) 570-575

DOI: 10.1145/3486622.3493973

Abstract

As connected objects become the standard for quality of life, network intrusion detection is getting more critical than ever. Over the past decades, various datasets have been developed to address this security challenge. Analysis of earlier datasets, such as KDD-Cup99 and NSL-KDD, highlighted some of the issues, leading the way for newer datasets that have corrected the identified problems. CIC-IDS2017, one of the newest network intrusion detection datasets, has become a popular choice. Its advantage is the availability of raw data in PCAP files as well as flow-based features in CSV files. In this paper, a detailed analysis of this dataset is performed and we report several problems discovered in the flows retrieved from the network packets. To overcome these problems, a new feature extraction tool named LycoSTand is suggested. In addition, a feature selection is proposed considering correlations and feature importance. The performance comparison between the original and the new dataset shows significant improvements for all evaluated machine learning algorithms. Based on the improvements in CIC-IDS2017, we also examine other datasets affected by the same issues on which LycoSTand can be used to produce improved datasets for network intrusion detection.

Citation (APA)

Rosay, A., Carlier, F., Cheval, E., & Leroux, P. (2021). From CIC-IDS2017 to LYCOS-IDS2017: A corrected dataset for better performance. In ACM International Conference Proceeding Series (pp. 570–575). Association for Computing Machinery. https://doi.org/10.1145/3486622.3493973
