MalFSM: Feature Subset Selection Method for Malware Family Classification

10Citations
Citations of this article
9Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Malware detection has been a hot spot in cyberspace security and academic research. We investigate the correlation between the opcode features of malicious samples and perform feature extraction, selection and fusion by filtering redundant features, thus alleviating the dimensional disaster problem and achieving efficient identification of malware families for proper classification. Malware authors use obfuscation technology to generate a large number of malware variants, which imposes a heavy analysis burden on security researchers and consumes a lot of resources in both time and space. To this end, we propose the MalFSM framework. Through the feature selection method, we reduce the 735 opcode features contained in the Kaggle dataset to 16, and then fuse on metadata features (count of file lines and file size) for a total of 18 features, and find that the machine learning classification is efficient and high accuracy. We analyzed the correlation between the opcode features of malicious samples and interpreted the selected features. Our comprehensive experiments show that the highest classification accuracy of MalFSM can reach up to 98.6% and the classification time is only 7.76 s on the Kaggle malware dataset of Microsoft.

Cite

CITATION STYLE

APA

Zixiao, K., Jingfeng, X., Yong, W., Qian, Z., Weijie, H., & Yufen, Z. (2023). MalFSM: Feature Subset Selection Method for Malware Family Classification. Chinese Journal of Electronics, 32(1), 26–38. https://doi.org/10.23919/cje.2022.00.038

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free