MalFSM: Feature Subset Selection Method for Malware Family Classification

Kong Zixiao; Xue Jingfeng; Wang Yong; Zhang Qian; Han Weijie; Zhu Yufen

Journal Article, Open Access

Chinese Journal of Electronics (2023) 32(1) 26-38

DOI: 10.23919/cje.2022.00.038

10 Citations

9 Readers

Abstract

Malware detection is a major focus of cyberspace security practice and academic research. Malware authors use obfuscation to generate large numbers of malware variants, which imposes a heavy analysis burden on security researchers and consumes substantial resources in both time and space. To address this, we propose the MalFSM framework. We investigate the correlation between the opcode features of malicious samples and perform feature extraction, selection, and fusion, filtering out redundant features to alleviate the curse of dimensionality and to identify malware families efficiently. Using feature selection, we reduce the 735 opcode features contained in the Kaggle dataset to 16 and then fuse them with two metadata features (file line count and file size), for a total of 18 features; machine learning classification on this subset is both efficient and highly accurate. We also interpret the selected features. Comprehensive experiments on the Microsoft Kaggle malware dataset show that MalFSM reaches a classification accuracy of up to 98.6% with a classification time of only 7.76 s.
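
The abstract does not state which selection criterion MalFSM uses, so the following is only a minimal sketch of the general idea it describes: drop opcode features that are highly correlated with ones already kept, then append the two metadata features. The Pearson criterion, the 0.9 threshold, the greedy ordering, the function names, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def drop_redundant(columns, threshold=0.9):
    """Greedy filter (assumed criterion): keep a feature only if its absolute
    correlation with every already-kept feature is at most `threshold`.
    `columns` maps feature name -> list of per-sample values."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def fuse(columns, kept, metadata):
    """Build one row per sample: selected opcode counts, then metadata features."""
    n = len(next(iter(metadata.values())))
    names = kept + list(metadata)
    rows = [
        [columns[k][i] for k in kept] + [metadata[m][i] for m in metadata]
        for i in range(n)
    ]
    return names, rows


# Synthetic toy data, not the Kaggle dataset.
random.seed(0)
n_samples = 50
mov = [random.randint(0, 100) for _ in range(n_samples)]
opcodes = {
    "mov": mov,
    "push": [2 * v + 1 for v in mov],  # linear in "mov", so redundant
    "xor": [random.randint(0, 100) for _ in range(n_samples)],
}
metadata = {
    "line_count": [random.randint(100, 10000) for _ in range(n_samples)],
    "file_size": [random.randint(1000, 100000) for _ in range(n_samples)],
}

kept = drop_redundant(opcodes, threshold=0.9)
names, rows = fuse(opcodes, kept, metadata)
print(names)
```

In this toy run the redundant "push" column is filtered out and the two metadata columns are appended to the surviving opcode columns, mirroring on a small scale the 735 to 16 plus 2 reduction reported in the abstract. The resulting rows could then be passed to any classifier.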

Cite

Kong, Z., Xue, J., Wang, Y., Zhang, Q., Han, W., & Zhu, Y. (2023). MalFSM: Feature Subset Selection Method for Malware Family Classification. Chinese Journal of Electronics, 32(1), 26–38. https://doi.org/10.23919/cje.2022.00.038