Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API

Abstract

With the rapid advances of anti-virus and anti-tracking technologies, three aspects of malware clustering must be improved for clustering to remain effective: the robustness of features, the accuracy of similarity measurements, and the effectiveness of clustering algorithms. In this paper, we propose a novel malware family clustering approach based on weighted dynamic and static features. In this approach, we employ a new similarity measurement method based on the Earth Mover's Distance (EMD) to improve the accuracy of feature similarities. In addition, to reduce convergence time and improve clustering purity, we design a novel semi-supervised clustering algorithm, termed S-DBSCAN, which incorporates supervision information into the original Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The experimental results demonstrate that the proposed approach accurately distinguishes samples from various families and achieves a superior purity of 98.7%.
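The abstract names three ingredients (an EMD-based similarity over weighted features, a semi-supervised DBSCAN variant called S-DBSCAN, and cluster purity as the evaluation metric) without giving their details. The Python sketch below illustrates each one under stated assumptions; it is not the authors' implementation. The 1-D EMD shortcut assumes both histograms share the same ordered, unit-spaced bins. The (bytecode, API) weighting in `weighted_distance`, its default weights, and the seeded cannot-link scheme in `s_dbscan` are hypothetical choices made for illustration, not details taken from the paper.

```python
from collections import Counter, deque


def emd_1d(p, q):
    """Earth Mover's Distance between two histograms over the same ordered,
    unit-spaced bins. Each histogram is normalised to unit mass; in 1-D the
    EMD then equals the sum of absolute differences of the cumulative sums."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sp, sq = float(sum(p)), float(sum(q))
    cum = total = 0.0
    for a, b in zip(p, q):
        cum += a / sp - b / sq     # running difference of normalised CDFs
        total += abs(cum)
    return total


def weighted_distance(x, y, w_bytecode=0.5, w_api=0.5):
    """Hypothetical combination: x and y are (bytecode_hist, api_hist) pairs,
    and the per-feature EMDs are mixed with illustrative weights."""
    return w_bytecode * emd_1d(x[0], y[0]) + w_api * emd_1d(x[1], y[1])


def s_dbscan(points, dist, eps, min_pts, seeds):
    """DBSCAN with labelled seeds (hypothetical supervision scheme).

    seeds maps a sample index to a known family. Seeded samples are visited
    first, and a cluster never absorbs a sample seeded with a different
    family (a cannot-link constraint). Returns one label per point:
    a cluster id >= 0, or -1 for noise."""
    n = len(points)
    labels = [None] * n            # None = unvisited
    family = {}                    # cluster id -> family of its seeds

    def neighbors(i):              # O(n) scan; fine for a sketch
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cid = 0
    for i in sorted(range(n), key=lambda k: k not in seeds):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1         # noise (may later become a border point)
            continue
        labels[i], family[cid] = cid, seeds.get(i)
        queue = deque(nb)
        while queue:
            j = queue.popleft()
            fam = seeds.get(j)
            if fam is not None and family[cid] not in (None, fam):
                continue           # cannot-link: different known family
            if labels[j] is not None and labels[j] != -1:
                continue           # already assigned to a cluster
            was_noise = labels[j] == -1
            labels[j] = cid
            if family[cid] is None:
                family[cid] = fam  # cluster adopts the first seed it meets
            if was_noise:
                continue           # former noise becomes a border point only
            nbj = neighbors(j)
            if len(nbj) >= min_pts:
                queue.extend(nbj)  # j is a core point: keep expanding
        cid += 1
    return labels


def purity(pred, truth):
    """Fraction of samples whose cluster's majority family matches their own.
    Noise (-1) is treated as one group here, which is an assumption."""
    groups = {}
    for p, t in zip(pred, truth):
        groups.setdefault(p, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in groups.values()) / len(pred)
```

For example, on the points `[0, 1, 2, 3]` with `dist=lambda a, b: abs(a - b)`, `eps=1`, `min_pts=2`, and seeds `{0: "A", 3: "B"}`, plain DBSCAN would chain all four points into one cluster, whereas the cannot-link check keeps point 3 apart and yields `[0, 0, 0, 1]`. If API bins have no natural order, the cumulative-sum shortcut no longer applies, and a general EMD with an explicit ground-distance matrix and a transport solver would be needed instead.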

Citation (APA)

Fang, Y., Zhang, W., Li, B., Jing, F., & Zhang, L. (2020). Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API. IEEE Access, 8, 2313–2326. https://doi.org/10.1109/ACCESS.2019.2962198