Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering

6Citations
Citations of this article
10Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

With the development of IoT devices, there is a rapid increase in new types of IoT malware and variants, causing social problems. The malware's phylogenetic tree has been used in many studies for malware clustering or better understanding of malware evolution. However, when dealing with a large-scale malware set, conventional methods for constructing a phylogenetic tree is very time-consuming or even cannot be done in a realistic time. To solve this problem, we propose a high-speed, scalable phylogenetic tree construction algorithm with a clustering algorithm to cluster it. The proposed method involves the following steps: (1) Calculating the similarity of the specimen pairs using the normalized compression distance. (2) Creating a phylogenetic tree containing all specimens, instead of calculating the similarity of all pairs of a specimen, our algorithm only calculates a small part of the similarity matrix. (3) Dividing the phylogenetic tree into clusters by applying the minimum description length criterion. In addition, we propose a new online processing algorithm to add new malware specimens into the existing phylogenetic tree sequentially. Our goal is to reduce the computational cost of constructing the phylogenetic tree and improve the clustering accuracy of our previous research. We evaluated our method's clustering accuracy and scalability with 65,494 IoT malware specimens. The results showed that our algorithm reduced the computation by 97.52% compared with the conventional method. Our clustering algorithm achieved accuracies of 95.5% and 99.3% for clustering family name and architecture name, respectively.

Cite

CITATION STYLE

APA

He, T., Han, C., Isawa, R., Takahashi, T., Kijima, S., & Takeuchi, J. (2023). Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering. IEEE Access, 11, 8240–8253. https://doi.org/10.1109/ACCESS.2023.3238711

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free