A fast algorithm for constructing phylogenetic trees with application to IoT malware clustering

Abstract

To handle thousands of malware specimens efficiently, we aim to categorize them into malware families quickly and automatically. One solution is the neighbor-joining method with NCD (Normalized Compression Distance) as the distance measure between malware specimens: it builds a phylogenetic tree of malware from the NCDs between malware binaries and uses the tree for clustering. However, it is frustratingly slow because computing the NCDs requires (N² + N)/2 compression attempts, where N is the number of given specimens. For fast clustering, this paper presents an algorithm that constructs a phylogenetic tree efficiently by greatly reducing the number of compression attempts. The key idea is not to construct a tree of all N specimens at once. Instead, the algorithm divides the N specimens into temporal clusters in advance, constructs a small tree for each temporal cluster, and joins the small trees into a single united tree. Intuitively, constructing small trees separately requires far fewer compression attempts than (N² + N)/2. In experiments using 4,109 in-the-wild malware specimens, we confirm that our algorithm clusters 22 times faster than the neighbor-joining method while achieving a good accuracy of 97%.
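
To make the cost argument concrete, here is a minimal Python sketch of the standard NCD formula and of the compression-attempt count quoted in the abstract. It is an illustration under stated assumptions, not the authors' implementation: zlib stands in for whichever compressor the paper uses, the function names are hypothetical, and `split_attempts` counts only the per-cluster trees, ignoring whatever extra work the paper's algorithm spends on forming the temporal clusters and joining the trees.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data (zlib is an assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def compression_attempts(n: int) -> int:
    """(N^2 + N)/2: N single-specimen compressions plus N(N-1)/2 pairwise ones."""
    return (n * n + n) // 2


def split_attempts(cluster_sizes) -> int:
    """Attempts needed if each cluster gets its own small tree.

    Assumption: clustering and tree-joining costs are NOT counted here.
    """
    return sum(compression_attempts(size) for size in cluster_sizes)


if __name__ == "__main__":
    n = 4109  # number of specimens in the paper's experiment
    print("all at once:", compression_attempts(n))
    # Hypothetical split into four clusters of 1,000 and one of 109 specimens.
    print("split:", split_attempts([1000, 1000, 1000, 1000, 109]))
```

The cluster sizes in the usage lines are invented purely to show how the quadratic term shrinks when the specimens are divided; the paper's actual cluster sizes are not given in the abstract.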

Cite

He, T., Han, C., Isawa, R., Takahashi, T., Kijima, S., Takeuchi, J., & Nakao, K. (2019). A fast algorithm for constructing phylogenetic trees with application to IoT malware clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11953 LNCS, pp. 766–778). Springer. https://doi.org/10.1007/978-3-030-36708-4_63