Discovering similarities in malware behaviors by clustering of API call sequences

6Citations
Citations of this article
8Readers
Mendeley users who have this article in their library.
Get full text

Abstract

New genres of malware are evading detection by using polymorphism, obfuscation and encryption techniques. Hence, new strategies are needed to overcome the limitations of current malware analysis practices. In this paper, we propose an unsupervised learning (clustering) framework to complement the supervised learning (i.e., classifier-based malware detection) approach. We cluster malware instances to discover similarities in their dynamic behaviors and to detect new malware families. For that, we utilize Application Programming Interface (API) call sequences to represent the behaviors of malware in dynamic runtime environment. We investigate three sequence comparison algorithms, namely, Optimal Matching (OM), Longest Common Subsequence (LCS), and Longest Common Prefix (LCP) for calculating sequence–sequence distances to be used for hierarchical clustering. Among the three algorithms, LCP is found to be both the most effective in terms of clustering quality and the most efficient in terms of time complexity (linear-time).

Cite

CITATION STYLE

APA

Al Shamsi, F., Woon, W. L., & Aung, Z. (2018). Discovering similarities in malware behaviors by clustering of API call sequences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11304 LNCS, pp. 122–133). Springer Verlag. https://doi.org/10.1007/978-3-030-04212-7_11

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free