Malware Clustering Based on Called API During Runtime

Abstract

The number of malware samples has grown exponentially in recent years, making it tedious to analyze them manually in order to detect when a new strain appears. In this article we present a dynamic analysis system that clusters suspicious executable files into malware families based on the behavioral similarities their running processes exhibit, thus reducing the workload of malware analysts. We identified similarities between our approach and the problem of clustering text by topic, and achieved results similar to those of text clustering without semantic analysis. We modeled the behavior of a process by extracting the sequences of Windows API functions it called during execution. We separated the recorded API calls into three levels, based on their impact on the system, and treated them as text-like terms. More complex terms were constructed as N-grams, and the features were represented by TF-IDF scores. We clustered the processes with variants of the k-means algorithm and derived a method for analyzing cluster characteristics in order to determine the best number of clusters. Finally, we identified the API level and N-gram lengths required to obtain relevant clusters.
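
The pipeline described in the abstract (API-call sequences, N-gram terms, TF-IDF features, k-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the traces are invented toy data, the function names are hypothetical, the three API levels and the cluster-count selection method are not modeled, and the k-means here is plain Lloyd's algorithm seeded with the first k vectors for determinism rather than any of the variants used in the paper.

```python
import math
from collections import Counter

def ngrams(calls, n):
    """Return the contiguous n-grams (as tuples) of one API-call sequence."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]

def tfidf(docs):
    """Turn each document (a list of terms) into a dense TF-IDF vector."""
    vocab = sorted({term for doc in docs for term in doc})
    doc_freq = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty traces
        vectors.append([(tf[t] / total) * math.log(n_docs / doc_freq[t])
                        for t in vocab])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Lloyd's k-means; assumption: the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical traces: two file-writing processes and two injecting processes.
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle", "RegSetValueExW"],
    ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
    ["CreateFileW", "WriteFile", "CloseHandle",
     "CreateFileW", "WriteFile", "CloseHandle"],
    ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
     "CreateRemoteThread", "CloseHandle"],
]

docs = [ngrams(trace, 2) for trace in traces]  # 2-grams as text-like terms
vocab, vectors = tfidf(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

On this toy input the two file-writing traces share one label and the two injection traces share the other. With real traces the seeding, the N-gram length, and the value of k would all need tuning, which is the question the abstract's final two sentences address.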

Cite

Széles, G. J., & Coleşa, A. (2019). Malware Clustering Based on Called API During Runtime. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11398 LNCS, pp. 110–121). Springer Verlag. https://doi.org/10.1007/978-3-030-12085-6_10
