Malware Clustering Based on Called API During Runtime

Gergő János Széles; Adrian Coleşa

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11398 LNCS 110-121

DOI: 10.1007/978-3-030-12085-6_10

5 Citations

9 Readers

Abstract

Malware has grown exponentially in recent years, which makes it tedious to analyze samples manually in order to notice when a new strain appears. In this article we present a dynamic analysis system that clusters suspicious executable files into malware families based on the behavioral similarities their running processes exhibit, thus reducing the workload of malware analysts. We identified similarities between our approach and the problem of clustering text by topic, and achieved results similar to text clustering without involving semantic analysis. We modeled the behavior of a process by extracting the sequences of Windows API functions it calls during execution. We separated the recorded API calls into three levels, based on their impact on the system, and treated them as text-like terms. More complex terms were constructed as N-grams, and the features were represented with TF-IDF scores. We clustered the processes with variants of the k-means algorithm and derived a method for analyzing cluster characteristics in order to determine the best number of clusters to consider. Finally, we identified the API level and N-gram lengths required to obtain relevant clusters.
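The pipeline outlined in the abstract (API call sequences, N-gram terms, TF-IDF features, k-means clustering) can be illustrated with a small sketch. This is not the authors' implementation: the traces are invented toy data, the TF-IDF weighting (relative term frequency times log inverse document frequency) is one common formulation chosen here as an assumption, and the clustering step is a simple spherical k-means seeded with the first k vectors, which may differ from the k-means variants used in the paper. The three API levels and the cluster-count selection method are not modeled.

```python
import math
from collections import Counter

def ngrams(calls, n):
    """Return all length-n windows over a sequence of API names."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]

def normalize(vec):
    """Scale a sparse vector (dict) to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}

def tfidf(traces, n):
    """Map each trace to a unit-length sparse TF-IDF vector over its N-grams."""
    counts = [Counter(ngrams(t, n)) for t in traces]
    # Document frequency: in how many traces each N-gram occurs.
    df = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vec = {t: (k / total) * math.log(len(traces) / df[t])
               for t, k in c.items()}
        vectors.append(normalize(vec))
    return vectors

def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def kmeans(vectors, k, iterations=10):
    """Spherical k-means; seeding with the first k vectors is an assumption."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to the centroid with the highest cosine similarity.
        labels = [max(range(k), key=lambda j: dot(v, centroids[j]))
                  for v in vectors]
        # Recompute each centroid as the normalized mean of its members.
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if not members:
                continue
            mean = Counter()
            for v in members:
                for t, w in v.items():
                    mean[t] += w / len(members)
            centroids[j] = normalize(dict(mean))
    return labels

# Hypothetical API traces: two file-writing processes, two network processes.
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle",
     "CreateFileW", "WriteFile", "CloseHandle"],
    ["socket", "connect", "send", "recv", "closesocket"],
    ["CreateFileW", "WriteFile", "CloseHandle", "RegSetValueExW"],
    ["socket", "connect", "send", "send", "recv"],
]

vectors = tfidf(traces, n=2)
labels = kmeans(vectors, k=2)
print(labels)  # → [0, 1, 0, 1]
```

With 2-grams, the two file-writing traces share terms such as (CreateFileW, WriteFile) and end up in one cluster, while the two network traces share (socket, connect) and end up in the other. Longer N-grams capture more specific behavior but make the vectors sparser, which is the trade-off behind choosing the N-gram length.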

Cite

Széles, G. J., & Coleşa, A. (2019). Malware Clustering Based on Called API During Runtime. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11398 LNCS, pp. 110–121). Springer Verlag. https://doi.org/10.1007/978-3-030-12085-6_10
