Learn2Evade: Learning-based generative model for evading PDF malware classifiers

Abstract

Recent research has shown that a small perturbation to an input can change the prediction of a machine learning (ML) model. Such perturbed inputs are commonly referred to as adversarial examples. Early studies focused mostly on ML models for image processing and later expanded to other applications, including malware classification. In this article, we focus on the problem of finding adversarial examples against ML-based portable document format (PDF) malware classifiers. We consider this problem more challenging than attacking ML models for image processing because of the highly complex data structure of PDF and the additional constraint that the generated PDF must still exhibit malicious behavior. To address it, we propose a variant of generative adversarial networks that generates evasive PDF malware variants (without any crash), which are classified as benign by various existing classifiers while maintaining the original malicious behavior. Our model exploits the target classifier as a second discriminator to rapidly generate an evasive variant PDF, together with a new feature selection process that includes unique features extracted from malicious PDF files. We evaluate our technique against three representative PDF malware classifiers (Hidost 13, Hidost 16, and PDFrate-v2) and further examine its effectiveness with antivirus engines from VirusTotal. To the best of our knowledge, our work is the first to analyze the performance against commercial antivirus engines. Our model quickly finds evasive variants for all selected seeds against state-of-the-art PDF malware classifiers, which raises a serious security concern in the presence of adversaries.
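The idea of using the target classifier as a second discriminator can be illustrated with a toy generator objective. The abstract does not give the actual loss, so the sketch below is only an assumption: the function name `generator_loss`, the `weight` parameter, and the log-loss form of both terms are hypothetical and are not taken from the paper.

```python
import math


def generator_loss(d_real_prob, clf_malicious_prob, weight=1.0, eps=1e-12):
    """Hypothetical combined generator loss for one generated feature vector.

    d_real_prob: probability from the GAN discriminator that the sample is "real".
    clf_malicious_prob: probability from the target classifier that the sample
        is malicious (the classifier acts as a second discriminator).
    weight: assumed trade-off between the two terms (not from the paper).
    """
    # Non-saturating GAN term: small when the first discriminator is fooled.
    gan_term = -math.log(max(d_real_prob, eps))
    # Evasion term: small when the target classifier scores the sample as benign.
    evasion_term = -math.log(max(1.0 - clf_malicious_prob, eps))
    return gan_term + weight * evasion_term


# A sample the classifier flags as malicious is penalized more heavily
# than one it scores as benign, given the same discriminator output.
flagged = generator_loss(d_real_prob=0.9, clf_malicious_prob=0.9)
evasive = generator_loss(d_real_prob=0.9, clf_malicious_prob=0.1)
print(flagged > evasive)
```

This sketch covers only the evasion pressure on the generator. It says nothing about how the generated features are mapped back to a working PDF or how malicious behavior is verified, both of which the abstract lists as constraints.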

Cite

Bae, H., Lee, Y., Kim, Y., Hwang, U., Yoon, S., & Paek, Y. (2021). Learn2Evade: Learning-based generative model for evading PDF malware classifiers. IEEE Transactions on Artificial Intelligence, 2(4), 299–313. https://doi.org/10.1109/TAI.2021.3103139
