Learn2Evade: Learning-based generative model for evading PDF malware classifiers

Ho Bae; Younghan Lee; Yohan Kim; Uiwon Hwang; Sungroh Yoon; Yunheung Paek

Journal Article, Open Access

IEEE Transactions on Artificial Intelligence (2021) 2(4) 299-313

DOI: 10.1109/TAI.2021.3103139

Abstract

Recent research has shown that a small perturbation to an input may forcibly change the prediction of a machine learning (ML) model. Such perturbed inputs are commonly referred to as adversarial examples. Early studies focused mostly on ML models for image processing and later expanded to other applications, including malware classification. In this article, we focus on the problem of finding adversarial examples against ML-based portable document format (PDF) malware classifiers. We deem that our problem is more challenging than those against ML models for image processing because of the highly complex data structure of PDF and of an additional constraint that the generated PDF should exhibit malicious behavior. To resolve our problem, we propose a variant of generative adversarial networks that generates evasive variant PDF malware (without any crash), which can be classified as benign by various existing classifiers while maintaining the original malicious behavior. Our model exploits the target classifier as the second discriminator to rapidly generate an evasive variant PDF with our new feature selection process that includes unique features extracted from malicious PDF files. We evaluate our technique against three representative PDF malware classifiers (Hidost 13, Hidost 16, and PDFrate-v2) and further examine its effectiveness with AntiVirus engines from VirusTotal. To the best of our knowledge, our work is the first to analyze the performance against commercial AntiVirus engines. Our model finds, with great speed, evasive variants for all selected seeds against state-of-the-art PDF malware classifiers and raises a serious security concern in the presence of adversaries.

Citation (APA)

Bae, H., Lee, Y., Kim, Y., Hwang, U., Yoon, S., & Paek, Y. (2021). Learn2Evade: Learning-based generative model for evading PDF malware classifiers. IEEE Transactions on Artificial Intelligence, 2(4), 299–313. https://doi.org/10.1109/TAI.2021.3103139
