An Empirical Study on the Effects of Obfuscation on Static Machine Learning-Based Malicious JavaScript Detectors

Abstract

Machine learning is increasingly being applied to malicious JavaScript detection in response to the growing number of Web attacks and the costly manual identification they entail. In practice, authors of both malicious and benign scripts tend to obfuscate their code before publishing it, either to hide malicious behavior or to protect intellectual property. While obfuscation serves these purposes, it also introduces additional code features (e.g., dead code). When machine learning is used to train a malicious JavaScript detector, these additional features can degrade the model's effectiveness. However, there is still no clear understanding of how robust existing machine learning-based detectors are against different obfuscators. In this paper, we conduct the first empirical study of how obfuscation affects machine learning detectors based on static features. From the results, we draw several findings: 1) Obfuscation has a significant impact on the effectiveness of detectors, increasing both the false negative rate (FNR) and the false positive rate (FPR), and an obfuscation bias in the training set induces detectors to detect obfuscation rather than malicious behavior. 2) Common countermeasures, such as improving the training set by adding relevant obfuscated samples and leveraging state-of-the-art deep learning models, do not work well. 3) The root cause is that the feature spaces these detectors use capture only shallow differences in code, not the nature of benign and malicious behavior, and are therefore easily perturbed by the changes obfuscation introduces. 4) Obfuscation has a similar effect on real-world detectors on VirusTotal, indicating that this is a common problem in practice.
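To make the notion of "shallow" static features concrete: a detector that relies on surface properties of the source text (length, entropy, token counts) will see those properties shift when an obfuscator inserts dead code or encodes string literals, even though the script's behavior is unchanged. The following Python sketch is purely illustrative, with hypothetical feature names and hand-made snippets; it is not the feature set, obfuscators, or dataset used in the paper.

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of a string (bits per character)."""
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def shallow_features(source: str) -> dict:
    """Illustrative 'shallow' static features: they describe surface
    properties of the code text, not the behavior it implements."""
    return {
        "length": len(source),
        "entropy": round(shannon_entropy(source), 2),
        "eval_count": source.count("eval("),
        "string_literals": source.count('"') // 2 + source.count("'") // 2,
    }

# A trivial benign snippet and a hand-made "obfuscated" variant with dead code
# and hex-encoded strings (hypothetical example, not from the paper's corpus).
benign = 'function greet(name) { return "Hello, " + name; }'
obfuscated = (
    'var _0x1a=["\\x48\\x65\\x6c\\x6c\\x6f\\x2c\\x20"];'
    'function _0x2b(n){if(false){var dead=0;}return _0x1a[0]+n;}'
)

# Same behavior, markedly different feature vectors: a model trained on such
# features can end up separating "obfuscated" from "plain" rather than
# "malicious" from "benign".
print(shallow_features(benign))
print(shallow_features(obfuscated))
```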

Citation (APA)

Ren, K., Qiang, W., Wu, Y., Zhou, Y., Zou, D., & Jin, H. (2023). An Empirical Study on the Effects of Obfuscation on Static Machine Learning-Based Malicious JavaScript Detectors. In ISSTA 2023 - Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (pp. 1420–1432). Association for Computing Machinery, Inc. https://doi.org/10.1145/3597926.3598146
