Malware Detection Using LightGBM with a Custom Logistic Loss Function

Yun Gao; Hirokazu Hasegawa; Yukiko Yamaguchi; Hajime Shimada

Journal Article, Open Access

IEEE Access (2022), vol. 10, pp. 47792-47804

DOI: 10.1109/ACCESS.2022.3171912

24 Citations

45 Readers

Abstract

The increasing spread of malicious software (malware) through the internet remains a serious threat. Malware authors use obfuscation and deformation techniques to generate new types that can evade traditional detection methods. Hence, machine learning methods are widely expected to classify malware and cleanware based on the characteristics of malware samples. This paper investigates the accuracy of static malware detection using LightGBM with a custom log loss function, which controls learning by applying a coefficient α to the false-negative side of the loss function and a coefficient β to the false-positive side. Adjusting these coefficients yields a lopsided classifier. We used two malware datasets, one non-public and one public, to construct a baseline malware model and verify the effectiveness of the proposed method. We extracted the dataset features from PE-file surface analysis and PE-header dumps, and customized the binary log loss function to improve all the classification evaluation metrics to a certain extent. On the EMBER dataset, we obtained a better result (AUC = 0.979) at α = 430 and β = 339 than with the normal log loss function (AUC = 0.978). In addition, to maintain malware detection coverage and enable quick countermeasures for true positive results, we propose a hybrid usage of different custom models to prioritize positive results.
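The abstract does not give the exact form of the custom loss, so the following is only a minimal sketch under a stated assumption: that α scales the positive-class (malware) term of the binary log loss, which penalizes false negatives, and β scales the negative-class (cleanware) term, which penalizes false positives. The function names `sigmoid`, `weighted_log_loss`, and `grad_hess` are hypothetical and not taken from the paper. Gradient boosting libraries such as LightGBM fit a custom objective from the first and second derivatives of the loss with respect to the raw score, so the sketch derives both.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping a raw score to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_log_loss(y, z, alpha, beta):
    """Assumed form of the lopsided loss for one sample.

    y is the label (1 = malware, 0 = cleanware) and z is the raw score.
    alpha weights the y = 1 term (false-negative side),
    beta weights the y = 0 term (false-positive side).
    """
    eps = 1e-15  # clip to avoid log(0)
    p = min(max(sigmoid(z), eps), 1.0 - eps)
    return -(alpha * y * math.log(p) + beta * (1 - y) * math.log(1.0 - p))


def grad_hess(y, z, alpha, beta):
    """First and second derivative of the assumed loss with respect to z."""
    p = sigmoid(z)
    grad = beta * (1 - y) * p - alpha * y * (1.0 - p)
    hess = (alpha * y + beta * (1 - y)) * p * (1.0 - p)
    return grad, hess


if __name__ == "__main__":
    # With alpha = beta = 1 the gradient reduces to the usual p - y.
    print(grad_hess(1, 0.0, 1.0, 1.0))
    # A missed malware sample is pushed harder when alpha > beta.
    print(grad_hess(1, 0.0, 2.0, 1.0))
```

Under this assumption, setting α = β = 1 recovers the standard binary log loss, while α > β makes the classifier lean toward flagging samples as malware. To use such a loss in LightGBM, a vectorized version returning per-sample gradient and Hessian arrays would be supplied as the custom objective.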

Citation (APA)

Gao, Y., Hasegawa, H., Yamaguchi, Y., & Shimada, H. (2022). Malware Detection Using LightGBM with a Custom Logistic Loss Function. IEEE Access, 10, 47792–47804. https://doi.org/10.1109/ACCESS.2022.3171912
