Understanding Update of Machine-Learning-Based Malware Detection by Clustering Changes in Feature Attributions

Abstract

Machine learning (ML) models are widely used in malware detection systems. To maintain detection performance, such models must be updated with new data so that the influence of data variation over time is minimized. After an update, the new model is commonly validated using detection accuracy as the metric. Accuracy alone, however, carries no detailed information, such as changes in the features used for prediction. Such information helps to avoid undesirable update outcomes, such as overfitting or noneffective updates. We therefore propose a method for understanding ML model updates in malware detection systems using Shapley additive explanations (SHAP), a feature attribution method that interprets the output of an ML model by assigning each feature an importance value called a SHAP value. Our method identifies patterns of feature attribution changes that cause a change in the prediction. We first obtain, for each sample, the change in its feature attributions between the models before and after the update. We then cluster these changes to obtain patterns that are common to multiple samples. In experiments on an open dataset of Android malware, we demonstrate that our method can identify the causes of performance changes, such as overfitting or noneffective updates.
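The two steps described in the abstract (per-sample attribution changes, then clustering) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the models are linear, so SHAP values have the closed form w_i * (x_i - E[x_i]) under feature independence; k-means with farthest-first initialization stands in for the clustering algorithm, which the abstract does not name; and the weights, feature means, and samples are toy values rather than data from the paper. All function names are hypothetical.

```python
def linear_shap(weights, means, x):
    """SHAP values of a linear model, assuming independent features:
    phi_i = w_i * (x_i - E[x_i])."""
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]


def attribution_changes(w_old, w_new, means, samples):
    """Per-sample difference (new minus old) in feature attributions."""
    changes = []
    for x in samples:
        old = linear_shap(w_old, means, x)
        new = linear_shap(w_new, means, x)
        changes.append([n - o for n, o in zip(new, old)])
    return changes


def dist2(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization
    (a stand-in; the abstract does not specify the clustering method)."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy setup (hypothetical values): the old model uses only feature 0,
# the updated model also relies on features 1 and 2.
w_old = [1.0, 0.0, 0.0]
w_new = [1.0, 2.0, -1.0]
means = [0.5, 0.5, 0.5]
samples = [[1, 1, 0], [0, 1, 0], [1, 0, 1], [0, 0, 1]]

changes = attribution_changes(w_old, w_new, means, samples)
labels, centers = kmeans(changes, k=2)

for label, center in enumerate(centers):
    size = labels.count(label)
    print(f"cluster {label}: {size} samples, mean attribution change {center}")
```

Each cluster center summarizes one pattern of attribution change shared by several samples. In this toy case, feature 0 shows no change in either cluster, while features 1 and 2 shift in opposite directions for the two groups of samples.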

Cite

Fan, Y., Shibahara, T., Ohsita, Y., Chiba, D., Akiyama, M., & Murata, M. (2021). Understanding Update of Machine-Learning-Based Malware Detection by Clustering Changes in Feature Attributions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12835 LNCS, pp. 99–118). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-85987-9_6
