Malware Detection and Classification in Android Application Using Simhash-Based Feature Extraction and Machine Learning

This article is free to access.

Abstract

Detecting and classifying malware on Android devices has become crucial given how widely these devices are used in daily life. This paper presents a malware detection and classification method that combines static and dynamic features. Specifically, we extract permissions and intent filters from static analysis files and APK API data from dynamic analysis files. We then use Simhash to encode the selected parts of the analysis files into feature vectors, which are used to train different Machine Learning algorithms for detecting and classifying malware. We conducted several experiments to evaluate the effectiveness of our approach on a dataset of 38,355 labeled Android apps, testing various hash functions and tokenizing methods in the Simhash algorithm. Our results showed that using Random Forest as the classifier, SHA-512 as the hash function, and 2-WORD tokenization led to high accuracy with reduced time and memory requirements for training, validating, and testing. This indicates a highly accurate model for detecting malware and classifying it into 24 families.
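
To make the encoding step concrete, the following is a minimal sketch of a generic Simhash over 2-word tokens with SHA-512 as the token hash. It is not the authors' implementation: the function names (`two_word_tokens`, `simhash_bits`, `hamming`), the whitespace tokenization, the overlapping shingles, the uniform token weights, and the sample permission strings are all assumptions made for illustration.

```python
import hashlib


def two_word_tokens(text):
    """Split on whitespace and return overlapping 2-word shingles (assumed scheme)."""
    words = text.split()
    if len(words) < 2:
        return words
    return [f"{words[i]} {words[i + 1]}" for i in range(len(words) - 1)]


def simhash_bits(text, hash_name="sha512"):
    """Return a Simhash fingerprint of `text` as a list of 0/1 values.

    Each token is hashed; every bit position receives +1 if the token's
    hash has that bit set and -1 otherwise. The fingerprint bit is 1
    where the accumulated count is positive.
    """
    n_bits = hashlib.new(hash_name).digest_size * 8  # 512 for SHA-512
    counts = [0] * n_bits
    for token in two_word_tokens(text):
        digest = hashlib.new(hash_name, token.encode("utf-8")).digest()
        value = int.from_bytes(digest, "big")
        for i in range(n_bits):
            counts[i] += 1 if (value >> i) & 1 else -1
    return [1 if c > 0 else 0 for c in counts]


def hamming(a, b):
    """Number of positions at which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical permission lists, standing in for parts of an analysis file.
    app_a = "android.permission.INTERNET android.permission.SEND_SMS android.permission.READ_CONTACTS"
    app_b = "android.permission.INTERNET android.permission.SEND_SMS android.permission.CAMERA"
    fp_a, fp_b = simhash_bits(app_a), simhash_bits(app_b)
    print(len(fp_a), hamming(fp_a, fp_b))
```

Each fingerprint is a fixed-length bit vector regardless of input size, so it could serve directly as a feature row for a classifier such as scikit-learn's `RandomForestClassifier`; how the paper combines the static and dynamic fingerprints is not stated in the abstract.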

Cite

Al-Kahla, W., Taqieddin, E., Shatnawi, A. S., & Al-Ouran, R. (2024). Malware Detection and Classification in Android Application Using Simhash-Based Feature Extraction and Machine Learning. IEEE Access, 12, 174255–174273. https://doi.org/10.1109/ACCESS.2024.3501277
