Malware Detection and Classification in Android Application Using Simhash-Based Feature Extraction and Machine Learning

Wafaa Al-Kahla; Eyad Taqieddin; Ahmed S. Shatnawi; Rami Al-Ouran

Journal Article, Open Access

IEEE Access (2024), vol. 12, pp. 174255-174273

DOI: 10.1109/ACCESS.2024.3501277

Abstract

Detecting and classifying malware on Android devices has become crucial because these devices are in widespread daily use. This paper presents a method for detecting and classifying malware that combines static and dynamic features. Specifically, we extract permissions and intent filters from static analysis files and APK API data from dynamic analysis files. We then use Simhash to encode the selected parts of the analysis files into feature vectors, which are used to train several machine learning algorithms for malware detection and classification. We conducted several experiments to evaluate the approach on a dataset of 38,355 labeled Android apps, testing various hash functions and tokenization methods within the Simhash algorithm. Our results show that using Random Forest as the classifier, SHA-512 as the hash function, and 2-WORD tokenization yields high accuracy with reduced time and memory requirements for training, validation, and testing. This indicates a highly accurate model for detecting malware and classifying it into 24 families.
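The Simhash encoding step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that 2-WORD tokenization means overlapping pairs of whitespace-separated words, that every token carries equal weight, and that the fingerprint is emitted as one 0/1 feature per hash bit. The function names and the sample permission strings are hypothetical.

```python
import hashlib


def two_word_tokens(text):
    """Split on whitespace and return overlapping 2-word shingles (assumed tokenization)."""
    words = text.split()
    if len(words) < 2:
        return words
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def simhash_bits(text, n_bits=512, hash_name="sha512"):
    """Return an unweighted Simhash fingerprint of `text` as a list of 0/1 ints.

    n_bits must not exceed the digest size of the chosen hash (512 for SHA-512).
    """
    counts = [0] * n_bits
    for token in two_word_tokens(text):
        digest = hashlib.new(hash_name, token.encode("utf-8")).digest()
        value = int.from_bytes(digest, "big")
        for i in range(n_bits):
            # Each token votes +1 where its hash bit is set and -1 where it is clear.
            counts[i] += 1 if (value >> i) & 1 else -1
    # A fingerprint bit is 1 only when the positive votes outnumber the negative ones.
    return [1 if c > 0 else 0 for c in counts]


def hamming(a, b):
    """Count the positions where two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical permission lists standing in for parts of a static analysis file.
app_a = "android.permission.INTERNET android.permission.SEND_SMS android.permission.READ_CONTACTS"
app_b = "android.permission.INTERNET android.permission.SEND_SMS android.permission.CAMERA"

vec_a = simhash_bits(app_a)
vec_b = simhash_bits(app_b)
print(len(vec_a), hamming(vec_a, vec_b))
```

Under these assumptions, each app yields a fixed-length 512-element vector regardless of how long its analysis file is, and such vectors could then be passed to a classifier such as Random Forest.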

Citation (APA)

Al-Kahla, W., Taqieddin, E., Shatnawi, A. S., & Al-Ouran, R. (2024). Malware Detection and Classification in Android Application Using Simhash-Based Feature Extraction and Machine Learning. IEEE Access, 12, 174255–174273. https://doi.org/10.1109/ACCESS.2024.3501277
