Abstract
Detecting and classifying malware on Android devices has become crucial given how widely these devices are used every day. This paper presents a method for detecting and classifying malware that combines static and dynamic features. Specifically, we extract permissions and intent filters from static analysis files and APK API data from dynamic analysis files. We then use Simhash to encode the selected parts of the analysis files into feature vectors, which are used to train different Machine Learning algorithms for detecting and classifying malware. We conducted several experiments to evaluate the effectiveness of our approach on a dataset of 38,355 labeled Android apps, testing various hash functions and tokenizing methods within the Simhash algorithm. Our results showed that using Random Forest as the classifier, SHA-512 as the hash function, and 2-WORD tokenization led to high accuracy with reduced time and memory requirements for training, validating, and testing. These results indicate that the model accurately detects malware and classifies it into 24 families.
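To make the encoding step concrete, the following is a minimal sketch of Simhash with SHA-512 and 2-word tokenization. It is not the authors' implementation: the function names `two_word_tokens` and `simhash_bits`, the 512-bit fingerprint length, the whitespace word splitting, the unweighted token votes, and the sample input string are all assumptions made for illustration.

```python
import hashlib


def two_word_tokens(text):
    """Split text on whitespace and return overlapping 2-word tokens (assumed scheme)."""
    words = text.split()
    if len(words) < 2:
        return words
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]


def simhash_bits(text, n_bits=512):
    """Return a Simhash fingerprint of `text` as a list of n_bits values in {0, 1}.

    Each token is hashed with SHA-512; every bit of the hash votes +1 or -1
    for the corresponding position, and the sign of the total sets the bit.
    n_bits must not exceed 512, the SHA-512 digest length (an assumption here).
    """
    counts = [0] * n_bits
    for token in two_word_tokens(text):
        digest = hashlib.sha512(token.encode("utf-8")).digest()
        value = int.from_bytes(digest, "big")
        for i in range(n_bits):
            counts[i] += 1 if (value >> i) & 1 else -1
    return [1 if c > 0 else 0 for c in counts]


# Hypothetical input: permission and intent strings joined into one text.
sample = (
    "android.permission.INTERNET android.permission.SEND_SMS "
    "android.intent.action.BOOT_COMPLETED"
)
vector = simhash_bits(sample)
print(len(vector))
```

Each app then yields one fixed-length bit vector regardless of how many permissions, intent filters, or API entries it contains, and that vector can be passed as a feature row to a classifier such as Random Forest.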
Cite
Al-Kahla, W., Taqieddin, E., Shatnawi, A. S., & Al-Ouran, R. (2024). Malware Detection and Classification in Android Application Using Simhash-Based Feature Extraction and Machine Learning. IEEE Access, 12, 174255–174273. https://doi.org/10.1109/ACCESS.2024.3501277