Why Current Statistical Approaches to Ransomware Detection Fail

Jamie Pont; Budi Arief; Julio Hernandez-Castro

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020), vol. 12472 LNCS, pp. 199–216

DOI: 10.1007/978-3-030-62974-8_12

Abstract

Using basic statistical techniques to detect ransomware is a popular and intuitive strategy: statistical tests can identify randomness, which in turn can indicate the presence of encryption and, by extension, a ransomware attack. However, common file formats such as images and compressed data can also look random to some of these tests. In this work, we investigate the widespread use of statistical tests in ransomware detection, focusing primarily on false positive rates. Our main aim is to show that the current over-dependence on simple statistical tests within anti-ransomware tools can seriously undermine the reliability and consistency of ransomware detection by causing frequent misclassifications. We determined thresholds for five key statistics frequently used to detect randomness, namely Shannon entropy, chi-square, arithmetic mean, Monte Carlo estimation of Pi, and serial correlation coefficient. We obtained a large dataset of 84,327 files comprising images, compressed data, and encrypted data. We then tested these thresholds (taken from previous publications in the literature where possible) against our dataset, showing that the rate of false positives is far beyond what could be considered acceptable. False positive rates were often above 50%, and above 90% on several occasions. False negative rates were generally between 5% and 20%, which is also far too high. As a direct result of these experiments, we conclude that these simple statistical approaches are not sufficient to detect ransomware attacks consistently. We instead recommend exploring higher-order statistics such as skewness and kurtosis for future ransomware detection techniques.
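To make the five statistics named in the abstract concrete, the following Python sketch computes them over a byte string and measures a false positive rate for a caller-supplied decision rule. It is an illustration only, not the authors' implementation: the function names `byte_statistics` and `false_positive_rate` are hypothetical, the Monte Carlo step assumes one common convention (six bytes per point, read as two 24-bit coordinates), the serial correlation assumes a lag of one byte with wrap-around, and no threshold from the paper is reproduced here.

```python
import math
from collections import Counter


def byte_statistics(data: bytes) -> dict:
    """Compute five simple randomness statistics over a byte string."""
    n = len(data)
    if n < 6:
        raise ValueError("need at least 6 bytes (one Monte Carlo point)")

    counts = Counter(data)

    # Shannon entropy in bits per byte (8.0 for a perfectly uniform source).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Chi-square statistic against a uniform distribution over 256 byte values.
    expected = n / 256
    chi_square = sum(
        (counts.get(b, 0) - expected) ** 2 / expected for b in range(256)
    )

    # Arithmetic mean of the byte values (127.5 for a uniform source).
    mean = sum(data) / n

    # Monte Carlo estimate of Pi. Assumed convention: each 6-byte group is
    # one point (x, y) with 24-bit coordinates; count points that fall
    # inside the quarter circle of radius 2**24 - 1.
    points = n // 6
    radius_sq = (2**24 - 1) ** 2
    inside = 0
    for i in range(points):
        x = int.from_bytes(data[6 * i : 6 * i + 3], "big")
        y = int.from_bytes(data[6 * i + 3 : 6 * i + 6], "big")
        if x * x + y * y <= radius_sq:
            inside += 1
    pi_estimate = 4 * inside / points

    # Serial correlation coefficient: lag-1 correlation between each byte
    # and its successor, wrapping the last byte around to the first.
    sum_x = sum(data)
    sum_sq = sum(b * b for b in data)
    sum_lag = sum(data[i] * data[(i + 1) % n] for i in range(n))
    denom = n * sum_sq - sum_x**2
    serial = (n * sum_lag - sum_x**2) / denom if denom else float("nan")

    return {
        "entropy": entropy,
        "chi_square": chi_square,
        "mean": mean,
        "monte_carlo_pi": pi_estimate,
        "serial_correlation": serial,
    }


def false_positive_rate(benign_samples, looks_encrypted) -> float:
    """Fraction of benign samples that a decision rule flags as encrypted.

    `looks_encrypted` is any caller-supplied predicate over the statistics
    dictionary; the threshold it applies is the caller's assumption.
    """
    flagged = sum(1 for s in benign_samples if looks_encrypted(byte_statistics(s)))
    return flagged / len(benign_samples)
```

A rule such as `lambda s: s["entropy"] > 7.9` (an arbitrary example threshold, not one taken from the paper) can be passed to `false_positive_rate` together with a list of unencrypted files read as bytes. Compressed archives and many image formats tend to score close to 8 bits per byte, which is the kind of misclassification the abstract describes.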

Pont, J., Arief, B., & Hernandez-Castro, J. (2020). Why Current Statistical Approaches to Ransomware Detection Fail. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12472 LNCS, pp. 199–216). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-62974-8_12
