Are your training datasets yet relevant? An investigation into the importance of timeline in machine learning-based malware detection

Abstract

In this paper, we consider the relevance of timeline in the construction of datasets, to highlight its impact on the performance of a machine learning-based malware detection scheme. In particular, we show that simply picking a random set of known malware to train a malware detector, as is done in many assessment scenarios in the literature, yields significantly biased results. In the process of assessing the extent of this impact through various experiments, we were also able to confirm a number of intuitive assumptions about Android malware. For instance, we discuss the existence of Android malware lineages and how they could impact the performance of malware detection in the wild.
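To make the contrast between random selection and timeline-aware selection concrete, here is a minimal sketch. It is not the authors' experimental protocol, which the abstract does not describe. The `App` record, its `first_seen` field, the synthetic `APPS` list, and all function names are assumptions introduced for illustration only.

```python
import random
from datetime import date
from typing import List, NamedTuple, Tuple


class App(NamedTuple):
    """Hypothetical sample record; field names are illustrative assumptions."""
    name: str
    first_seen: date   # assumed date at which the app was first observed
    is_malware: bool


def random_split(apps: List[App], train_fraction: float,
                 seed: int = 0) -> Tuple[List[App], List[App]]:
    """Shuffle and split while ignoring dates (the practice the abstract questions)."""
    shuffled = list(apps)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def temporal_split(apps: List[App], cutoff: date) -> Tuple[List[App], List[App]]:
    """Train only on apps seen before the cutoff; test on apps seen on or after it."""
    train = [a for a in apps if a.first_seen < cutoff]
    test = [a for a in apps if a.first_seen >= cutoff]
    return train, test


def leaks_future(train: List[App], test: List[App]) -> bool:
    """True if any training sample is newer than the oldest test sample."""
    if not train or not test:
        return False
    return max(a.first_seen for a in train) > min(a.first_seen for a in test)


# Synthetic data: 12 apps spread over 2011-2013, alternating malware/benign labels.
APPS = [
    App(f"app{i}", date(2011 + i // 4, 1 + (i % 4) * 3, 1), i % 2 == 0)
    for i in range(12)
]

train, test = temporal_split(APPS, cutoff=date(2013, 1, 1))
print(len(train), len(test), leaks_future(train, test))
```

With a temporal split, the detector never sees samples from the "future" of its test set, which is closer to how a detector would be deployed in the wild. A random split gives no such guarantee, and `leaks_future` can be used to check whether a given split mixes the two periods.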

Cite

Allix, K., Bissyandé, T. F., Klein, J., & Le Traon, Y. (2015). Are your training datasets yet relevant? An investigation into the importance of timeline in machine learning-based malware detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8978, pp. 51–67). Springer Verlag. https://doi.org/10.1007/978-3-319-15618-7_5
