Learning latent byte-level feature representation for malware detection


Abstract

This paper proposes two byte-level feature representations of binary files for malware detection. Both are static representations that operate directly on the raw file bytes, so they need no third-party tools and are independent of the operating system. Sparse term-frequency simhashing (s-tf-simhashing) is a faster variant of tf-simhashing: it requires less computation and outperforms the original dense tf-simhashing. The binary word2vec (Bword2vec) representation embeds the semantic relationships of the n-grams into the code vectors. Bword2vec employs a binary-to-word2vec representation that yields a lower-dimensional feature space than s-tf-simhashing, further reducing the classifier's computation. We show that the proposed techniques can be used successfully to analyze both full malware apps and infected files. The experiments are conducted on real Android and PDF malware datasets.
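
To make the idea of a sparse, term-frequency-weighted hash of byte n-grams concrete, here is a minimal Python sketch. It is an illustration only and not the authors' s-tf-simhashing: the abstract does not specify the construction, so the function names (`byte_ngrams`, `sparse_tf_hash`), the use of SHA-256, and the parameters `n`, `dim`, and `k` are all assumptions made for this example. Each distinct n-gram adds its term frequency, with a hash-derived sign, to only `k` positions of a fixed-length vector, which is what keeps each n-gram's contribution sparse.

```python
import hashlib
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2):
    """Return all overlapping byte n-grams of the raw input."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def sparse_tf_hash(data: bytes, n: int = 2, dim: int = 64, k: int = 2):
    """Hypothetical sparse tf-weighted hashing of byte n-grams.

    Assumption: each distinct n-gram updates only k of the dim
    positions, chosen (with a sign) from its SHA-256 digest.
    This is a generic signed feature-hashing sketch, not the
    paper's exact algorithm. Requires 3 * k <= 32.
    """
    counts = Counter(byte_ngrams(data, n))  # term frequencies
    vec = [0.0] * dim
    for gram, tf in counts.items():
        digest = hashlib.sha256(gram).digest()
        for j in range(k):
            # Two digest bytes pick the index, the next byte picks the sign.
            idx = int.from_bytes(digest[3 * j:3 * j + 2], "big") % dim
            sign = 1.0 if digest[3 * j + 2] & 1 else -1.0
            vec[idx] += sign * tf
    return vec
```

Because the input is just a byte string, the same function applies unchanged to an Android package or a PDF file read in binary mode; the resulting fixed-length vector could then be passed to any standard classifier.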

Cite

Yousefi-Azar, M., Hamey, L., Varadharajan, V., & Chen, S. (2018). Learning latent byte-level feature representation for malware detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11304 LNCS, pp. 568–578). Springer Verlag. https://doi.org/10.1007/978-3-030-04212-7_50
