Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning

Abstract

Binary code similarity detection (BCSD) serves as a basis for a wide spectrum of applications, including software plagiarism detection, malware classification, and known vulnerability discovery. However, inferring the contextual meaning of a binary is challenging because the semantic information available in source code is absent. Recent approaches use deep learning architectures to better capture underlying code semantics, and Siamese architectures to improve BCSD. In this paper, we propose BinShot, a BERT-based similarity learning architecture that is highly transferable for effective BCSD. We tackle the problem of detecting code similarity with one-shot learning (a special case of few-shot learning). To this end, we adopt a weighted distance vector with binary cross-entropy as the loss function on top of BERT. Our experimental results with a prototype of BinShot demonstrate its effectiveness, transferability, and practicality; BinShot is robust in detecting the similarity of previously unseen functions. We show that BinShot outperforms the previous state-of-the-art approaches for BCSD.
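
The abstract names a weighted distance vector trained with binary cross-entropy but does not spell out its form. The following is a minimal sketch of one plausible reading, not BinShot's implementation. It assumes the distance vector is the element-wise absolute difference of two function embeddings, and that a learned weight vector and bias map it to a similarity logit. The function names, the toy two-dimensional embeddings, and the weight values are all hypothetical; in the paper the embeddings would come from BERT.

```python
import math


def weighted_distance_logit(u, v, weights, bias=0.0):
    """Map two embeddings to a similarity logit.

    Assumption (not from the paper): the distance vector is |u - v|
    element-wise, combined linearly with learned weights and a bias.
    """
    if not (len(u) == len(v) == len(weights)):
        raise ValueError("u, v, and weights must have the same length")
    distance = [abs(a - b) for a, b in zip(u, v)]
    return sum(w * d for w, d in zip(weights, distance)) + bias


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_cross_entropy(prob, label, eps=1e-12):
    """BCE for one pair; label is 1 (similar) or 0 (dissimilar)."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def pair_loss(u, v, label, weights, bias=0.0):
    """Return (predicted similarity probability, BCE loss) for one pair."""
    prob = sigmoid(weighted_distance_logit(u, v, weights, bias))
    return prob, binary_cross_entropy(prob, label)


# Hypothetical values: negative weights make larger distances less similar.
weights = [-3.0, -3.0]
bias = 2.0
p_same, loss_same = pair_loss([0.1, 0.2], [0.1, 0.2], 1, weights, bias)
p_diff, loss_diff = pair_loss([0.1, 0.2], [1.1, -0.8], 0, weights, bias)
```

With these hypothetical weights, identical embeddings score a higher similarity probability than distant ones. In training, the weights and bias would be learned jointly with the encoder by minimizing the loss over labeled function pairs.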

Cite

Ahn, S., Ahn, S., Koo, H., & Paek, Y. (2022). Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning. In ACM International Conference Proceeding Series (pp. 361–374). Association for Computing Machinery. https://doi.org/10.1145/3564625.3567975
