Binary code similarity detection (BCSD) underpins a wide spectrum of applications, including software plagiarism detection, malware classification, and known vulnerability discovery. However, inferring the contextual meaning of a binary is challenging because the semantic information available in source code is absent. Recent advances use deep learning architectures to better capture underlying code semantics and Siamese architectures to improve BCSD. In this paper, we propose BinShot, a BERT-based similarity learning architecture that is highly transferable for effective BCSD. We tackle the problem of detecting code similarity with one-shot learning (a special case of few-shot learning). To this end, we adopt a weighted distance vector with binary cross-entropy as the loss function on top of BERT. Experiments with a prototype of BinShot demonstrate its effectiveness, transferability, and practicality, including robustness in detecting the similarity of previously unseen functions. We show that BinShot outperforms previous state-of-the-art approaches for BCSD.
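As a rough illustration of what a weighted distance vector trained with binary cross-entropy could look like, the sketch below scores a pair of function embeddings. It is a minimal sketch under stated assumptions, not BinShot's implementation: the abstract does not give the exact form, so the element-wise absolute difference, the single linear layer (`weights`, `bias`), the toy three-dimensional embeddings, and the function names are all hypothetical. In practice the embeddings would come from a BERT encoder and the weights would be learned.

```python
import math


def weighted_distance_score(u, v, weights, bias=0.0):
    """Return a similarity probability for two equal-length embeddings.

    Assumed form (not taken from the paper): sigmoid(w . |u - v| + b).
    """
    if not (len(u) == len(v) == len(weights)):
        raise ValueError("u, v, and weights must have the same length")
    # Distance vector: element-wise absolute difference (an assumption).
    distance = [abs(a - b) for a, b in zip(u, v)]
    # Weights collapse the distance vector into a single logit.
    logit = sum(w * d for w, d in zip(weights, distance)) + bias
    # Numerically stable sigmoid.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def binary_cross_entropy(prob, label, eps=1e-12):
    """BCE for one pair; label is 1 (similar) or 0 (dissimilar)."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


# Toy embeddings standing in for encoder outputs (hypothetical values).
func_a = [0.2, 0.9, -0.4]
func_a_variant = [0.2, 0.9, -0.4]   # same function, identical embedding
func_b = [1.2, -0.1, 0.6]           # unrelated function

# Hand-picked parameters; negative weights penalize larger distances.
weights = [-2.0, -2.0, -2.0]
bias = 2.0

score_same = weighted_distance_score(func_a, func_a_variant, weights, bias)
score_diff = weighted_distance_score(func_a, func_b, weights, bias)
loss_same = binary_cross_entropy(score_same, 1)
loss_diff = binary_cross_entropy(score_diff, 0)
```

With these hand-picked parameters, the identical pair scores above 0.5 and the unrelated pair below it. Training would adjust `weights` and `bias` (together with the encoder) to minimize the loss over labeled pairs.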
CITATION STYLE
Ahn, S., Ahn, S., Koo, H., & Paek, Y. (2022). Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning. In ACM International Conference Proceeding Series (pp. 361–374). Association for Computing Machinery. https://doi.org/10.1145/3564625.3567975