DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing


Abstract

Binary diffing analysis quantitatively measures the differences between two given binaries and produces fine-grained, basic-block-level matching. It has been widely used to enable different kinds of critical security analysis. However, all existing program-analysis and machine-learning-based techniques suffer from low accuracy, poor scalability, or coarse granularity, or require extensive labeled training data to function. In this paper, we propose an unsupervised program-wide code representation learning technique to solve the problem. We rely on both the code semantic information and the program-wide control flow information to generate basic block embeddings. Furthermore, we propose a k-hop greedy matching algorithm to find the optimal diffing results using the generated block embeddings. We implement a prototype called DEEPBINDIFF and evaluate its effectiveness and efficiency with a large number of binaries. The results show that our tool outperforms the state-of-the-art binary diffing tools by a large margin for both cross-version and cross-optimization-level diffing. A case study for OpenSSL using real-world vulnerabilities further demonstrates the usefulness of our system.
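The abstract names a k-hop greedy matching step over basic block embeddings but gives no details, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each binary's control flow graph is a plain adjacency dict, each basic block already has an embedding vector, and a few trusted seed pairs are known in advance. The function names, the cosine similarity measure, the seed pairs, and the threshold value are all hypothetical choices made for illustration.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def k_hop(graph, start, k):
    """Nodes reachable from start in at most k edges (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(start)
    return seen


def k_hop_greedy_match(g1, g2, emb1, emb2, seeds, k=2, threshold=0.6):
    """Grow a block matching outward from seed pairs (assumed procedure).

    For each matched pair, look at the unmatched blocks within k hops on
    both sides and greedily accept the single most similar cross pair,
    provided its similarity reaches the (hypothetical) threshold.
    """
    matched1 = dict(seeds)            # block in binary 1 -> block in binary 2
    matched2 = {b for _, b in seeds}  # blocks of binary 2 already used
    worklist = deque(seeds)
    while worklist:
        a, b = worklist.popleft()
        cand1 = [n for n in k_hop(g1, a, k) if n not in matched1]
        cand2 = [n for n in k_hop(g2, b, k) if n not in matched2]
        best = max(
            ((cosine(emb1[x], emb2[y]), x, y) for x in cand1 for y in cand2),
            key=lambda t: t[0],
            default=None,
        )
        if best is not None and best[0] >= threshold:
            _, x, y = best
            matched1[x] = y
            matched2.add(y)
            worklist.append((x, y))
            worklist.append((a, b))  # revisit: other neighbours may still match
    return matched1
```

Each accepted pair is pushed back onto the worklist, so the matching spreads along the control flow graph instead of comparing every block against every other block. A greedy pass of this kind is cheap but offers no guarantee of a globally best assignment.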

Cite

Duan, Y., Li, X., Wang, J., & Yin, H. (2020). DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing. In 27th Annual Network and Distributed System Security Symposium, NDSS 2020. The Internet Society. https://doi.org/10.14722/ndss.2020.24311
