BinShape: Scalable and robust binary library function identification using function shape

Abstract

Identifying library functions in program binaries is important to many security applications, such as threat analysis, digital forensics, software infringement, and malware detection. Today’s program binaries normally contain a significant number of third-party library functions taken from standard libraries or free open-source software packages. The ability to automatically identify such library functions not only enhances the quality and the efficiency of threat analysis and reverse engineering tasks, but also improves their accuracy by avoiding false correlations between irrelevant code bases. Existing methods either lack efficiency or are not robust enough to identify different versions of the same library function that result from different compilers, different compilation settings, or obfuscation techniques. To address these limitations, we present a scalable and robust system called BinShape to identify standard library functions in binaries. The key idea of BinShape is twofold. First, we derive a robust signature for each library function based on heterogeneous features covering CFGs, instruction-level characteristics, statistical characteristics, and function-call graphs. Second, we design a novel data structure to store such signatures and facilitate efficient matching against a target function. We evaluate BinShape on a diverse set of C/C++ binaries, compiled with GCC and Visual Studio compilers on x86-x64 CPU architectures, at optimization levels O0 to O3. Our experiments show that BinShape is able to identify library functions in real binaries both efficiently and accurately, with an average accuracy of 89% and taking about 0.14 s to identify one function out of three million candidates. We also show that BinShape remains robust when the code is subjected to different compilers, slight modification, or some obfuscation techniques.
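
The two-part idea in the abstract (a per-function signature built from several kinds of features, plus a store that narrows the candidates before comparison) can be illustrated with a small Python sketch. This is not BinShape's actual signature or data structure, which the abstract does not specify. The four features, the `Signature` and `SignatureIndex` names, the bucketing by basic-block count, and the `tolerance` and `max_distance` thresholds are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical function signature mixing a few feature families."""
    name: str
    num_basic_blocks: int   # CFG feature (assumed)
    num_edges: int          # CFG feature (assumed)
    num_instructions: int   # instruction-level feature (assumed)
    num_calls: int          # call-graph feature (assumed)

    def vector(self):
        return (self.num_basic_blocks, self.num_edges,
                self.num_instructions, self.num_calls)


class SignatureIndex:
    """Toy store: bucket by basic-block count, then rank by distance."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, sig):
        self._buckets[sig.num_basic_blocks].append(sig)

    def match(self, target, tolerance=1, max_distance=5.0):
        # Only probe buckets near the target's block count, so most
        # stored signatures are never compared at all.
        best, best_d = None, math.inf
        low = target.num_basic_blocks - tolerance
        high = target.num_basic_blocks + tolerance
        for key in range(low, high + 1):
            for cand in self._buckets.get(key, ()):
                d = math.dist(cand.vector(), target.vector())
                if d < best_d:
                    best, best_d = cand, d
        if best is not None and best_d <= max_distance:
            return best.name
        return None


# Made-up feature values, for demonstration only.
index = SignatureIndex()
index.add(Signature("memcpy", 4, 5, 30, 0))
index.add(Signature("strlen", 3, 3, 12, 0))
index.add(Signature("printf", 20, 30, 200, 6))

unknown = Signature("sub_401000", 4, 5, 32, 0)
print(index.match(unknown))  # memcpy
```

Bucketing on a single cheap feature is only one way to avoid a full scan. The point of the sketch is the split between signature extraction and an index that prunes candidates, not the particular features or thresholds.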

Cite

Shirani, P., Wang, L., & Debbabi, M. (2017). BinShape: Scalable and robust binary library function identification using function shape. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10327 LNCS, pp. 301–324). Springer Verlag. https://doi.org/10.1007/978-3-319-60876-1_14
