Extracting compiler provenance from program binaries

  • Nathan Rosenblum
  • Barton Miller
  • Xiaojin Zhu

We present a novel technique that identifies the source compiler of program binaries, an important element of program provenance. Program provenance answers fundamental questions of malware analysis and software forensics, such as whether programs are generated by similar tool chains; it can also allow development of debugging, performance analysis, and instrumentation tools specific to particular compilers. We formulate compiler identification as a structured learning problem, automatically building models to recognize sequences of binary code generated by particular compilers. We evaluate our techniques on a large set of real-world test binaries, showing that our models identify the source compiler of binary code with over 90% accuracy, even in the presence of interleaved code from multiple compilers. A case study demonstrates the use of inferred compiler provenance to augment stripped binary parsing, reducing parsing errors by 18%.

Author-supplied keywords

  • forensics
  • program provenance
  • static binary analysis
