Abstract
When presented with an unknown binary, which may or may not be complete, the ability to determine information about it is critical to subsequent reverse engineering, particularly in discovering the binary's intended use and potentially malicious nature. This paper details techniques both to identify the machine architecture of a binary and to locate the important code segments within the file. Our approach, which we call 'What is it Binary' or WiiBin, combines a technique called the byte histogram with various machine learning (ML) techniques. Byte histograms are simple to calculate and do not rely on file headers or metadata, so they give acceptable results even when only a small portion of the original file is available, e.g., when encrypted and/or compressed sections are present in a binary. Using WiiBin, we were able to determine the architecture of test binaries with greater than 80% accuracy when as little as a 20% contiguous portion of the file was present. We were also able to determine the location of code sections within a binary using the WiiBin framework. Ultimately, the more information that can be gleaned from a binary file, the easier it is to reverse engineer successfully.
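To make the byte histogram idea concrete, the following is a minimal sketch in Python and not the WiiBin implementation. The function names `byte_histogram` and `contiguous_slice`, the normalization to relative frequencies, and the sample bytes are all assumptions made for illustration. The sketch counts how often each of the 256 possible byte values occurs in a buffer, which requires no headers or metadata and works equally well on a partial file.

```python
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Return 256 relative frequencies, one per possible byte value.

    Hypothetical helper: normalizing by length is an assumption, chosen so
    that fragments of different sizes produce comparable feature vectors.
    """
    if not data:
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def contiguous_slice(data: bytes, fraction: float, offset: int = 0) -> bytes:
    """Return a contiguous portion covering `fraction` of `data`.

    Hypothetical helper that mimics having only part of a file available.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    length = max(1, int(len(data) * fraction))
    return data[offset:offset + length]


# Illustrative input only: four arbitrary bytes, not a real binary.
sample = bytes([0x90, 0x90, 0xC3, 0x00])
hist = byte_histogram(sample)
partial_hist = byte_histogram(contiguous_slice(sample, 0.5))
```

A 256-element vector like `hist` could then serve as the feature input to a supervised classifier; which classifiers and which training data were used is not stated in the abstract.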
Cite
Beckman, B., & Haile, J. (2020). Binary analysis with architecture and code section detection using supervised machine learning. In Proceedings - 2020 IEEE Symposium on Security and Privacy Workshops, SPW 2020 (pp. 152–156). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SPW50608.2020.00041