DIRE and its Data: Neural Decompiled Variable Renamings with Respect to Software Class

Abstract

The decompiler is one of the most common tools for examining executable binaries without the corresponding source code. It transforms binaries into high-level code, reversing the compilation process. Unfortunately, decompiler output is far from readable because the decompilation process is often incomplete. State-of-the-art techniques use machine learning to predict missing information like variable names. While these approaches are often able to suggest good variable names in context, no existing work examines how the selection of training data influences these machine learning models. We investigate how data provenance and the quality of training data affect performance, and how well, if at all, trained models generalize across software domains. We focus on the variable renaming problem using one such machine learning model, DIRE. We first describe DIRE in detail and the accompanying technique used to generate training data from raw code. We also evaluate DIRE's overall performance without respect to data quality. Next, we show how training on more popular, possibly higher quality code (measured using GitHub stars) leads to a more generalizable model because popular code tends to have more diverse variable names. Finally, we evaluate how well DIRE predicts domain-specific identifiers, propose a modification to incorporate domain information, and show that it can predict identifiers in domain-specific scenarios 23% more frequently than the original DIRE model.

Cite

Dramko, L., Lacomis, J., Yin, P., Schwartz, E., Allamanis, M., Neubig, G., … Le Goues, C. (2023). DIRE and its Data: Neural Decompiled Variable Renamings with Respect to Software Class. ACM Transactions on Software Engineering and Methodology, 32(2). https://doi.org/10.1145/3546946
