Abstract
Bytecode is used in software analysis and other approaches due to its advantages such as high availability and simple specification. Therefore, to leverage these advantages in training language models with bytecode, it is important to clearly recognize the characteristics of the naturalness of bytecode. However, the naturalness of bytecode has not been actively explored. In this paper, we experimentally show the naturalness of bytecode instructions and investigate their characteristics by empirically assessing 10 Java open-source projects. Consequently, we demonstrate that the bytecode instructions are more natural than source code representations and less natural than abstract syntax tree representations at a method-level. Furthermore, we found that there is no correlation between the naturalness of bytecode instructions and source code representations at a method-level. Our study supports that researchers need to deal with the characteristics of the naturalness of bytecode instructions in a different view from source code. We expect that these findings will be helpful for future work to study automated software engineering tasks such as automated debugging and vulnerability detection that use bytecode models.
Author supplied keywords
Cite
CITATION STYLE
Choi, Y. H., & Nam, J. (2022). On the Naturalness of Bytecode Instructions. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3551349.3559559
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.