On the Naturalness of Bytecode Instructions

Yoon Ho Choi; Jaechang Nam

Conference Proceedings (Open Access)

ACM International Conference Proceeding Series (2022)

DOI: 10.1145/3551349.3559559


Abstract

Bytecode is used in software analysis and related approaches because of advantages such as high availability and a simple specification. To leverage these advantages when training language models on bytecode, it is important to understand the characteristics of the naturalness of bytecode. However, the naturalness of bytecode has not been actively explored. In this paper, we experimentally show the naturalness of bytecode instructions and investigate its characteristics by empirically assessing 10 open-source Java projects. Our results show that, at the method level, bytecode instructions are more natural than source code representations and less natural than abstract syntax tree representations. Furthermore, we found no correlation between the naturalness of bytecode instructions and that of source code representations at the method level. Our study suggests that researchers need to treat the naturalness of bytecode instructions differently from that of source code. We expect these findings to be helpful for future work on automated software engineering tasks that use bytecode models, such as automated debugging and vulnerability detection.
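The abstract does not say how naturalness is measured. In the wider literature on the naturalness of software, it is commonly quantified as the cross-entropy of a token sequence under a language model, where lower values mean more predictable (more "natural") code. The sketch below illustrates that idea at the method level, assuming a bigram model with add-one smoothing. The model choice, the function names, and the toy instruction sequences are illustrative assumptions, not the authors' setup.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count bigram statistics over token sequences (one sequence per method).

    Assumption: a simple bigram model, chosen only for illustration.
    """
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for seq in sequences:
        tokens = ["<s>"] + list(seq)  # "<s>" marks the start of a method
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def cross_entropy(seq, model):
    """Average negative log2 probability per token (bits per token)."""
    context_counts, bigram_counts, vocab = model
    if not seq:
        raise ValueError("sequence must contain at least one token")
    v = len(vocab) + 1  # one extra slot for tokens never seen in training
    tokens = ["<s>"] + list(seq)
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one (Laplace) smoothing keeps unseen bigrams at non-zero probability.
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
        total -= math.log2(p)
    return total / len(seq)


# Hypothetical bytecode instruction sequences, one list per method.
corpus = [
    ["aload_0", "invokespecial", "return"],
    ["aload_0", "getfield", "areturn"],
]
model = train_bigram(corpus)
seen = cross_entropy(["aload_0", "invokespecial", "return"], model)
shuffled = cross_entropy(["return", "aload_0", "invokespecial"], model)
```

With this toy corpus, an instruction order that occurred in training scores a lower cross-entropy than the same instructions in an unfamiliar order. That is the sense in which one representation can be "more natural" than another.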

Citation (APA)

Choi, Y. H., & Nam, J. (2022). On the Naturalness of Bytecode Instructions. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3551349.3559559
