Fold2Vec: Towards a Statement-Based Representation of Code for Code Comprehension

Francesco Bertolotti; Walter Cazzola

Journal Article (Open Access)

ACM Transactions on Software Engineering and Methodology (2023) 32(1)

DOI: 10.1145/3514232

6 Citations

23 Readers

Abstract

We introduce a novel approach to source code representation to be used in combination with neural networks. Such a representation is designed to permit the production of a continuous vector for each code statement. In particular, we present how the representation is produced in the case of Java source code. We test our representation for three tasks: code summarization, statement separation, and code search. We compare with the state-of-the-art non-autoregressive and end-to-end models for these tasks. We conclude that all tasks benefit from the proposed representation to boost their performance in terms of F1-score, accuracy, and mean reciprocal rank, respectively. Moreover, we show how models trained on code summarization and models trained on statement separation can be combined to address methods with tangled responsibilities, meaning that these models can be used to detect code misconduct.

Citation (APA)

Bertolotti, F., & Cazzola, W. (2023). Fold2Vec: Towards a Statement-Based Representation of Code for Code Comprehension. ACM Transactions on Software Engineering and Methodology, 32(1). https://doi.org/10.1145/3514232
