Abstract
Automatic code summarization, which aims to describe the source code in natural language, has become an essential task in software maintenance. Our fellow researchers have attempted to achieve such a purpose through various machine learning-based approaches. One key challenge keeping these approaches from being practical lies in the lacking of retaining the semantic structure of source code, which has unfortunately been overlooked by the state-of-the-art methods. Existing approaches resort to representing the syntax structure of code by modeling the Abstract Syntax Trees (ASTs). However, the hierarchical structures of ASTs have not been well explored. In this paper, we propose CODESCRIBE to model the hierarchical syntax structure of code by introducing a novel triplet position for code summarization. Specifically, CODESCRIBE leverages the graph neural network and Transformer to preserve the structural and sequential information of code, respectively. In addition, we propose a pointer-generator network that pays attention to both the structure and sequential tokens of code for a better summary generation. Experiments on two real-world datasets in Java and Python demonstrate the effectiveness of our proposed approach when compared with several state-of-the-art baselines.
Cite
CITATION STYLE
Guo, J., Liu, J., Wan, Y., Li, L., & Zhou, P. (2022). Modeling Hierarchical Syntax Structure with Triplet Position for Source Code Summarization. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 486–500). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.37
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.