Abstract
Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well suited for editing tasks because they are not designed to reason about edits. To address this, we propose a novel pretraining objective that explicitly models edits, and we use it to build CoditT5, a large language model for software-related editing tasks pretrained on large amounts of source code and natural language comments. We fine-tune it on three downstream editing tasks: comment updating, bug fixing, and automated code review. CoditT5 outperforms standard generation-based models on these tasks, demonstrating the generalizability of our approach and its suitability for editing. We also show that a standard generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art performance on all three downstream editing tasks.
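The reranking idea mentioned in the abstract can be illustrated with a short sketch (this is not the authors' released code): each model proposes beam candidates, and the other model rescores them by sequence likelihood, so that the final output is a candidate both models assign high probability. The checkpoint names GEN_MODEL and EDIT_MODEL below are hypothetical placeholders, and the sketch assumes Hugging Face transformers-style seq2seq models.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def log_likelihood(model, tokenizer, source: str, candidate: str) -> float:
    """Total log-likelihood of `candidate` given `source`, via teacher forcing."""
    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(candidate, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**inputs, labels=labels).loss  # mean token cross-entropy
    return -loss.item() * labels.size(1)            # negate and undo the mean

def rerank(source: str, candidates: list[str], scorer, scorer_tok) -> list[str]:
    """Order one model's beam candidates by the *other* model's likelihood."""
    return sorted(candidates,
                  key=lambda c: log_likelihood(scorer, scorer_tok, source, c),
                  reverse=True)

# Hypothetical usage: the edit-based model rescores the generation model's beams.
gen_tok = AutoTokenizer.from_pretrained("GEN_MODEL")    # placeholder name
gen = AutoModelForSeq2SeqLM.from_pretrained("GEN_MODEL")
edit_tok = AutoTokenizer.from_pretrained("EDIT_MODEL")  # placeholder name
edit = AutoModelForSeq2SeqLM.from_pretrained("EDIT_MODEL")

source = "public int add(int a, int b) { return a - b; }"  # buggy input
beams = gen.generate(**gen_tok(source, return_tensors="pt"),
                     num_beams=20, num_return_sequences=20)
candidates = gen_tok.batch_decode(beams, skip_special_tokens=True)
best = rerank(source, candidates, edit, edit_tok)[0]

The intuition, per the abstract, is that a generation model and an edit-based model make complementary errors, so cross-model rescoring surfaces candidates on which both agree.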
Citation
Zhang, J., Panthaplackel, S., Nie, P., Li, J. J., & Gligoric, M. (2022). CoditT5: Pretraining for Source Code and Natural Language Editing. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE '22). Association for Computing Machinery. https://doi.org/10.1145/3551349.3556955