Fine Tuning Large Language Model for Secure Code Generation

Junjie Li; Aseem Sangalay; Cheng Cheng; Yuan Tian; Jinqiu Yang

Conference Proceedings (Open Access)

Proceedings - 2024 IEEE/ACM 1st International Conference on AI Foundation Models and Software Engineering, FORGE 2024 (2024) 86-90

DOI: 10.1145/3650105.3652299

19 Citations

41 Readers

Abstract

AI pair programmers, such as GitHub's Copilot, have shown great success in automatic code generation. However, such large language model-based code generation techniques risk introducing security vulnerabilities into codebases. In this work, we explore fine-tuning large language models to generate more secure code. We use real-world vulnerability fixes as our fine-tuning dataset, and we craft a code-generation scenario dataset (C/C++) for evaluating and comparing the pre-trained and fine-tuned models. Our experiments on GPT-J show that the fine-tuned model achieved non-vulnerable code generation ratios of 70.4% for C and 64.5% for C++, a 10% increase for C and a slight increase for C++ over the pre-trained model.
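The headline metric in the abstract, the ratio of non-vulnerable code generation, can be read as the share of generated samples that a vulnerability check does not flag. The following is a minimal sketch under that assumption; the function name `non_vulnerable_ratio`, the boolean flag representation, and the example values are hypothetical and are not taken from the paper, which may define or compute the ratio differently.

```python
from typing import Iterable


def non_vulnerable_ratio(is_vulnerable: Iterable[bool]) -> float:
    """Return the fraction of generated samples not flagged as vulnerable.

    Assumption: each element is True when a (hypothetical) vulnerability
    check flagged the corresponding generated sample.
    """
    flags = list(is_vulnerable)
    if not flags:
        raise ValueError("no generated samples to score")
    # Count the samples that were NOT flagged, then normalise by the total.
    return sum(not flagged for flagged in flags) / len(flags)


# Hypothetical flags for illustration only (not data from the paper):
# three of four generated samples pass the check.
example_flags = [False, False, True, False]
example_ratio = non_vulnerable_ratio(example_flags)
```

Computing this ratio separately per language (C and C++) and per model (pre-trained and fine-tuned) would give the kind of comparison the abstract reports.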

Citation (APA)

Li, J., Sangalay, A., Cheng, C., Tian, Y., & Yang, J. (2024). Fine Tuning Large Language Model for Secure Code Generation. In Proceedings - 2024 IEEE/ACM 1st International Conference on AI Foundation Models and Software Engineering, FORGE 2024 (pp. 86–90). Association for Computing Machinery, Inc. https://doi.org/10.1145/3650105.3652299
