Ranking LLM-Generated Loop Invariants for Program Verification

2Citations
Citations of this article
14Readers
Mendeley users who have this article in their library.

Abstract

Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an invariant. To address this issue, we propose a re-ranking approach for the generated results of LLMs. We have designed a ranker that can distinguish between correct inductive invariants and incorrect attempts based on the problem definition. The ranker is optimized as a contrastive ranker. Experimental results demonstrate that this re-ranking mechanism significantly improves the ranking of correct invariants among the generated candidates, leading to a notable reduction in the number of calls to a verifier.

Cite

CITATION STYLE

APA

Chakraborty, S., Lahiri, S. K., Fakhoury, S., Musuvathi, M., Lal, A., Rastogi, A., … Swamy, N. (2023). Ranking LLM-Generated Loop Invariants for Program Verification. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 9164–9175). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.614

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free