Evaluating large language models on business process modeling: framework, benchmark, and self-improvement analysis

Abstract

Large language models (LLMs) are rapidly transforming various fields, including business process management (BPM), where they provide new ways to analyze and improve operational processes. This paper assesses the capabilities of LLMs in business process modeling using a framework for automating this task and a robust evaluation approach. We design a comprehensive benchmark consisting of 20 diverse business processes, and we demonstrate our evaluation approach by assessing 16 state-of-the-art LLMs from major AI vendors. Our analysis highlights significant performance variations across LLMs and reveals a positive correlation between efficient error handling and the quality of generated models. It also shows consistent performance trends within similar LLM groups. Furthermore, we use our evaluation approach to investigate LLM self-improvement techniques, encompassing self-evaluation, input optimization, and output optimization. Our findings indicate that output optimization in particular offers promising potential for enhancing quality, especially in models with initially lower performance. Our contributions provide insights for leveraging LLMs in BPM, paving the way for more advanced and automated process modeling techniques.

Citation

Kourani, H., Berti, A., Schuster, D., & van der Aalst, W. M. P. (2025). Evaluating large language models on business process modeling: framework, benchmark, and self-improvement analysis. Software and Systems Modeling. https://doi.org/10.1007/s10270-025-01318-w
