Abstract
Large language models (LLMs) are rapidly transforming many fields, including business process management (BPM), by providing new ways to analyze and improve operational processes. This paper assesses the capabilities of LLMs in business process modeling using a framework for automating this task and a robust evaluation approach. We design a comprehensive benchmark consisting of 20 diverse business processes, and we demonstrate our evaluation approach by assessing 16 current state-of-the-art LLMs from major AI vendors. Our analysis highlights significant performance variations across LLMs and reveals a positive correlation between efficient error handling and the quality of generated models. It also shows consistent performance trends within similar LLM groups. Furthermore, we use our evaluation approach to investigate LLM self-improvement techniques, encompassing self-evaluation, input optimization, and output optimization. Our findings indicate that output optimization, in particular, offers promising potential for enhancing quality, especially in models with initially lower performance. Our contributions provide insights for leveraging LLMs in BPM, paving the way for more advanced and automated process modeling techniques.
Kourani, H., Berti, A., Schuster, D., & van der Aalst, W. M. P. (2025). Evaluating large language models on business process modeling: framework, benchmark, and self-improvement analysis. Software and Systems Modeling. https://doi.org/10.1007/s10270-025-01318-w