Abstract
To date, the automation of maintenance recommendations for Prognostics and Health Management (PHM) has been a domain-specific technical language processing (TLP) task applied to historical case data. ChatGPT, Bard, GPT-4, and Sydney are a few examples of generative large language models (LLMs) that have received significant media attention for their proficiency in natural language tasks across a variety of domains. Preliminary exploration of ChatGPT as a tool for generating maintenance recommendations has shown promise in its ability to generate and explain engineering concepts and procedures, but the precise scope of its capabilities and limitations remains uncertain. To our knowledge, no formal performance criteria currently exist for measuring how well ChatGPT performs in industrial use cases. In this paper, we propose a methodology for evaluating the performance of LLMs such as ChatGPT on the task of automating maintenance recommendations. Our methodology identifies performance criteria relevant to PHM, including engineering criteria, risk elements, human factors, cost considerations, and corrections. We examine how well ChatGPT performs when tasked with generating recommendations from PHM model alerts and report our findings. We discuss the strengths and limitations to consider in adopting LLMs as a computational support tool for prescriptive PHM, as well as the associated risks and business-case considerations.
Citation
Lukens, S., & Ali, A. (2023). Evaluating the Performance of ChatGPT in the Automation of Maintenance Recommendations for Prognostics and Health Management. In Proceedings of the Annual Conference of the Prognostics and Health Management Society, PHM (Vol. 15). Prognostics and Health Management Society. https://doi.org/10.36001/phmconf.2023.v15i1.3487