Evaluating the quality of machine-generated open-ended text is a long-standing challenge in Natural Language Processing (NLP). Despite dramatic advances in the machine learning techniques driving Natural Language Generation (NLG), the subfield of NLP concerned with text generation, no promising and widely adopted automatic evaluation technique for NLG tasks has yet emerged. In this paper, we propose leveraging conversational Large Language Models (LLMs) as automatic evaluators for several open-ended NLG tasks. Our experiments with a recently released conversational LLM, ChatGPT, demonstrate the viability of our proposal.
Citation:
Riyadh, M., & Shafiq, M. O. (2023). Towards Automatic Evaluation of NLG Tasks Using Conversational Large Language Models. In IFIP Advances in Information and Communication Technology (Vol. 676 IFIP, pp. 425–437). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-34107-6_34