This study investigates the efficiency of large language models (LLMs) in producing routine, negative, and persuasive business emails for educational purposes in the context of Business Writing. Specifically, it compares the outputs generated by four widely used LLMs (ChatGPT 3.5, Llama 2, Bing Chat, and Bard) when given identical email scenarios. The generated emails are evaluated with a detailed rubric, allowing a systematic assessment of the LLMs' performance across the three email types. The results show that outputs generated from the same prompt vary greatly, despite the rather formulaic nature of business emails. For instance, some LLMs struggle to follow the requested structure and to maintain a consistent tone, while others have problems with unity and conciseness. The findings have implications for teaching business writing (rubrics, task instructions, and in-class implementation), as well as for the integration of AI into professional communication at large.
Jovic, M., & Mnasri, S. (2024). Evaluating AI-Generated Emails: A Comparative Efficiency Analysis. World Journal of English Language, 14(2), 502–517. https://doi.org/10.5430/wjel.v14n2p502