Background: Many patients seek health information online, and large language models (LLMs) may produce a growing share of it.

Aim: This study evaluates the quality of health information provided by ChatGPT, an LLM developed by OpenAI, focusing on its utility as a source of otolaryngology-related patient information.

Material and method: A variety of doctors at a tertiary otorhinolaryngology department rated the chatbot's responses on a Likert scale for accuracy, relevance, and depth. The responses were also evaluated by ChatGPT itself.

Results: As rated by the respondents, the composite mean across the three categories was 3.41 (see the arithmetic sketch after the abstract), with the highest score in the relevance category (mean = 3.71); the accuracy and depth categories yielded mean scores of 3.51 and 3.00, respectively. ChatGPT rated all three categories 5.

Conclusion and significance: Despite its potential for providing relevant and accurate medical information, the chatbot's responses lacked depth and may perpetuate biases owing to its training on publicly available text. While LLMs show promise in healthcare, further refinement is needed to deepen responses and mitigate potential biases.
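A minimal check of the reported composite score, assuming it is the unweighted mean of the three category means (an assumption; the abstract does not state the weighting):

\[
\text{composite} = \frac{3.71 + 3.51 + 3.00}{3} = \frac{10.22}{3} \approx 3.41
\]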
CITATION
Nielsen, J. P. S., von Buchwald, C., & Grønhøj, C. (2023). Validity of the large language model ChatGPT (GPT4) as a patient information source in otolaryngology by a variety of doctors in a tertiary otorhinolaryngology department. Acta Oto-Laryngologica. https://doi.org/10.1080/00016489.2023.2254809