Evaluation of the accuracy and safety of machine translation of patient-specific discharge instructions: a comparative analysis

16Citations
Citations of this article
47Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Introduction Machine translation of patient-specific information could mitigate language barriers if sufficiently accurate and non-harmful and may be particularly useful in healthcare encounters when professional translators are not readily available. We evaluated the translation accuracy and potential for harm of ChatGPT-4 and Google Translate in translating from English to Spanish, Chinese and Russian. Methods We used ChatGPT-4 and Google Translate to translate 50 sets (316 sentences) of deidentified, patient-specific, clinician free-text emergency department instructions into Spanish, Chinese and Russian. These were then back-translated into English by professional translators and double-coded by physicians for accuracy and potential for clinical harm. Results At the sentence level, we found that both tools were ≥90% accurate in translating English to Spanish (accuracy: GPT 97%, Google Translate 96%) and English to Chinese (accuracy: GPT 95%; Google Translate 90%); neither tool performed as well in translating English to Russian (accuracy: GPT 89%; Google Translate 80%). At the instruction set level, 16%, 24% and 56% of Spanish, Chinese and Russian GPT-translated instruction sets contained at least one inaccuracy. For Google Translate, 24%, 56% and 66% of Spanish, Chinese and Russian translations contained at least one inaccuracy. The potential for harm due to inaccurate translations was ≤1% for both tools in all languages at the sentence level and ≤6% at the instruction set level. GPT was significantly more accurate than Google Translate in Chinese and Russian at the sentence level; the potential for harm was similar. Conclusion These results support the potential of machine translation tools to mitigate gaps in translation services for low-stakes written communication from English to Spanish, while also strengthening the case for caution and for professional oversight in non-low-risk communication. Further research is needed to evaluate machine translation for other languages and more technical content.

Cite

CITATION STYLE

APA

Kong, M., Fernandez, A., Bains, J., Milisavljevic, A., Brooks, K. C., Shanmugam, A., … Khoong, E. C. (2025). Evaluation of the accuracy and safety of machine translation of patient-specific discharge instructions: a comparative analysis. BMJ Quality and Safety. https://doi.org/10.1136/bmjqs-2024-018384

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free