In this paper, we present HuatuoGPT, a Large Language Model (LLM) for medical consultation. The core recipe of HuatuoGPT is to leverage both distilled data from ChatGPT and real-world data from doctors in the supervised fine-tuning stage. This is not only because using purely ChatGPT-distilled data might cause 'model collapse', but also because real-world data from doctors is complementary to ChatGPT-distilled data. Responses from ChatGPT are usually detailed, well-presented, fluent, and instruction-following, but ChatGPT cannot perform like a doctor in many respects, e.g., in interactive diagnosis. The additional doctors' data can therefore tame a distilled language model to perform like a doctor. To synergize the strengths of both data sources, we introduce RLMF (Reinforcement Learning from Mixed Feedback), in which a reward model is trained to align the language model with the merits that both sources (ChatGPT and doctors) bring. Experimental results (GPT-4 evaluation, human evaluation, and medical benchmark datasets) demonstrate that HuatuoGPT achieves state-of-the-art results in medical consultation among open-source LLMs. It is worth noting that, by using additional real-world data and RLMF, the distilled language model (i.e., HuatuoGPT) outperforms its teacher model (i.e., ChatGPT) in most cases.
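The abstract does not say how the RLMF reward model is trained. As a rough illustration only, the sketch below shows a common pairwise (Bradley-Terry style) ranking loss of the kind often used for reward models, applied to preference pairs tagged by where the preferred response came from. The loss choice, the `source` tags, and the names `pairwise_reward_loss` and `mean_loss` are assumptions made for this example, not details taken from the paper.

```python
import math

def pairwise_reward_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Pairwise ranking loss: -log(sigmoid(r_preferred - r_rejected)).

    Assumption: a standard Bradley-Terry style objective; the paper's
    actual reward-model objective is not given in the abstract.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def mean_loss(pairs):
    """Average the loss over preference pairs from mixed sources.

    Each pair is (source, reward_preferred, reward_rejected); the source
    tag ("chatgpt" or "doctor") is hypothetical and only for bookkeeping.
    """
    losses = [pairwise_reward_loss(rp, rr) for _, rp, rr in pairs]
    return sum(losses) / len(losses)

# Hypothetical reward scores for two preference pairs.
example_pairs = [
    ("chatgpt", 1.5, 0.2),  # a ChatGPT-style response was preferred
    ("doctor", 0.8, 1.1),   # a doctor-style response was preferred but scored lower
]
batch_loss = mean_loss(example_pairs)
```

The loss shrinks as the reward model scores the preferred response further above the rejected one, so the second pair contributes more to the average than the first.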
Zhang, H., Chen, J., Jiang, F., Yu, F., Chen, Z., Li, J., … Li, H. (2023). HuatuoGPT, Towards Taming Language Models To Be a Doctor. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 10859–10885). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.725