Harnessing the Power of Pre-trained Vision-Language Models for Efficient Medical Report Generation

Qi Li

Conference Proceedings, Open Access

International Conference on Information and Knowledge Management, Proceedings (2023) 1308-1317

DOI: 10.1145/3583780.3614961

Abstract

Medical images are commonly used in clinical practice, but the demand for diagnosis and reporting from image-based examinations far exceeds current medical capacity. Automatic Medical Report Generation (MRG) can help ease the burden on radiologists. Vision-Language Pre-training (VLP) has achieved tremendous success on various tasks, so it is natural to expect that MRG can benefit from this rapid advancement. However, directly applying existing VLP models in the medical domain is impractical because of their data-hungry nature, the need to align different modalities, prohibitive training time, exorbitant hardware requirements, and the challenge of open-ended text generation. To address these problems, we propose MedEPT, a parameter-efficient approach for MRG that can utilize previously ignored image-only datasets. It employs parameter-efficient tuning (PET) for VLP adaptation to reduce fine-tuning time and hardware cost. MedEPT also employs MRGPID to augment and expand adaptation datasets by synthesizing meaningful text for image-only datasets. We perform a systematic evaluation of our method. Empirical results show that it outperforms the state-of-the-art method while using less than 10% of the trainable parameters and no more than 30% of the training time of previous approaches.
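
The abstract does not state which PET technique MedEPT uses, so the following is only a rough arithmetic sketch of why parameter-efficient tuning can keep the trainable share of a model small. It assumes a low-rank adapter (a familiar PET variant, chosen here purely for illustration) attached to a single frozen dense layer; the function names, layer sizes, and rank are all hypothetical and are not taken from the paper.

```python
# Hypothetical illustration: the abstract does not say which PET technique
# MedEPT uses. A low-rank adapter is assumed here only as a familiar example.


def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable weights when a dense d_out x d_in layer is fully fine-tuned."""
    return d_in * d_out


def low_rank_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a rank-`rank` update B @ A added to a frozen layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of the layer's weights that the adapter actually trains."""
    return low_rank_adapter_params(d_in, d_out, rank) / full_finetune_params(d_in, d_out)


if __name__ == "__main__":
    # Layer sizes and rank are made-up values, not figures from the paper.
    frac = trainable_fraction(d_in=768, d_out=768, rank=8)
    print(f"trainable fraction: {frac:.4f}")
```

Under these assumed sizes the adapter trains only a few percent of the layer's weights, which is the general kind of saving the abstract's "less than 10%" figure refers to; the actual figure for MedEPT depends on details the abstract does not give.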

Citation (APA)

Li, Q. (2023). Harnessing the Power of Pre-trained Vision-Language Models for Efficient Medical Report Generation. In International Conference on Information and Knowledge Management, Proceedings (pp. 1308–1317). Association for Computing Machinery. https://doi.org/10.1145/3583780.3614961
