CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training

Abstract

Large-scale image-text pair datasets have greatly contributed to the development of vision-language pre-training (VLP) models, which enable zero-shot or few-shot classification without costly annotation. However, in the medical domain, data scarcity remains a significant challenge for developing a powerful VLP model. In this paper, we tackle the lack of image-text data in chest X-ray by expanding image-label pairs into image-text pairs via general prompts and by utilizing multiple images and multiple sections in a radiologic report. We also design two contrastive losses, named ICL and TCL, for learning study-level characteristics of medical images and reports, respectively. Our model outperforms state-of-the-art models trained under the same conditions. In addition, enlarging the dataset improves the discriminative power of our pre-trained model for classification, while sacrificing only marginal retrieval performance. Code is available at https://github.com/kakaobrain/cxr-clip.
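The abstract does not spell out how ICL and TCL are computed, so the sketch below is only an illustration of the general idea: a CLIP-style InfoNCE objective applied not just between images and texts, but also between two images of the same study (ICL) and two text sections of the same report (TCL). The `info_nce` helper, the tensor shapes, and the equal loss weighting are assumptions for illustration, not the paper's actual implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of embeddings (matched by index)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                      # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Hypothetical study-level batch: two images and two report sections per study.
N, d = 8, 512
img1, img2 = torch.randn(N, d), torch.randn(N, d)   # e.g. two views of the same study
txt1, txt2 = torch.randn(N, d), torch.randn(N, d)   # e.g. findings / impression sections

loss_itc = info_nce(img1, txt1)   # standard image-text contrastive term
loss_icl = info_nce(img1, img2)   # ICL: image-image, study-level
loss_tcl = info_nce(txt1, txt2)   # TCL: text-text, study-level
loss = loss_itc + loss_icl + loss_tcl
```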

Cite

APA

You, K., Gu, J., Ham, J., Park, B., Kim, J., Hong, E. K., … Roh, B. (2023). CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 14221 LNCS, pp. 101–111). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-43895-0_10
