BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis


Abstract

Vision-and-language (V&L) models take an image and text as input and learn to capture the associations between them. Such models can potentially support tasks that require understanding medical images together with their associated text. However, applying V&L models in the medical domain is challenging because data annotation is expensive and domain knowledge is required. In this paper, we identify that the visual representation in general V&L models is not suitable for processing medical data. To overcome this limitation, we propose BERTHop, a transformer-based model built on PixelHop++ and VisualBERT, for better capturing the associations between clinical notes and medical images. Experiments on the OpenI dataset, a commonly used thoracic disease diagnosis benchmark, show that BERTHop achieves an average Area Under the Curve (AUC) of 98.12%, which is 1.62% higher than the state of the art, while being trained on a 9× smaller dataset (https://github.com/monajati/BERTHop).
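The abstract only sketches the pipeline at a high level. The snippet below is a minimal, self-contained illustration (not the authors' implementation; see the linked repository for that) of the general idea: visual features standing in for PixelHop++ outputs are projected into the same space as text token embeddings, fused in a joint transformer encoder in the spirit of VisualBERT, and scored with a multi-label sigmoid head, with per-disease AUC averaged as in the paper's evaluation. All dimensions, class/variable names, and the toy data are illustrative assumptions.

```python
# Hedged sketch of a V&L disease classifier: joint transformer over text tokens
# and projected visual features, multi-label sigmoid head, macro-averaged AUC.
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score

class ToyVLClassifier(nn.Module):
    def __init__(self, vocab_size=30522, visual_dim=2048, hidden=256,
                 num_labels=14, num_layers=2, num_heads=4):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, hidden)
        self.visual_proj = nn.Linear(visual_dim, hidden)      # map visual features into the text space
        layer = nn.TransformerEncoderLayer(hidden, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(hidden, num_labels)        # one logit per disease label

    def forward(self, input_ids, visual_feats):
        text = self.text_embed(input_ids)                      # (B, T, H) report token embeddings
        vis = self.visual_proj(visual_feats)                   # (B, V, H) projected image features
        fused = self.encoder(torch.cat([text, vis], dim=1))    # joint attention over both modalities
        return self.classifier(fused[:, 0])                    # pool the first token for classification

# Toy usage: random "report" tokens and random image features (placeholders
# for real tokenized clinical notes and PixelHop++-style visual features).
model = ToyVLClassifier()
input_ids = torch.randint(0, 30522, (8, 32))      # 8 reports, 32 tokens each
visual_feats = torch.randn(8, 36, 2048)           # 8 images, 36 visual regions
labels = torch.randint(0, 2, (8, 14)).float()     # multi-label disease targets

probs = torch.sigmoid(model(input_ids, visual_feats))

# Per-label AUC, averaged over labels that have both classes present.
aucs = [roc_auc_score(labels[:, i], probs[:, i].detach())
        for i in range(14) if labels[:, i].min() != labels[:, i].max()]
print(f"mean AUC over computable labels: {sum(aucs) / len(aucs):.3f}")
```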

Citation (APA)

Monajatipoor, M., Rouhsedaghat, M., Li, L. H., Jay Kuo, C. C., Chien, A., & Chang, K. W. (2022). BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13435 LNCS, pp. 725–734). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16443-9_69
