Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder

Abstract

Medical Visual Question Answering (VQA) systems play a supporting role in understanding the clinically relevant information carried by medical images. Questions about a medical image fall into two categories: closed-ended (e.g., yes/no questions) and open-ended. To obtain answers, most existing medical VQA methods rely on classification approaches, while a few attempt generation approaches or a mixture of the two, processing the two kinds of questions separately (classification for closed-ended questions and generation for open-ended ones). Classification approaches are relatively simple but perform poorly on long open-ended questions, while generation approaches face the challenge of producing many non-existent answers, resulting in low accuracy. To bridge this gap, in this paper we propose a new Transformer-based framework for medical VQA (named Q2ATransformer), which integrates the advantages of both the classification and the generation approaches and provides a unified treatment of closed-ended and open-ended questions. Specifically, we introduce an additional Transformer decoder with a set of learnable candidate answer embeddings that query the existence of each answer class for a given image-question pair. Through Transformer attention, the candidate answer embeddings interact with the fused features of the image-question pair to make the decision. In this way, despite being classification-based, our method provides a mechanism to interact with answer information during prediction, as generation-based approaches do. At the same time, by classifying, we reduce the task difficulty by shrinking the answer search space. Our method achieves new state-of-the-art performance on two medical VQA benchmarks. In particular, for open-ended questions, we achieve 79.19% on VQA-RAD and 54.85% on PathVQA, absolute improvements of 16.09% and 41.45%, respectively.
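
The answer-querying decoder described in the abstract lends itself to a compact sketch: learnable per-answer query embeddings attend to the fused image-question features via Transformer cross-attention, and each attended query produces an existence logit for its answer class. The PyTorch snippet below is a minimal illustration of this idea only; the class name, layer counts, dimensions, and the assumption that fused features already exist are all illustrative, not the authors' released implementation.

```python
# Minimal sketch of an answer-querying decoder, assuming fused
# image-question features are already computed. Hyperparameters and
# names are hypothetical placeholders.
import torch
import torch.nn as nn


class AnswerQueryingDecoder(nn.Module):
    """Queries the existence of each candidate answer class for a
    fused image-question representation via cross-attention."""

    def __init__(self, num_answers: int, dim: int = 512,
                 num_layers: int = 2, num_heads: int = 8):
        super().__init__()
        # One learnable embedding per candidate answer class.
        self.answer_queries = nn.Parameter(torch.randn(num_answers, dim))
        layer = nn.TransformerDecoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        # Per-query scalar logit: does this answer apply to the pair?
        self.classifier = nn.Linear(dim, 1)

    def forward(self, fused_features: torch.Tensor) -> torch.Tensor:
        # fused_features: (batch, seq_len, dim) image-question tokens.
        batch = fused_features.size(0)
        queries = self.answer_queries.unsqueeze(0).expand(batch, -1, -1)
        # Answer queries cross-attend to the fused features.
        attended = self.decoder(tgt=queries, memory=fused_features)
        # Returns (batch, num_answers) existence logits.
        return self.classifier(attended).squeeze(-1)


# Usage sketch: 2048 candidate answers, a batch of 4 fused sequences.
decoder = AnswerQueryingDecoder(num_answers=2048)
fused = torch.randn(4, 60, 512)   # placeholder fused features
logits = decoder(fused)           # shape (4, 2048)
pred = logits.argmax(dim=-1)      # predicted answer index per pair
```

Because the answers enter as decoder queries rather than as a fixed output layer alone, the prediction step can attend to answer semantics, which is the interaction the abstract attributes to generation-based approaches, while the final decision remains a classification over a bounded answer set.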

Citation (APA)

Liu, Y., Wang, Z., Xu, D., & Zhou, L. (2023). Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13939 LNCS, pp. 445–456). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-34048-2_34
