K-PathVQA: Knowledge-Aware Multimodal Representation for Pathology Visual Question Answering


Abstract

Pathology imaging is routinely used to detect the underlying effects and causes of diseases and injuries. Pathology visual question answering (PathVQA) aims to enable computers to answer questions about clinical visual findings in pathology images. Prior work on PathVQA has focused on directly analyzing image content with conventional pretrained encoders, without drawing on relevant external information when the image content is inadequate. In this article, we present a knowledge-driven PathVQA (K-PathVQA), which uses a medical knowledge graph (KG) from a complementary external structured knowledge base to infer answers for the PathVQA task. K-PathVQA enriches the question representation with external medical knowledge and then aggregates vision, language, and knowledge embeddings to learn a joint knowledge-image-question representation. Our experiments on a publicly available PathVQA dataset showed that K-PathVQA outperformed the best baseline method, with an accuracy increase of 4.15% on the overall task, 4.40% on open-ended questions, and an absolute increase of 1.03% on closed-ended questions. Ablation testing shows the impact of each contribution. Generalizability of the method is demonstrated on a separate medical VQA dataset.
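The aggregation step described above — combining vision, language, and knowledge embeddings into a joint representation — can be sketched as a simple late-fusion module. This is a minimal illustration, not the paper's implementation: the embedding dimensions, the concatenation-plus-projection fusion, and all variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding dimensions (not taken from the paper).
D_V, D_Q, D_K, D_JOINT = 64, 32, 32, 48

def fuse(v_emb, q_emb, k_emb, W):
    """Concatenate vision, question, and knowledge-graph embeddings,
    then project to a joint knowledge-image-question representation.
    A stand-in for the paper's aggregation step, assuming simple
    concatenation followed by a learned linear projection."""
    x = np.concatenate([v_emb, q_emb, k_emb])
    return np.tanh(W @ x)

v = rng.standard_normal(D_V)   # pathology image embedding
q = rng.standard_normal(D_Q)   # question embedding
k = rng.standard_normal(D_K)   # medical KG embedding
W = rng.standard_normal((D_JOINT, D_V + D_Q + D_K)) * 0.1

joint = fuse(v, q, k, W)
print(joint.shape)  # (48,)
```

In practice the projection weights would be trained end-to-end with the answer classifier, and the knowledge embedding would come from a KG encoder rather than random vectors.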

Citation (APA)

Naseem, U., Khushi, M., Dunn, A. G., & Kim, J. (2024). K-PathVQA: Knowledge-Aware Multimodal Representation for Pathology Visual Question Answering. IEEE Journal of Biomedical and Health Informatics, 28(4), 1886–1895. https://doi.org/10.1109/JBHI.2023.3294249
