K-PathVQA: Knowledge-Aware Multimodal Representation for Pathology Visual Question Answering


Abstract

Pathology imaging is routinely used to detect the underlying effects and causes of diseases and injuries. Pathology visual question answering (PathVQA) aims to enable computers to answer questions about clinical visual findings in pathology images. Prior work on PathVQA has focused on directly analyzing image content with conventional pretrained encoders, without drawing on relevant external information when the image content is inadequate. In this article, we present a knowledge-driven PathVQA (K-PathVQA), which uses a medical knowledge graph (KG) from a complementary external structured knowledge base to infer answers for the PathVQA task. K-PathVQA enriches the question representation with external medical knowledge and then aggregates vision, language, and knowledge embeddings to learn a joint knowledge-image-question representation. Our experiments on a publicly available PathVQA dataset showed that K-PathVQA outperformed the best baseline method, with an accuracy increase of 4.15% on the overall task, 4.40% on open-ended questions, and an absolute increase of 1.03% on closed-ended questions. Ablation testing shows the impact of each contribution. Generalizability of the method is demonstrated on a separate medical VQA dataset.
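The aggregation step described above — combining vision, language, and knowledge embeddings into a joint representation — can be sketched as a simple late-fusion module. This is a minimal illustration, not the paper's implementation: the embedding dimensions, the concatenation-plus-projection fusion, and all variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding dimensions (not taken from the paper).
D_V, D_Q, D_K, D_JOINT = 64, 32, 32, 48

def fuse(v_emb, q_emb, k_emb, W):
    """Concatenate vision, question, and knowledge-graph embeddings,
    then project to a joint knowledge-image-question representation.
    A stand-in for the paper's aggregation step, assuming simple
    concatenation followed by a learned linear projection."""
    x = np.concatenate([v_emb, q_emb, k_emb])
    return np.tanh(W @ x)

v = rng.standard_normal(D_V)   # pathology image embedding
q = rng.standard_normal(D_Q)   # question embedding
k = rng.standard_normal(D_K)   # medical KG embedding
W = rng.standard_normal((D_JOINT, D_V + D_Q + D_K)) * 0.1

joint = fuse(v, q, k, W)
print(joint.shape)  # (48,)
```

In practice the projection weights would be trained end-to-end with the answer classifier, and the knowledge embedding would come from a KG encoder rather than random vectors.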

Citation (APA)

Naseem, U., Khushi, M., Dunn, A. G., & Kim, J. (2024). K-PathVQA: Knowledge-Aware Multimodal Representation for Pathology Visual Question Answering. IEEE Journal of Biomedical and Health Informatics, 28(4), 1886–1895. https://doi.org/10.1109/JBHI.2023.3294249
