Medical visual question answering (Med-VQA) is important for clinical decision support and for patient engagement in patient-centered medical care. Compared with open-domain VQA tasks, VQA in the medical domain is more challenging due to limited training resources and the unique characteristics of medical images and domain vocabularies. In this paper, we propose and develop ETM-Trans, a novel deep transfer learning model that applies embedding topic modeling (ETM) to the textual questions to derive topic labels, which are paired with the associated medical images to fine-tune a model pre-trained on ImageNet. We also explore and implement a co-attention mechanism in which a residual network extracts visual features from the image that interact with a long short-term memory (LSTM) based question representation, providing fine-grained contextual information for answer derivation. To efficiently integrate the visual features from the image with the textual features from the question, we employ Multimodal Factorized Bilinear (MFB) pooling as well as Multimodal Factorized High-order (MFH) pooling. The ETM-Trans model won the international Med-VQA 2018 challenge, achieving the best WBSS score of 0.186.
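To make the fusion step concrete, below is a minimal PyTorch sketch of MFB pooling as described in the abstract: both modality features are projected into a shared expanded space, combined by an element-wise product, sum-pooled over groups of k factors, and then power- and L2-normalized. The dimensions (a 2048-d ResNet image feature, a 1024-d LSTM question feature, a 1000-d fused output, factor size k=5) are illustrative assumptions, not the paper's reported settings, and the dropout typically applied before pooling is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFB(nn.Module):
    """Multimodal Factorized Bilinear pooling (sketch).

    Projects the image and question features into a shared
    (out_dim * k)-dimensional space, fuses them with an element-wise
    product, and sum-pools each group of k factors down to out_dim.
    All dimensions are illustrative assumptions.
    """
    def __init__(self, img_dim=2048, ques_dim=1024, out_dim=1000, k=5):
        super().__init__()
        self.k = k
        self.proj_img = nn.Linear(img_dim, out_dim * k)
        self.proj_ques = nn.Linear(ques_dim, out_dim * k)

    def forward(self, img_feat, ques_feat):
        # Element-wise product in the expanded (out_dim * k) space.
        joint = self.proj_img(img_feat) * self.proj_ques(ques_feat)
        # Sum-pool over each group of k factors: (B, out_dim*k) -> (B, out_dim).
        joint = joint.view(-1, joint.size(1) // self.k, self.k).sum(dim=2)
        # Signed square-root (power) normalization, then L2 normalization.
        joint = torch.sign(joint) * torch.sqrt(torch.abs(joint) + 1e-8)
        return F.normalize(joint, dim=1)

# Usage: fuse a batch of ResNet image features with LSTM question features.
fused = MFB()(torch.randn(4, 2048), torch.randn(4, 1024))
print(fused.shape)  # torch.Size([4, 1000])
```

MFH pooling can be viewed as cascading several such MFB blocks, with each block's intermediate product feeding the next, and the pooled outputs concatenated to capture higher-order interactions.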
Liu, F., Peng, Y., & Rosen, M. P. (2019). An Effective Deep Transfer Learning and Information Fusion Framework for Medical Visual Question Answering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11696 LNCS, pp. 238–247). Springer Verlag. https://doi.org/10.1007/978-3-030-28577-7_20