Automatic ultrasound image report generation with adaptive multimodal attention mechanism

Abstract

Writing text reports for medical images is a fundamental task in clinical diagnosis and treatment. However, this work is tedious and time-consuming because of the special report features (e.g., boundary conditions and fixed templates). Existing works mainly adopt image captioning methods for medical report generation, but these models do not fully consider the special report features. This paper proposes an Adaptive Multimodal Attention network (AMAnet) to generate high-quality medical image reports. First, a multi-label classification network is designed to predict the essential local properties; the word embedding vectors of these properties then serve as semantic features to aid report generation. Second, we develop a semantic attention mechanism that imitates spatial attention. Third, we introduce an adaptive attention mechanism with a sentinel gate to control how much attention is paid to the current visual features versus the language model memory when generating the next word. Experimental results demonstrate that AMAnet outperforms state-of-the-art image captioning methods, improving the CIDEr score by more than 1.
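
The abstract does not give the equations behind the sentinel gate, so the following is only a minimal sketch of how such a gate is commonly formulated in the adaptive-attention captioning literature, not AMAnet's actual implementation. It assumes that attention scores for each visual region and one extra score for a "sentinel" vector (a summary of the language model memory) are already available. All names (`adaptive_context`, `region_feats`, `sentinel_score`, and so on) are hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_context(region_feats, region_scores, sentinel, sentinel_score):
    """Blend visual region features with a sentinel vector.

    Assumed formulation (not taken from the paper): one softmax runs over
    the region scores plus the sentinel score. The weight that lands on
    the sentinel, beta, acts as the gate: beta near 1 means the next word
    relies mostly on language model memory, beta near 0 means it relies
    mostly on the visual features.
    """
    weights = softmax(list(region_scores) + [sentinel_score])
    beta = weights[-1]
    dim = len(sentinel)
    # Start from the sentinel's share of the context vector.
    context = [beta * sentinel[d] for d in range(dim)]
    # Add each region's share; the region weights sum to 1 - beta.
    for w, feat in zip(weights[:-1], region_feats):
        for d in range(dim):
            context[d] += w * feat[d]
    return context, beta


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]   # two toy region features
    sentinel = [3.0, 3.0]              # toy sentinel vector
    ctx, beta = adaptive_context(feats, [0.0, 0.0], sentinel, 0.0)
    print(ctx, beta)
```

With equal scores, the two regions and the sentinel each receive the same weight; raising `sentinel_score` relative to the region scores pushes `beta` toward 1, which shifts the context toward the language model memory.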

Cite

Yang, S., Niu, J., Wu, J., Wang, Y., Liu, X., & Li, Q. (2021). Automatic ultrasound image report generation with adaptive multimodal attention mechanism. Neurocomputing, 427, 40–49. https://doi.org/10.1016/j.neucom.2020.09.084
