Automatic ultrasound image report generation with adaptive multimodal attention mechanism

Shaokang Yang; Jianwei Niu; Jiyan Wu; Yong Wang; Xuefeng Liu; Qingfeng Li

Journal Article

Neurocomputing (2021), vol. 427, pp. 40–49

DOI: 10.1016/j.neucom.2020.09.084

Abstract

Writing text reports for medical images is a fundamental task for diagnosis and treatment in clinical medicine. However, this work is tedious and time-consuming because of the special features of such reports (e.g., boundary conditions and fixed templates). Existing works mainly adopt image captioning methods for medical report generation, but these models do not fully consider the special report features. This paper proposes an Adaptive Multimodal Attention network (AMAnet) to generate high-quality medical image reports. First, a Multi-Label Classification network is designed to predict the essential local properties; the word embedding vectors of these properties then serve as semantic features to aid report generation. Second, we develop a semantic attention mechanism that imitates spatial attention. Third, we introduce an adaptive attention mechanism with a sentinel gate that controls how much attention is paid to the current visual features versus the language model memory when generating the next word. Experimental results demonstrate that AMAnet outperforms state-of-the-art image captioning methods, with a CIDEr score improvement of over 1.
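
The three mechanisms described in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the scaled dot-product scoring (standing in for learned projection layers), the 0.5 classification threshold, the averaging of visual and semantic contexts, and all function names are hypothetical choices made for illustration. The sentinel gate follows the visual-sentinel formulation from image captioning (Lu et al., CVPR 2017), where the final context is ĉ = βs + (1 − β)c and β is the sentinel's weight in a softmax extended by one extra slot.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_property_embeddings(
    logits: List[float], embeddings: List[Vector], threshold: float = 0.5
) -> List[Vector]:
    # Multi-label classification: each property gets an independent sigmoid.
    # For 0 < threshold < 1, sigmoid(z) >= threshold is equivalent to
    # z >= log(threshold / (1 - threshold)), which avoids overflow in exp.
    cutoff = math.log(threshold / (1.0 - threshold))
    return [emb for z, emb in zip(logits, embeddings) if z >= cutoff]


def score(feature: Vector, hidden: Vector) -> float:
    # Hypothetical scaled dot-product relevance score; a trained model would
    # use learned projection layers here instead.
    return dot(feature, hidden) / math.sqrt(len(hidden))


def attend(features: List[Vector], hidden: Vector) -> Vector:
    # Softmax-normalised weights over the features, then their weighted sum.
    weights = softmax([score(f, hidden) for f in features])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(len(hidden))]


def adaptive_multimodal_context(
    regions: List[Vector],     # image-region features (spatial attention)
    properties: List[Vector],  # embeddings of predicted properties (semantic attention)
    sentinel: Vector,          # sentinel vector summarising language-model memory
    hidden: Vector,            # current decoder hidden state
) -> Tuple[Vector, float]:
    visual_ctx = attend(regions, hidden)
    # Semantic attention reuses the spatial form; fall back to the visual
    # context if no property was predicted.
    semantic_ctx = attend(properties, hidden) if properties else visual_ctx
    # Hypothetical fusion: average the visual and semantic contexts.
    fused = [(v + s) / 2.0 for v, s in zip(visual_ctx, semantic_ctx)]
    # Sentinel gate: extend the region scores with one sentinel score; the
    # sentinel's softmax weight beta is the share given to language memory.
    extended = softmax([score(r, hidden) for r in regions] + [score(sentinel, hidden)])
    beta = extended[-1]
    context = [beta * s + (1.0 - beta) * f for s, f in zip(sentinel, fused)]
    return context, beta
```

When the sentinel aligns with the decoder state, β approaches 1 and the context is dominated by the language-model memory (useful for template words); when it does not, β falls toward 0 and the next word is grounded in the fused visual and semantic context.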

Citation (APA)

Yang, S., Niu, J., Wu, J., Wang, Y., Liu, X., & Li, Q. (2021). Automatic ultrasound image report generation with adaptive multimodal attention mechanism. Neurocomputing, 427, 40–49. https://doi.org/10.1016/j.neucom.2020.09.084
