Abstract
Writing text reports for medical images is a fundamental task in clinical diagnosis and treatment. However, this work is tedious and time-consuming because of the special features of such reports (e.g., boundary conditions and fixed templates). Existing works mainly adopt image captioning methods for medical report generation, but these models do not fully account for the special report features. This paper proposes an Adaptive Multimodal Attention network (AMAnet) to generate high-quality medical image reports. First, a multi-label classification network is designed to predict the essential local properties; the word embedding vectors of these properties then serve as semantic features to aid report generation. Second, we develop a semantic attention mechanism modeled on spatial attention. Third, we introduce an adaptive attention mechanism with a sentinel gate that controls how much attention is paid to the current visual features versus the language model's memory when generating the next word. Experimental results demonstrate that AMAnet outperforms state-of-the-art image captioning methods, improving the CIDEr score by more than 1.
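The sentinel-gated adaptive attention described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a common formulation in which a sentinel vector (standing in for the language model's memory) competes with the visual features inside a single softmax, and it scores each candidate with a plain dot product rather than a learned layer. The names `adaptive_attention`, `features`, `sentinel`, and `hidden` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adaptive_attention(features, sentinel, hidden):
    """Blend visual features with a sentinel vector (illustrative only).

    features: list of k visual feature vectors (assumed layout)
    sentinel: vector summarizing language-model memory (assumption)
    hidden:   current decoder hidden state, used as the attention query
    Returns (context, beta), where beta is the weight given to the sentinel.
    """
    # Score every visual feature plus the sentinel against the query.
    # Dot-product scoring is a simplification chosen for this sketch.
    scores = [dot(f, hidden) for f in features] + [dot(sentinel, hidden)]
    weights = softmax(scores)
    beta = weights[-1]  # sentinel gate: how much to rely on memory
    visual_weights = weights[:-1]
    # Context is the weighted sum of visual features and the sentinel.
    context = [
        sum(w * f[i] for w, f in zip(visual_weights, features)) + beta * sentinel[i]
        for i in range(len(hidden))
    ]
    return context, beta


context, beta = adaptive_attention(
    features=[[1.0, 0.0], [0.0, 1.0]],
    sentinel=[0.0, 0.0],
    hidden=[0.0, 0.0],
)
print(context, beta)
```

With a zero query all three candidates receive equal weight, so the gate assigns one third of the attention to the sentinel. A query that aligns strongly with the sentinel pushes `beta` toward 1, meaning the next word would depend mostly on language memory rather than on the image.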
Yang, S., Niu, J., Wu, J., Wang, Y., Liu, X., & Li, Q. (2021). Automatic ultrasound image report generation with adaptive multimodal attention mechanism. Neurocomputing, 427, 40–49. https://doi.org/10.1016/j.neucom.2020.09.084