Image semantic description based on deep learning with multi-attention mechanisms


Abstract

In the era of big data, cross-media and multimodal data are growing rapidly, and existing data processing methods fail to meet the corresponding functional requirements. To address the large expression gap between multimodal data, this paper proposes a multimodal data fusion method based on deep learning, which combines the strengths of deep learning in image detection and text sequence prediction with a multi-attention mechanism. The BLEU algorithm is used to compute the similarity between the model's output and the image's reference descriptions at four n-gram levels. Training and testing were conducted on the Flickr8K dataset. Experiments show that, compared with traditional single-modal image description methods, the multi-AM model achieves better results under the BLEU metric.
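The paper evaluates generated captions with BLEU at four n-gram levels (BLEU-1 through BLEU-4). The authors' evaluation code is not part of this page; as an illustration only, the following is a minimal pure-Python sketch of sentence-level BLEU (modified n-gram precision with clipping, plus a brevity penalty). All function names here are our own, not from the paper.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights over 1..max_n gram orders.

    candidate: list of tokens; references: list of token lists.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # For each n-gram, the maximum count over all references (clipping cap).
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty: penalize candidates shorter than the closest reference.
    ref_len = min((len(r) for r in references),
                  key=lambda rl: (abs(rl - len(candidate)), rl))
    bp = 1.0 if len(candidate) > ref_len else \
        math.exp(1 - ref_len / max(len(candidate), 1))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Calling `bleu(cand, refs, max_n=1)` gives BLEU-1, and `max_n=4` the standard BLEU-4; a candidate identical to a reference scores 1.0. In practice one would use an established implementation such as NLTK's `sentence_bleu` rather than this sketch.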

Citation (APA)

Yang, J., & Meng, Z. Q. (2018). Image semantic description based on deep learning with multi-attention mechanisms. In IFIP Advances in Information and Communication Technology (Vol. 538, pp. 356–362). Springer New York LLC. https://doi.org/10.1007/978-3-030-00828-4_36
