PAIC: Parallelised Attentive Image Captioning

Abstract

Most encoder-decoder captioning architectures generate the image description with a recurrent neural network (RNN) decoder. However, an RNN decoder trained with Back-Propagation Through Time (BPTT) is inherently time-consuming and suffers from the vanishing-gradient problem. To overcome these difficulties, we propose a novel Parallelised Attentive Image Captioning model (PAIC) that decodes natural-language sentences purely with attention, without any RNNs. At each decoding step, the model precisely localises different areas of the image through a spatial attention module, while capturing word-sequence dependencies with multi-head self-attention. In contrast to RNNs, PAIC can efficiently exploit the parallel computation of GPU hardware during training and further facilitates gradient propagation. Extensive experiments on MS-COCO show that PAIC significantly reduces training time while achieving performance competitive with conventional RNN-based models.
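
To make the decoding pattern concrete, below is a minimal PyTorch sketch of the general technique the abstract describes: masked multi-head self-attention over the word sequence combined with spatial (cross-) attention over image-region features, trained in one parallel pass. This is an illustration under assumptions, not the authors' exact PAIC architecture; the class name, dimensions, and hyper-parameters are hypothetical.

# Minimal sketch of an attention-only captioning decoder layer (assumed
# structure; not the published PAIC implementation).
import torch
import torch.nn as nn

class AttentiveDecoderLayer(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        # Multi-head self-attention over the (shifted) word sequence.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Spatial attention: word positions attend to image-region features.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, words: torch.Tensor, regions: torch.Tensor) -> torch.Tensor:
        # Causal mask: position t may only attend to positions <= t, so all
        # time steps are computed in parallel, unlike sequential RNN decoding.
        T = words.size(1)
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=words.device), diagonal=1
        )
        x = self.norm1(words + self.self_attn(words, words, words, attn_mask=causal)[0])
        x = self.norm2(x + self.cross_attn(x, regions, regions)[0])
        return self.norm3(x + self.ffn(x))

# Usage: a batch of 16 captions (20 token embeddings) attending to 49
# CNN region features, e.g. a 7x7 feature map flattened to 49 vectors.
layer = AttentiveDecoderLayer()
out = layer(torch.randn(16, 20, 512), torch.randn(16, 49, 512))
print(out.shape)  # torch.Size([16, 20, 512])

Because the causal mask replaces step-by-step recurrence, gradients flow through attention weights rather than through a long BPTT chain, which is the training-speed and gradient-propagation advantage the abstract claims.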

Citation (APA)

Wang, Z., Huang, Z., & Luo, Y. (2020). PAIC: Parallelised Attentive Image Captioning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12008 LNCS, pp. 16–28). Springer. https://doi.org/10.1007/978-3-030-39469-1_2
