CaptionNet: Automatic End-to-End Siamese Difference Captioning Model with Attention

Ariyo Oluwasanmi; Muhammad Umar Aftab; Eatedal Alabdulkreem; Bulbula Kumeda; Edward Y. Baagyere; Zhiguang Qin

Journal Article, Open Access

IEEE Access (2019), vol. 7, pp. 106773-106783

DOI: 10.1109/ACCESS.2019.2931223

34 Citations

29 Readers

Abstract

Deep learning techniques have been studied intensively for captioning tasks, enabling textual understanding and description of both simple and complex images. Building on this work, this paper proposes a multimodal end-to-end siamese difference captioning model (SDCM) that automatically generates a natural language description of the differences between the two images in a pair. The proposed supervised model combines several deep learning techniques to explore the practicability of capturing, aligning, and computing the disparities between two sets of image features in order to produce a corresponding language model probability distribution. First, a deep siamese convolutional neural network extracts the feature vector discrepancies of an image pair. An attention mechanism then detects the salient regions of the feature vector, which allows a bidirectional long short-term memory decoder to generate a matching and semantically associated textual sequence. The model is evaluated on the Spot-the-Diff baseline dataset, which consists of image pairs and their corresponding captions. The results indicate that our proposed model is highly competitive with the state of the art.
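The pipeline described in the abstract (a shared encoder, a feature difference, then attention over that difference) can be illustrated with a minimal sketch. This is not the authors' implementation. A small linear map stands in for the deep siamese CNN, element-wise subtraction stands in for the "feature vector discrepancies", dot-product attention is an assumed scoring function, and the bidirectional LSTM decoder is reduced to a single hypothetical query vector. All function and variable names are illustrative.

```python
import math

def encode(regions, shared_weights):
    """Stand-in for the siamese CNN: the SAME weights encode either image."""
    return [[sum(w * x for w, x in zip(row, region)) for row in shared_weights]
            for region in regions]

def feature_difference(feats_a, feats_b):
    """Assumed discrepancy measure: element-wise difference per region."""
    return [[b - a for a, b in zip(ra, rb)] for ra, rb in zip(feats_a, feats_b)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(diff_feats, query):
    """Assumed dot-product attention: weight each region by its match to the query."""
    scores = [sum(q * d for q, d in zip(query, region)) for region in diff_feats]
    attn = softmax(scores)
    dim = len(diff_feats[0])
    context = [sum(w * region[i] for w, region in zip(attn, diff_feats))
               for i in range(dim)]
    return attn, context

# Toy data: three regions per image, two features per region.
# Only the middle region changes between the two images.
shared_weights = [[1, 0], [0, 1]]          # hypothetical encoder weights
image_a = [[1, 0], [0, 1], [1, 1]]
image_b = [[1, 0], [0, 3], [1, 1]]

diff = feature_difference(encode(image_a, shared_weights),
                          encode(image_b, shared_weights))
query = [0, 1]                             # hypothetical decoder state
weights, context = attend(diff, query)
print(diff)     # unchanged regions yield all-zero difference vectors
print(weights)  # the changed region (index 1) receives the largest weight
```

Because both images pass through the same encoder, regions that did not change cancel out in the difference, so the attention step concentrates on the region that did. In a full model, the context vector would be fed to the decoder at each step to produce the next word distribution.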

Citation (APA)

Oluwasanmi, A., Aftab, M. U., Alabdulkreem, E., Kumeda, B., Baagyere, E. Y., & Qin, Z. (2019). CaptionNet: Automatic End-to-End Siamese Difference Captioning Model with Attention. IEEE Access, 7, 106773–106783. https://doi.org/10.1109/ACCESS.2019.2931223
