Parallel image captioning using 2D masked convolution


Abstract

Automatically generating a novel description of an image is a challenging and important problem that brings together advanced research in computer vision and natural language processing. In recent years, image captioning performance has improved significantly through the use of long short-term memory (LSTM) networks as the language-model decoder. Despite this improvement, however, the LSTM has shortcomings of its own: its structure is complicated, and its computation is inherently sequential, which prevents parallelization across time steps. This paper proposes a model that uses a simple convolutional network for both the encoder and the decoder of an image captioning system, in place of the recurrent decoder used in current state-of-the-art approaches. Our experiments on the Microsoft Common Objects in Context (MSCOCO) captioning dataset yielded results competitive with the state-of-the-art image captioning model across different evaluation metrics, while the model is much simpler and enables parallel graphics processing unit (GPU) computation during training, resulting in faster training.
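To illustrate the core idea behind the masked-convolution decoder, the sketch below shows how causal masking lets every caption position be computed in one parallel pass during training, unlike an LSTM's step-by-step recurrence. This is a minimal, hypothetical sketch in PyTorch, not the paper's actual architecture: the paper uses 2D masked convolutions, while this example demonstrates the same masking principle with a causal 1D convolution over the token dimension; the layer names, shapes, and the additive image-feature fusion are all assumptions for illustration.

```python
# Hypothetical sketch of a causal ("masked") convolutional decoder;
# not the paper's exact model.
import torch
import torch.nn as nn

class MaskedConvDecoderLayer(nn.Module):
    """Causal convolution over the caption sequence.

    Left-padding the input by (kernel_size - 1) ensures the output at
    position t depends only on tokens <= t, so all time steps can be
    computed in a single parallel pass during training.
    """
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, seq_len); pad on the left only
        x = nn.functional.pad(x, (self.pad, 0))
        return torch.relu(self.conv(x))

# Toy usage: embed a caption, fuse image features, decode in parallel.
vocab, dim, seq_len = 1000, 256, 16
embed = nn.Embedding(vocab, dim)
decoder = nn.Sequential(MaskedConvDecoderLayer(dim),
                        MaskedConvDecoderLayer(dim))
to_vocab = nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (2, seq_len))    # (batch, seq_len)
img_feat = torch.randn(2, dim, 1)                 # encoder output (assumed shape)
h = embed(tokens).transpose(1, 2) + img_feat      # broadcast image features
logits = to_vocab(decoder(h).transpose(1, 2))     # (batch, seq_len, vocab)
```

Because no position depends on a later one, the whole sequence of logits is produced in one forward pass on the GPU, which is the source of the training speedup the abstract describes.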

Citation (APA)

Poleak, C., & Kwon, J. (2019). Parallel image captioning using 2D masked convolution. Applied Sciences (Switzerland), 9(9). https://doi.org/10.3390/app9091871
