Parallel image captioning using 2D masked convolution

Chanrith Poleak; Jangwoo Kwon

Journal Article (Open Access)

Applied Sciences (Switzerland) (2019) 9(9)

DOI: 10.3390/app9091871

Abstract

Automatically generating a novel description of an image is a challenging and important problem that brings together advanced research in computer vision and natural language processing. In recent years, image captioning performance has improved significantly through the use of long short-term memory (LSTM) networks as the decoder for the language model. Despite this improvement, LSTM has shortcomings of its own: its structure is complicated and its computation is inherently sequential. This paper proposes a model that uses a simple convolutional network for both the encoder and the decoder of image captioning, instead of the current state-of-the-art approach. Our experiments with this model on the Microsoft Common Objects in Context (MSCOCO) captioning dataset yielded results competitive with state-of-the-art image captioning models across different evaluation metrics, while using a much simpler model that enables parallel graphics processing unit (GPU) computation during training and therefore trains faster.
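The abstract does not describe the decoder in detail. As a hedged illustration of why a masked convolution allows parallel training where an LSTM does not, here is a minimal sketch in pure Python. The function name `masked_conv1d` is hypothetical, and the sketch masks the future taps of a centered kernel along the caption's time axis only; the paper's 2D variant and its exact masking scheme are not given here, so this is an assumption and not the authors' implementation.

```python
def masked_conv1d(seq, kernel):
    """Centered 1D convolution whose future taps are masked to zero.

    Hypothetical illustration, not the paper's 2D masked convolution.
    seq: list of floats (one scalar feature per time step, for simplicity).
    kernel: list of floats of odd length 2*r + 1; index r is the current step.
    Returns a list the same length as seq where out[t] depends only on
    seq[0..t]. No out[t] depends on out[t-1], so every position could be
    evaluated in parallel (unlike an LSTM's step-by-step recurrence).
    """
    if len(kernel) % 2 != 1:
        raise ValueError("kernel length must be odd")
    r = len(kernel) // 2
    # Mask: keep taps for offsets -r..0 (past and current), zero offsets 1..r (future).
    mask = [1.0 if i <= r else 0.0 for i in range(len(kernel))]
    masked = [k * m for k, m in zip(kernel, mask)]

    out = []
    for t in range(len(seq)):
        acc = 0.0
        for i, w in enumerate(masked):
            src = t + (i - r)  # tap i reads the step at offset i - r from t
            if 0 <= src < len(seq):  # zero padding outside the sequence
                acc += w * seq[src]
        out.append(acc)
    return out


if __name__ == "__main__":
    tokens = [1.0, 2.0, 3.0, 4.0]
    print(masked_conv1d(tokens, [0.5, 1.0, 100.0]))
```

With `kernel = [0.5, 1.0, 100.0]`, the future tap (100.0) is masked out, so the call on `[1.0, 2.0, 3.0, 4.0]` returns `[1.0, 2.5, 4.0, 5.5]`. Changing the last token leaves the first three outputs unchanged, which is the property that lets a model be trained on all caption positions at once with teacher forcing.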

Citation (APA)

Poleak, C., & Kwon, J. (2019). Parallel image captioning using 2D masked convolution. Applied Sciences (Switzerland), 9(9). https://doi.org/10.3390/app9091871