Exploring memory and time efficient neural networks for image captioning


Abstract

Automatically describing the contents of an image is one of the fundamental problems in artificial intelligence. Recent research has primarily focused on improving the quality of the generated descriptions. Multiple architectures can achieve equivalent performance on the same task; among these, smaller architectures are desirable because they require less communication across servers during distributed training and less bandwidth to transfer a new model over a network. A typical deep learning architecture for image captioning combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) within an encoder-decoder framework. We propose to combine a significantly smaller CNN architecture, SqueezeNet, with a memory- and computation-efficient LightRNN inside a visual attention framework. Experimental evaluation of the proposed architecture on the Flickr8k, Flickr30k and MS-COCO datasets shows results superior to the state of the art.
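
The pipeline the abstract describes can be illustrated with a minimal sketch: SqueezeNet's convolutional trunk (available in torchvision) encodes the image into a grid of region features, and a recurrent decoder attends over those regions at each word step. LightRNN has no off-the-shelf PyTorch module, so a standard LSTMCell stands in for it below; all layer sizes and names are illustrative assumptions, not the paper's actual hyperparameters.

```python
# Sketch of a SqueezeNet encoder + soft-attention RNN decoder for captioning.
# LSTMCell is a stand-in for LightRNN; dimensions are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class SqueezeNetEncoder(nn.Module):
    """Extract spatial feature maps from SqueezeNet's conv trunk."""

    def __init__(self):
        super().__init__()
        # squeezenet1_1 features yield a 512 x 13 x 13 map for 224x224 input
        self.features = models.squeezenet1_1().features

    def forward(self, images):                          # (B, 3, 224, 224)
        fmap = self.features(images)                    # (B, 512, 13, 13)
        B, C, H, W = fmap.shape
        return fmap.view(B, C, H * W).permute(0, 2, 1)  # (B, 169, 512)


class AttentionDecoder(nn.Module):
    """Soft (Bahdanau-style) attention over image regions at each step."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        # LSTMCell stands in for the memory-efficient LightRNN cell
        self.cell = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):                 # feats: (B, R, feat_dim)
        B, R, _ = feats.shape
        h = feats.new_zeros(B, self.cell.hidden_size)
        c = feats.new_zeros(B, self.cell.hidden_size)
        logits = []
        for t in range(captions.size(1)):
            # attention weights over the R image regions
            scores = self.att_score(torch.tanh(
                self.att_feat(feats) + self.att_hid(h).unsqueeze(1)))
            alpha = torch.softmax(scores, dim=1)        # (B, R, 1)
            context = (alpha * feats).sum(dim=1)        # (B, feat_dim)
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.cell(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)               # (B, T, vocab)


# Usage: teacher-forced forward pass over a dummy batch
encoder, decoder = SqueezeNetEncoder(), AttentionDecoder(vocab_size=10000)
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 10000, (2, 12))
print(decoder(encoder(images), captions).shape)         # torch.Size([2, 12, 10000])
```

The full LightRNN decoder would additionally factor the vocabulary into a shared row/column embedding table to shrink the embedding and output layers, which is where the memory savings the abstract claims would come from.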

Citation (APA)

Parameswaran, S. N. (2018). Exploring memory and time efficient neural networks for image captioning. In Communications in Computer and Information Science (Vol. 841, pp. 338–347). Springer Verlag. https://doi.org/10.1007/978-981-13-0020-2_30
