Exploring memory and time efficient neural networks for image captioning


Abstract

Automatically describing the contents of an image is one of the fundamental problems in artificial intelligence. Recent research has primarily focused on improving the quality of the generated descriptions. Multiple architectures can achieve equivalent performance on the same task; among these, smaller architectures are desirable because they require less communication across servers during distributed training and less bandwidth to transfer a new model over a network. A typical deep learning architecture for image captioning combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) within an encoder-decoder framework. We propose to combine a significantly smaller CNN architecture, SqueezeNet, with a memory- and computation-efficient LightRNN inside a visual attention framework. Experimental evaluation of the proposed architecture on the Flickr8k, Flickr30k and MS-COCO datasets shows results superior to the state of the art.
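
The pipeline the abstract describes can be illustrated with a minimal sketch: SqueezeNet's convolutional trunk (available in torchvision) encodes the image into a grid of region features, and a recurrent decoder attends over those regions at each word step. LightRNN has no off-the-shelf PyTorch module, so a standard LSTMCell stands in for it below; all layer sizes and names are illustrative assumptions, not the paper's actual hyperparameters.

```python
# Sketch of a SqueezeNet encoder + soft-attention RNN decoder for captioning.
# LSTMCell is a stand-in for LightRNN; dimensions are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class SqueezeNetEncoder(nn.Module):
    """Extract spatial feature maps from SqueezeNet's conv trunk."""

    def __init__(self):
        super().__init__()
        # squeezenet1_1 features yield a 512 x 13 x 13 map for 224x224 input
        self.features = models.squeezenet1_1().features

    def forward(self, images):                          # (B, 3, 224, 224)
        fmap = self.features(images)                    # (B, 512, 13, 13)
        B, C, H, W = fmap.shape
        return fmap.view(B, C, H * W).permute(0, 2, 1)  # (B, 169, 512)


class AttentionDecoder(nn.Module):
    """Soft (Bahdanau-style) attention over image regions at each step."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        # LSTMCell stands in for the memory-efficient LightRNN cell
        self.cell = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):                 # feats: (B, R, feat_dim)
        B, R, _ = feats.shape
        h = feats.new_zeros(B, self.cell.hidden_size)
        c = feats.new_zeros(B, self.cell.hidden_size)
        logits = []
        for t in range(captions.size(1)):
            # attention weights over the R image regions
            scores = self.att_score(torch.tanh(
                self.att_feat(feats) + self.att_hid(h).unsqueeze(1)))
            alpha = torch.softmax(scores, dim=1)        # (B, R, 1)
            context = (alpha * feats).sum(dim=1)        # (B, feat_dim)
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.cell(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)               # (B, T, vocab)


# Usage: teacher-forced forward pass over a dummy batch
encoder, decoder = SqueezeNetEncoder(), AttentionDecoder(vocab_size=10000)
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 10000, (2, 12))
print(decoder(encoder(images), captions).shape)         # torch.Size([2, 12, 10000])
```

The full LightRNN decoder would additionally factor the vocabulary into a shared row/column embedding table to shrink the embedding and output layers, which is where the memory savings the abstract claims would come from.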

Citation (APA)

Parameswaran, S. N. (2018). Exploring memory and time efficient neural networks for image captioning. In Communications in Computer and Information Science (Vol. 841, pp. 338–347). Springer Verlag. https://doi.org/10.1007/978-981-13-0020-2_30
