Mean box pooling: A rich image representation and output embedding for the Visual Madlibs task


Abstract

We present Mean Box Pooling, a novel visual representation that pools over CNN representations of a large number of highly overlapping object proposals. We show that this representation, together with nCCA, a successful multimodal embedding technique, achieves state-of-the-art performance on the Visual Madlibs task. Moreover, inspired by nCCA's objective function, we extend the classical CNN+LSTM approach to train the network by directly maximizing the similarity between the internal representation of the deep learning architecture and the candidate answers. This approach also achieves a significant improvement over prior work that uses a CNN+LSTM approach on Visual Madlibs.
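
The pooling and answer-selection ideas described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`mean_box_pooling`, `cosine_similarity`, `pick_answer`) are hypothetical, the per-box CNN features are assumed to be precomputed, and the image and candidate-answer vectors are assumed to already lie in a shared embedding space (the role nCCA plays in the paper). Cosine similarity is used here as an assumed similarity measure.

```python
import math


def mean_box_pooling(box_features):
    """Average per-box feature vectors into one image-level vector.

    box_features: list of equal-length lists of floats, one per object
    proposal (assumed to be precomputed CNN features).
    """
    if not box_features:
        raise ValueError("need at least one box feature vector")
    dim = len(box_features[0])
    pooled = [0.0] * dim
    for feat in box_features:
        if len(feat) != dim:
            raise ValueError("all box features must have the same length")
        for i, value in enumerate(feat):
            pooled[i] += value
    count = len(box_features)
    return [value / count for value in pooled]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_answer(image_vec, candidate_vecs):
    """Return the index of the candidate most similar to the image vector.

    Assumes image_vec and candidate_vecs share one embedding space.
    """
    scores = [cosine_similarity(image_vec, c) for c in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy 2-D features for two overlapping proposals (illustrative values).
    boxes = [[1.0, 0.0], [3.0, 2.0]]
    image_vec = mean_box_pooling(boxes)
    # Toy candidate-answer embeddings (illustrative values).
    candidates = [[0.0, 1.0], [2.0, 1.0], [-1.0, 0.0]]
    print(image_vec, pick_answer(image_vec, candidates))
```

In this toy setting the multiple-choice answer is simply the candidate with the highest similarity to the pooled image vector; a learned embedding would be needed to project real image and text features into a common space first.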


Mokarian, A., Malinowski, M., & Fritz, M. (2016). Mean box pooling: A rich image representation and output embedding for the visual madlibs task. In British Machine Vision Conference 2016, BMVC 2016 (Vol. 2016-September, pp. 111.1-111.12). British Machine Vision Conference, BMVC. https://doi.org/10.5244/C.30.111
