Mean box pooling: A rich image representation and output embedding for the visual madlibs task

Ashkan Mokarian; Mateusz Malinowski; Mario Fritz

Conference Proceedings

British Machine Vision Conference 2016, BMVC 2016 (2016) 2016-September 111.1-111.12

DOI: 10.5244/C.30.111

Abstract

We present Mean Box Pooling, a novel visual representation that pools over CNN representations of a large number of highly overlapping object proposals. We show that this representation, together with nCCA, a successful multimodal embedding technique, achieves state-of-the-art performance on the Visual Madlibs task. Moreover, inspired by nCCA's objective function, we extend the classical CNN+LSTM approach to train the network by directly maximizing the similarity between the internal representation of the deep learning architecture and candidate answers. This approach also achieves a significant improvement over prior work that uses a CNN+LSTM approach on Visual Madlibs.
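The pooling and answer-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each object proposal already has a CNN feature vector, that the image vector and the candidate answer vectors already live in a shared embedding space (the nCCA projection itself is omitted), and that cosine similarity is the scoring function. The function names `mean_box_pooling`, `cosine_similarity`, and `pick_answer` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_box_pooling(box_features: Sequence[Sequence[float]]) -> Vector:
    """Average per-box feature vectors into a single image-level vector."""
    if not box_features:
        raise ValueError("need at least one box feature vector")
    dim = len(box_features[0])
    if any(len(f) != dim for f in box_features):
        raise ValueError("all box feature vectors must have the same length")
    n = len(box_features)
    return [sum(f[i] for f in box_features) / n for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_answer(image_vec: Sequence[float],
                candidate_vecs: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate most similar to the image vector."""
    scores = [cosine_similarity(image_vec, c) for c in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data (made up for illustration): three boxes with 2-D features.
boxes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = mean_box_pooling(boxes)

# Four candidate answers, assumed to be embedded in the same 2-D space.
candidates = [[1.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [0.0, 1.0]]
best = pick_answer(pooled, candidates)
```

In this toy setup the pooled vector points along the diagonal, so the second candidate (index 1) is selected. Real features would be high-dimensional CNN activations, and the number of proposals per image would be far larger.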

Cite

Mokarian, A., Malinowski, M., & Fritz, M. (2016). Mean box pooling: A rich image representation and output embedding for the visual madlibs task. In British Machine Vision Conference 2016, BMVC 2016 (Vol. 2016-September, pp. 111.1-111.12). British Machine Vision Conference, BMVC. https://doi.org/10.5244/C.30.111
