Multiple instance visual-semantic embedding

22 citations · 68 Mendeley readers

Abstract

Visual-semantic embedding models have recently been proposed and shown to be effective for image classification and zero-shot learning. The key idea is that by directly learning a mapping from images into a semantic label space, the algorithm can generalize to a large number of unseen labels. However, existing approaches are limited to single-label embedding; handling images with multiple labels remains an open problem, mainly due to the complex underlying correspondence between an image and its labels. In this work, we present a novel Multiple Instance Visual-Semantic Embedding (MIVSE) model for multi-label images. Instead of embedding a whole image into the semantic space, our model characterizes the subregion-to-label correspondence, discovering and mapping semantically meaningful image subregions to their corresponding labels. Experiments on two challenging tasks, multi-label image annotation and zero-shot learning, show that the proposed MIVSE model outperforms state-of-the-art methods on both tasks and generalizes to unseen labels.
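The core idea described in the abstract can be sketched as a multiple-instance scoring rule: embed each candidate image subregion into the semantic space and score each label by its best-matching subregion. The snippet below is a minimal illustration of that idea, not the paper's actual implementation; the function name, the linear map `W`, and all array shapes are hypothetical.

```python
import numpy as np

def mivse_scores(region_feats, W, label_embs):
    """Score each label by its best-matching image subregion (MIL-style sketch).

    region_feats: (R, D) features for R candidate subregions (hypothetical)
    W:            (D, E) learned linear map into the semantic space (hypothetical)
    label_embs:   (L, E) unit-norm semantic embeddings for L labels
    """
    # Map each subregion into the semantic embedding space and normalize.
    embedded = region_feats @ W                                   # (R, E)
    embedded = embedded / np.linalg.norm(embedded, axis=1, keepdims=True)
    # Cosine similarity between every subregion and every label embedding.
    sims = embedded @ label_embs.T                                # (R, L)
    # Multiple-instance assumption: a label is present if at least one
    # subregion matches it, so take the max over subregions per label.
    return sims.max(axis=0)                                       # (L,)
```

Taking the max over subregions (rather than embedding the whole image once) is what lets a single image carry several labels, each grounded in a different region.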

Cite

APA

Ren, Z., Jin, H., Lin, Z., Fang, C., & Yuille, A. (2017). Multiple instance visual-semantic embedding. In British Machine Vision Conference 2017, BMVC 2017. BMVA Press. https://doi.org/10.5244/c.31.89
