Discovering Connotations as Labels for Weakly Supervised Image-Sentence Data


Abstract

The growth of multimodal content on the web and social media has generated abundant weakly aligned image-sentence pairs. However, such pairs are hard to interpret directly because of their intrinsic intension. In this paper, we aim to annotate image-sentence pairs with connotations as labels that capture this intrinsic intension. We achieve this with a connotation multimodal embedding model (CMEM) trained with a novel loss function. Its unique characteristics over previous models include: (i) the exploitation of multimodal data, as opposed to visual information alone; (ii) robustness to outlier labels in a multi-label scenario; and (iii) effectiveness with large-scale weakly supervised data. With extensive quantitative evaluation, we show that CMEM outperforms other state-of-the-art approaches at detecting multiple labels. We also show that, beyond annotating image-sentence pairs with connotation labels, a byproduct of our model inherently supports cross-modal retrieval, i.e., retrieving sentences from an image query.
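The abstract does not specify the CMEM architecture or its novel loss. As a rough illustration only, the sketch below shows a generic joint image-sentence embedding with a multi-label head in PyTorch; every concrete choice (feature dimensions, tanh projections, additive fusion, binary cross-entropy as a stand-in for the paper's loss) is an assumption, not the authors' method.

```python
# Illustrative sketch only: the CMEM architecture and loss are not given in
# this abstract. Dimensions, tanh projections, additive fusion, and the
# binary cross-entropy loss below are all assumptions, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, embed_dim=512, num_labels=100):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        # Score every connotation label from the fused embedding (multi-label).
        self.classifier = nn.Linear(embed_dim, num_labels)

    def forward(self, img_feat, txt_feat):
        img_emb = torch.tanh(self.img_proj(img_feat))
        txt_emb = torch.tanh(self.txt_proj(txt_feat))
        fused = img_emb + txt_emb  # simple additive fusion (assumed)
        return self.classifier(fused), img_emb, txt_emb

model = JointEmbedding()
img_feat = torch.randn(8, 2048)  # e.g. CNN image features (assumed source)
txt_feat = torch.randn(8, 300)   # e.g. averaged word embeddings (assumed)
logits, img_emb, txt_emb = model(img_feat, txt_feat)

# Weak, possibly noisy multi-label targets; BCE stands in for the CMEM loss.
labels = torch.randint(0, 2, (8, 100)).float()
loss = F.binary_cross_entropy_with_logits(logits, labels)

# Cross-modal retrieval byproduct: rank sentences against an image query by
# cosine similarity in the shared embedding space.
sims = F.cosine_similarity(img_emb[:1], txt_emb, dim=1)
ranked = sims.argsort(descending=True)
```

The shared embedding space is what makes the cross-modal retrieval byproduct possible: once images and sentences live in the same space, an image query can rank sentences by cosine similarity, as in the last two lines above.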

Citation (APA)

Mogadala, A., Kanuparthi, B., Rettinger, A., & Sure-Vetter, Y. (2018). Discovering Connotations as Labels for Weakly Supervised Image-Sentence Data. In The Web Conference 2018 - Companion of the World Wide Web Conference, WWW 2018 (pp. 379–386). Association for Computing Machinery, Inc. https://doi.org/10.1145/3184558.3186352
