With the popularization of social websites, many methods have been proposed to exploit noisy user-provided tags for weakly-supervised image hashing. The main challenge lies in learning appropriate and sufficient information from these noisy tags. To address this issue, this work proposes a novel Masked visual-semantic Graph-based Reasoning Network, termed MGRN, to learn joint visual-semantic representations for image hashing. Specifically, for each image, MGRN constructs a relation graph to capture the interactions among its associated tags and performs reasoning with Graph Attention Networks (GAT). MGRN randomly masks out one tag and trains the GAT to predict this masked tag. This forces the GAT model to capture the dependence between the image and its associated tags, which effectively mitigates the problem of noisy tags. Thus it can capture key tags and visual structures from images to learn well-aligned visual-semantic representations. Finally, an auto-encoder is leveraged to learn hash codes that preserve the local structure of the joint space, and the joint visual-semantic representations are reconstructed from these hash codes by a decoder. Experimental results on two widely-used benchmark datasets demonstrate the superiority of the proposed method for image retrieval compared with several state-of-the-art methods.
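The masked-tag reasoning step described above can be sketched in minimal form: a single graph-attention layer propagates information among tag nodes, one tag embedding is zeroed out, and the aggregated representation at that node is used to predict the masked tag over the tag vocabulary. This is only an illustrative sketch under assumptions; the layer sizes, the fully-connected relation graph, the random (untrained) weights, and all variable names are hypothetical and are not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(h, adj, W, a):
    """One graph-attention layer: attention-weighted aggregation over tag nodes."""
    z = h @ W                                  # project node features, (N, F_out)
    n = z.shape[0]
    # attention logit e_ij = LeakyReLU(a^T [z_i || z_j]) for each node pair
    logits = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            logits[i, j] = np.concatenate([z[i], z[j]]) @ a
    logits = np.where(logits > 0, logits, 0.2 * logits)   # LeakyReLU
    logits = np.where(adj > 0, logits, -1e9)              # restrict to graph edges
    alpha = softmax(logits, axis=1)                       # normalized attention
    return alpha @ z                                      # aggregated node features

# toy setup: 4 tag nodes on a fully-connected relation graph (an assumption)
f_in, f_out, vocab_size = 8, 8, 4
tags = rng.normal(size=(4, f_in))
mask_idx = 2
tags_masked = tags.copy()
tags_masked[mask_idx] = 0.0                    # mask out one tag's embedding
adj = np.ones((4, 4))

W = rng.normal(size=(f_in, f_out)) * 0.1
a = rng.normal(size=(2 * f_out,)) * 0.1
h = gat_layer(tags_masked, adj, W, a)

# predict the masked tag from its aggregated representation (untrained weights)
V = rng.normal(size=(f_out, vocab_size)) * 0.1
probs = softmax(h[mask_idx] @ V)
```

In training, the cross-entropy loss between `probs` and the identity of the masked tag would drive the attention weights to rely on the image and the remaining tags, which is the mechanism the abstract credits with suppressing noisy tags.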
CITATION STYLE
Jin, L., Li, Z., Pan, Y., & Tang, J. (2020). Weakly-Supervised Image Hashing through Masked Visual-Semantic Graph-based Reasoning. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 916–924). Association for Computing Machinery, Inc. https://doi.org/10.1145/3394171.3414022