Abstract
Zero-shot learning (ZSL) aims to predict unseen classes whose samples never appear during training. As annotations of class-level visual characteristics, attributes are widely used semantic information for zero-shot image classification. However, current methods often fail to discriminate subtle visual distinctions between images, owing not only to the lack of fine-grained annotations but also to attribute imbalance and co-occurrence. In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; and (3) propose a multi-task learning policy that jointly considers the multi-modal objectives. We find that DUET achieves state-of-the-art performance on three standard ZSL benchmarks and a knowledge-graph-equipped ZSL benchmark, that its components are effective, and that its predictions are interpretable.
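To make the attribute-level contrastive objective and the multi-task weighting more concrete, the sketch below shows a generic InfoNCE-style contrastive term over paired image-side and text-side attribute embeddings, combined with a classification (grounding) term. This is a minimal illustration under assumed shapes and hyperparameters (the temperature, the weight w_con, and the toy tensors are all placeholders), not the authors' DUET implementation.

```python
# Minimal sketch only: generic attribute-level contrastive (InfoNCE) loss plus a
# classification term under a simple multi-task weighting. All names and values
# here are illustrative assumptions, not DUET's actual code.
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(img_attr_emb, txt_attr_emb, temperature=0.07):
    """Pull each image's attribute embedding toward its matching attribute text
    embedding and push it away from the other attributes in the batch."""
    img = F.normalize(img_attr_emb, dim=-1)               # (N, d)
    txt = F.normalize(txt_attr_emb, dim=-1)               # (N, d)
    logits = img @ txt.t() / temperature                  # (N, N) similarities
    targets = torch.arange(img.size(0), device=img.device)
    return F.cross_entropy(logits, targets)

def multitask_loss(cls_logits, labels, img_attr_emb, txt_attr_emb, w_con=0.5):
    """Weighted sum of a classification (grounding) term and the contrastive term."""
    grounding = F.cross_entropy(cls_logits, labels)
    contrastive = attribute_contrastive_loss(img_attr_emb, txt_attr_emb)
    return grounding + w_con * contrastive

# Toy usage with random tensors.
N, d, num_classes = 8, 64, 10
loss = multitask_loss(
    torch.randn(N, num_classes), torch.randint(0, num_classes, (N,)),
    torch.randn(N, d), torch.randn(N, d),
)
print(loss.item())
```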
Citation
Chen, Z., Huang, Y., Chen, J., Geng, Y., Zhang, W., Fang, Y., … Chen, H. (2023). DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 (Vol. 37, pp. 405–413). AAAI Press. https://doi.org/10.1609/aaai.v37i1.25114