Image-text matching is a fundamental research topic bridging vision and language. Recent works use hard negative mining to capture the multiple correspondences between the visual and textual domains. Unfortunately, truly informative negative samples are quite sparse in the training data, making them hard to obtain from a randomly sampled mini-batch alone. Motivated by causal inference, we aim to overcome this shortcoming by carefully analyzing the analogy between hard negative mining and causal effect optimization. We then propose the Counterfactual Matching (CFM) framework for more effective image-text correspondence mining. CFM contains three major components, i.e., Gradient-Guided Feature Selection for automatic causal factor identification, Self-Exploration for causal factor completeness, and Self-Adjustment for counterfactual sample synthesis. Compared with traditional hard negative mining, our method largely alleviates over-fitting and effectively captures the fine-grained correlations between the image and text modalities. We evaluate CFM in combination with three state-of-the-art image-text matching architectures. Quantitative and qualitative experiments on two publicly available datasets demonstrate its strong generality and effectiveness. Code is available at: https://github.com/weihao20/cfm.
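For context on the limitation the abstract describes, below is a minimal PyTorch sketch of the conventional baseline it contrasts with: a triplet loss that mines the hardest negative within each mini-batch (in the style of max-of-hinges losses such as VSE++). This is not the paper's CFM method; the function name, margin value, and embedding shapes are illustrative assumptions. It makes the sparsity problem concrete: the only negatives available per anchor are the other items that happen to be in the sampled batch.

```python
import torch

def hard_negative_triplet_loss(img_emb, txt_emb, margin=0.2):
    """Illustrative in-batch hard negative mining (baseline, not CFM).

    img_emb, txt_emb: (B, D) L2-normalized embeddings; row i of each
    tensor is a matched image-text pair. Candidate negatives are
    limited to the current mini-batch, which is why truly informative
    negatives are rarely available.
    """
    scores = img_emb @ txt_emb.t()           # (B, B) cosine similarities
    pos = scores.diag().view(-1, 1)          # matched-pair scores

    # Hinge costs against every in-batch negative, in both directions.
    cost_txt = (margin + scores - pos).clamp(min=0)      # image anchor, wrong text
    cost_img = (margin + scores - pos.t()).clamp(min=0)  # text anchor, wrong image

    # Zero out the positive pairs on the diagonal.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_txt = cost_txt.masked_fill(mask, 0)
    cost_img = cost_img.masked_fill(mask, 0)

    # Keep only the hardest (highest-cost) negative per anchor.
    return cost_txt.max(dim=1)[0].mean() + cost_img.max(dim=0)[0].mean()
```

If the batch contains no genuinely confusing negative for an anchor, the max over each row or column picks an easy one and the hinge contributes little gradient; CFM's counterfactual sample synthesis is motivated by exactly this failure mode.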
Wei, H., Wang, S., Han, X., Xue, Z., Ma, B., Wei, X., & Wei, X. (2022). Synthesizing Counterfactual Samples for Effective Image-Text Matching. In MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia (pp. 4355–4364). Association for Computing Machinery, Inc. https://doi.org/10.1145/3503161.3547814