A strong and robust baseline for text-image matching


Abstract

We review current text-image matching models and propose improvements for both training and inference. First, we empirically show the limitations of two popular losses (the sum-margin and max-margin losses) widely used for training text-image embeddings, and propose a trade-off between them: a kNN-margin loss which 1) utilizes information from hard negatives and 2) is robust to noise, since all of the K hardest samples are taken into account, tolerating pseudo-negatives and outliers. Second, we advocate the use of Inverted Softmax (IS) and Cross-modal Local Scaling (CSLS) during inference to mitigate the so-called hubness problem in high-dimensional embedding spaces, improving all retrieval metrics by a large margin.
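The two ideas in the abstract can be sketched concretely. Below is a minimal NumPy illustration (not the authors' code): a kNN-margin loss that averages the hinge loss over the K hardest negatives per query, and a CSLS re-scoring step that penalizes "hub" points by subtracting each item's mean similarity to its k nearest cross-modal neighbours. All function names and the choice of a square similarity matrix are illustrative assumptions.

```python
import numpy as np

def knn_margin_loss(sim, k=3, margin=0.2):
    """Sketch of a kNN-margin loss (illustrative, not the paper's code).

    sim: (N, N) text-image similarity matrix where sim[i, i] is the
    score of the ground-truth pair. Rather than summing over all
    negatives (sum-margin) or taking only the single hardest one
    (max-margin), we average the hinge loss over the k hardest
    negatives per query, which tolerates a few pseudo-negatives.
    """
    pos = np.diag(sim)                 # scores of the true pairs
    neg = sim.copy()
    np.fill_diagonal(neg, -np.inf)     # exclude the positive itself
    hardest = np.sort(neg, axis=1)[:, -k:]   # k highest-scoring negatives
    hinge = np.maximum(0.0, margin - pos[:, None] + hardest)
    return hinge.mean()

def csls(sim, k=3):
    """Sketch of Cross-modal Local Scaling: demote hubs by subtracting
    each item's mean similarity to its k nearest cross-modal neighbours."""
    r_text = np.sort(sim, axis=1)[:, -k:].mean(axis=1)   # text-side hubness
    r_img = np.sort(sim, axis=0)[-k:, :].mean(axis=0)    # image-side hubness
    return 2 * sim - r_text[:, None] - r_img[None, :]
```

For retrieval, one would rank images for each text by the CSLS-adjusted scores instead of the raw similarities; hub images that are close to many texts receive a large penalty and stop dominating the top ranks.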

Citation (APA)

Liu, F., & Ye, R. (2019). A strong and robust baseline for text-image matching. In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 169–176). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p19-2023
