Image-recipe retrieval, which aims at retrieving the relevant recipe given a food image and vice versa, is attracting widespread attention, since sharing food-related images and recipes on the Internet has become a popular trend. Existing methods formulate this problem as a typical cross-modal retrieval task by learning image-recipe similarity. Although these methods have achieved inspiring results for image-recipe retrieval, they may still be less effective at jointly incorporating three crucial points: (1) the association between ingredients and instructions, (2) fine-grained image information, and (3) the latent alignment between recipes and images. To this end, we propose a novel framework named Hybrid Fusion with Intra- and Cross-Modality Attention (HF-ICMA) to learn accurate image-recipe similarity. Our HF-ICMA model adopts an intra-recipe fusion module that focuses on the interaction between ingredients and instructions within a recipe, thereby enriching the two separate embeddings. Meanwhile, an image-recipe fusion module is devised to explore the potential relationship between fine-grained image regions and ingredients from the recipe; together, these modules form the final image-recipe similarity from both local and global aspects. Extensive experiments on the large-scale benchmark dataset Recipe1M show that our model significantly outperforms state-of-the-art approaches in various image-recipe retrieval scenarios.
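The two attention mechanisms the abstract describes can be summarized in a minimal sketch. Assuming PyTorch and plain scaled dot-product attention (both assumptions; the paper's actual architecture, dimensions, and module names may differ), the intra-recipe module lets ingredient embeddings attend to instruction embeddings, while the cross-modality module lets ingredients attend to fine-grained image-region features before scoring similarity:

# A minimal sketch, not the authors' code. Module and argument names,
# dimensions, and the use of scaled dot-product attention are
# illustrative assumptions based only on the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

def attend(query, key, value):
    # Standard scaled dot-product attention over the key/value sequence.
    scores = query @ key.transpose(-2, -1) / key.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)
    return weights @ value

class HybridFusionSketch(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def intra_recipe(self, ingredients, instructions):
        # Intra-recipe fusion: enrich ingredient embeddings with
        # instruction context via a residual attention step.
        q = self.q_proj(ingredients)   # (B, n_ingr, dim)
        k = self.k_proj(instructions)  # (B, n_inst, dim)
        v = self.v_proj(instructions)
        return ingredients + attend(q, k, v)

    def cross_modal_similarity(self, regions, ingredients):
        # Image-recipe fusion: each ingredient attends to image regions;
        # a global score averages the resulting local similarities.
        q = self.q_proj(ingredients)   # (B, n_ingr, dim)
        k = self.k_proj(regions)       # (B, n_reg, dim)
        v = self.v_proj(regions)
        attended = attend(q, k, v)     # region context per ingredient
        return F.cosine_similarity(attended, ingredients, dim=-1).mean(dim=-1)

A symmetric call with the roles of ingredients and instructions swapped would enrich the instruction embeddings in the same way; how the paper actually combines the local and global scores is not specified in the abstract.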
CITATION STYLE
Li, J., Xu, X., Yu, W., Shen, F., Cao, Z., Zuo, K., & Shen, H. T. (2021). Hybrid Fusion with Intra- and Cross-Modality Attention for Image-Recipe Retrieval. In SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 244–254). Association for Computing Machinery. https://doi.org/10.1145/3404835.3462965