Cross-Modal Omni Interaction Modeling for Phrase Grounding

7Citations
Citations of this article
22Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Phrase grounding aims to localize the objects described by phrases in a natural language specification. Previous works model the interaction of inputs from text modality and visual modality only in the intra-modal global level and consequently lacks the ability to capture the precise and complete context information. In this paper, we propose a novel Cross-Modal Omni Interaction network (COI Net) composed of a neighboring interaction module, a global interaction module, a cross-modal interaction module and a multilevel alignment module. Our approach formulates the complex spatial and semantic relationship among image regions and phrases through multi-level multi-modal interaction. We capture the local relationship using the interaction among neighboring regions and then collect the global context through the interaction among all regions using a transformer encoder. We further use a co-attention module to apply the interaction between two modalities to gather the cross-modal context for all image regions and phrases. In addition to the omni interaction modeling, we also leverage a straightforward yet effective multilevel alignment regularization to formulate the dependencies among all grounding decisions. We extensively validate the effectiveness of our model. Experiments show that our approach outperforms existing state-of-the-art methods by large margins on two popular datasets in terms of accuracy: 6.15% on Flickr30K Entities (71.36% increased to 77.51%) and 21.25% on ReferItGame (44.91% increased to 66.16%). The code of our implementation is available at https://github.com/yiranyyu/Phrase-Grounding.

Cite

CITATION STYLE

APA

Yu, T., Hui, T., Yu, Z., Liao, Y., Yu, S., Zhang, F., & Liu, S. (2020). Cross-Modal Omni Interaction Modeling for Phrase Grounding. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 1725–1734). Association for Computing Machinery, Inc. https://doi.org/10.1145/3394171.3413846

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free