Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization

Abstract

We address the challenging task of event localization, which requires a machine to localize an event and recognize its category in unconstrained videos. Most existing methods leverage only the visual information of a video and neglect its audio information, which can be very helpful for event localization. For example, humans often recognize an event by reasoning over the visual and audio content simultaneously. Moreover, the audio information can guide the model to pay more attention to the informative regions of visual scenes, which helps reduce interference from the background. Motivated by these observations, we propose a relation-aware network that leverages both audio and visual information for accurate event localization. Specifically, to reduce background interference, we propose an audio-guided spatial-channel attention module that guides the model to focus on event-relevant visual regions. In addition, we propose a relation-aware module that builds connections between the visual and audio modalities. In particular, we learn the representations of video and/or audio segments by aggregating information from the other modality according to the cross-modal relations. Finally, relying on the relation-aware representations, we conduct event localization by predicting an event-relevance score and a classification score. Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art methods in both the supervised and weakly-supervised AVE settings.
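
As a rough illustration of the three components named in the abstract, the NumPy sketch below wires together an audio-guided spatial-channel attention step, a cross-modal aggregation step, and per-segment scoring. Everything concrete here is an assumption made for illustration: the sigmoid channel gate, the dot-product spatial attention, the single-head cross-attention with a residual update, the concatenation-based scoring heads, and all function names, weight names, and tensor sizes are hypothetical and are not taken from the paper. It should be read as a minimal shape-level sketch, not as the authors' architecture, which also relies on learned backbones and training losses not described here.

```python
import numpy as np

# Hypothetical sketch: shapes, weight names, and formulas are illustrative
# assumptions, not the authors' exact architecture.


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def audio_guided_attention(visual, audio, w_channel, w_spatial):
    """visual: (T, R, C) region features per segment; audio: (T, C).
    Returns audio-attended visual features (T, C) and region weights (T, R)."""
    # Channel attention: the audio feature gates each visual channel in (0, 1).
    gate = sigmoid(audio @ w_channel)                      # (T, C)
    gated = visual * gate[:, None, :]                      # (T, R, C)
    # Spatial attention: score every region against a projected audio query.
    query = audio @ w_spatial                              # (T, C)
    scores = np.einsum("trc,tc->tr", gated, query) / np.sqrt(visual.shape[-1])
    weights = softmax(scores, axis=-1)                     # each row sums to 1
    attended = np.einsum("tr,trc->tc", weights, gated)     # weighted region sum
    return attended, weights


def cross_modal_aggregate(own, other, w_q, w_k, w_v):
    """Update each segment of one modality with features aggregated from all
    segments of the other modality, weighted by scaled dot-product relations."""
    q, k, v = own @ w_q, other @ w_k, other @ w_v
    relation = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (T, T)
    return own + relation @ v, relation                          # residual update


def score_segments(visual_feats, audio_feats, w_rel, w_cls):
    """Per-segment event-relevance scores and video-level class logits."""
    fused = np.concatenate([visual_feats, audio_feats], axis=-1)  # (T, 2C)
    relevance = sigmoid(fused @ w_rel).ravel()                    # (T,)
    class_logits = fused.mean(axis=0) @ w_cls                     # (K,)
    return relevance, class_logits


# Usage with random features; T, R, C, K are illustrative sizes only.
rng = np.random.default_rng(0)
T, R, C, K = 10, 49, 16, 5


def init(*shape):
    # Small random weights stand in for learned parameters.
    return 0.1 * rng.standard_normal(shape)


visual = rng.standard_normal((T, R, C))
audio = rng.standard_normal((T, C))

attended, spatial_weights = audio_guided_attention(visual, audio, init(C, C), init(C, C))
v_rel, relation_va = cross_modal_aggregate(attended, audio, init(C, C), init(C, C), init(C, C))
a_rel, relation_av = cross_modal_aggregate(audio, attended, init(C, C), init(C, C), init(C, C))
relevance, class_logits = score_segments(v_rel, a_rel, init(2 * C, 1), init(2 * C, K))

event_segments = np.flatnonzero(relevance > 0.5)  # segments judged to contain the event
predicted_class = int(np.argmax(class_logits))
print(event_segments, predicted_class)
```

Because the weights are random, the printed segments and class carry no meaning; the sketch only shows how the pieces fit together. Applying the channel gate before the spatial softmax lets the audio first suppress uninformative channels and then pick out regions, which is one plausible reading of "spatial-channel" attention. Running the aggregation in both directions gives each modality a representation informed by the other before the relevance and classification heads are applied.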

Citation (APA)

Xu, H., Zeng, R., Wu, Q., Tan, M., & Gan, C. (2020). Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 3893–3901). Association for Computing Machinery, Inc. https://doi.org/10.1145/3394171.3413581