Val: Visual-attention action localizer

Abstract

In this paper, we focus on solving a new task called TALL (Temporal Activity Localization via Language Query). Its goal is to use natural language queries to localize actions in long, untrimmed videos. We propose a new model called VAL (Visual-attention Action Localizer) to address it. Specifically, it employs voxel-wise attention and channel-wise attention on the last conv-layer feature maps. These two visual attention mechanisms are designed according to the characteristics of the feature maps. They enhance the visual representations and boost the cross-modal correlation extraction process. Experimental results on the TACoS and Charades-STA datasets both demonstrate the effectiveness of our model.
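The abstract does not spell out how the two attention branches are computed, so the following is only a minimal, hypothetical PyTorch sketch of query-conditioned voxel-wise and channel-wise attention over a clip's last conv-layer feature map. The module name, layer choices, gating functions, and tensor shapes are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): query-conditioned voxel-wise and
# channel-wise attention over a clip's last conv-layer feature map.
import torch
import torch.nn as nn

class DualVisualAttention(nn.Module):
    def __init__(self, channels: int, query_dim: int):
        super().__init__()
        # Project the sentence embedding into the visual channel space.
        self.query_proj = nn.Linear(query_dim, channels)
        # Scoring layers for the two attention branches.
        self.voxel_score = nn.Conv2d(channels, 1, kernel_size=1)
        self.channel_score = nn.Linear(channels, channels)

    def forward(self, feat: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # feat:  (B, C, H, W) last conv-layer feature map of a video clip
        # query: (B, Q) sentence embedding of the language query
        q = self.query_proj(query)                          # (B, C)

        # Voxel-wise attention: weight each spatial position of the
        # feature map by its relevance to the query.
        fused = feat * q.unsqueeze(-1).unsqueeze(-1)        # broadcast over H, W
        voxel_w = torch.sigmoid(self.voxel_score(fused))    # (B, 1, H, W)
        feat = feat * voxel_w

        # Channel-wise attention: re-weight feature channels conditioned
        # on the query, squeeze-and-excitation style.
        pooled = feat.mean(dim=(2, 3))                      # (B, C) global pooling
        channel_w = torch.sigmoid(self.channel_score(pooled * q))  # (B, C)
        feat = feat * channel_w.unsqueeze(-1).unsqueeze(-1)
        return feat

# Usage:
# attend = DualVisualAttention(channels=512, query_dim=300)
# out = attend(torch.randn(2, 512, 7, 7), torch.randn(2, 300))  # (2, 512, 7, 7)
```

The intent of a sketch like this is that voxel-wise attention suppresses query-irrelevant regions while channel-wise attention re-weights feature types, so the enhanced visual representation correlates more strongly with the language query before cross-modal matching.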

Cite

APA

Song, X., & Han, Y. (2018). Val: Visual-attention action localizer. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11165 LNCS, pp. 340–350). Springer Verlag. https://doi.org/10.1007/978-3-030-00767-6_32
