In this paper, we focus on solving a new task called TALL (Temporal Activity Localization via Language Query). Its goal is to localize actions in long, untrimmed videos using natural language queries. We propose a new model called VAL (Visual-attention Action Localizer) to address it. Specifically, VAL applies voxel-wise attention and channel-wise attention to the feature maps of the last convolutional layer. These two visual attention mechanisms are designed to match the characteristics of the feature maps: they enhance the visual representations and improve the extraction of cross-modal correlations. Experimental results on both the TACoS and Charades-STA datasets demonstrate the effectiveness of our model.
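The abstract does not spell out the exact attention formulation, so the sketch below is illustrative only: a minimal PyTorch rendering of query-guided voxel-wise (spatial) attention followed by a channel-wise gate over a conv feature map. All module names, tensor shapes, layer sizes, and the additive/sigmoid score functions (QueryGuidedAttention, hidden, etc.) are assumptions for illustration, not the authors' implementation.

# Minimal sketch, assuming a (B, C, H, W) conv feature map and a
# (B, Q) sentence-query embedding; the exact score functions used in
# the paper may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QueryGuidedAttention(nn.Module):
    """Applies voxel-wise (spatial) then channel-wise attention to a
    last-conv-layer feature map, conditioned on a language query."""

    def __init__(self, channels: int, query_dim: int, hidden: int = 256):
        super().__init__()
        # Projections for the voxel-wise (spatial) attention scores.
        self.vox_visual = nn.Conv2d(channels, hidden, kernel_size=1)
        self.vox_query = nn.Linear(query_dim, hidden)
        self.vox_score = nn.Conv2d(hidden, 1, kernel_size=1)
        # Projection for the channel-wise attention gate.
        self.chn_query = nn.Linear(query_dim, channels)

    def forward(self, feat: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # feat:  (B, C, H, W) last conv-layer feature map
        # query: (B, Q)       encoded language query
        B, C, H, W = feat.shape

        # Voxel-wise attention: one weight per spatial location,
        # normalized with a softmax over all H*W positions.
        q = self.vox_query(query).unsqueeze(-1).unsqueeze(-1)           # (B, hidden, 1, 1)
        scores = self.vox_score(torch.tanh(self.vox_visual(feat) + q))  # (B, 1, H, W)
        spatial = F.softmax(scores.view(B, -1), dim=1).view(B, 1, H, W)
        feat = feat * spatial * (H * W)  # rescale so feature magnitudes stay comparable

        # Channel-wise attention: one sigmoid gate per feature channel,
        # combining the query with globally pooled visual statistics.
        pooled = feat.mean(dim=(2, 3))                        # (B, C)
        gate = torch.sigmoid(self.chn_query(query) + pooled)  # (B, C)
        return feat * gate.unsqueeze(-1).unsqueeze(-1)

The attended feature map would then feed the cross-modal correlation stage; under these assumptions, the spatial softmax highlights query-relevant locations while the channel gate reweights query-relevant feature detectors.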
Song, X., & Han, Y. (2018). VAL: Visual-attention action localizer. In Lecture Notes in Computer Science (Vol. 11165, pp. 340–350). Springer. https://doi.org/10.1007/978-3-030-00767-6_32