Convolutional hierarchical attention network for query-focused video summarization

Abstract

Previous approaches to video summarization mainly concentrate on finding the most diverse and representative visual content for the summary, without considering the user’s preferences. This paper addresses the task of query-focused video summarization, which takes a user’s query and a long video as inputs and aims to generate a query-focused video summary. We treat the task as a problem of computing the similarity between video shots and the query. To this end, we propose a method named Convolutional Hierarchical Attention Network (CHAN), which consists of two parts: a feature encoding network and a query-relevance computing module. In the encoding network, we employ a convolutional network with a local self-attention mechanism and a query-aware global attention mechanism to learn the visual information of each shot. The encoded features are then sent to the query-relevance computing module to generate the query-focused video summary. Extensive experiments on the benchmark dataset demonstrate the competitive performance and effectiveness of our approach.
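The framing of summarization as shot-to-query similarity can be illustrated with a minimal sketch. This is not the CHAN architecture: the convolutional encoder and both attention mechanisms are omitted, and plain cosine similarity stands in for the learned query-relevance module. The function names (`cosine`, `summarize`), the toy two-dimensional features, and the top-k selection rule are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def summarize(shot_features, query_embedding, k):
    """Score each shot against the query and keep the k highest-scoring shots.

    Hypothetical stand-in for a query-relevance module: in the paper the
    features would come from a learned encoder, here they are given directly.
    Returns the selected shot indices in temporal order, plus all scores.
    """
    scores = [cosine(f, query_embedding) for f in shot_features]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores

# Toy example: four shots with made-up 2-D features and one query embedding.
shots = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query = [1.0, 0.0]
selected, scores = summarize(shots, query, k=2)
print(selected)  # indices of the shots kept in the summary
```

Returning the selected indices in temporal order keeps the summary playable as a shortened version of the original video, which is a design choice of this sketch rather than something stated in the abstract.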

Cite

Xiao, S., Zhao, Z., Zhang, Z., Yan, X., & Yang, M. (2020). Convolutional hierarchical attention network for query-focused video summarization. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 12426–12433). AAAI Press. https://doi.org/10.1609/aaai.v34i07.6929