Video summarization via semantic attended networks

Huawei Wei; Bingbing Ni; Yichao Yan; Huanyu Yu; Xiaokang Yang

Conference Proceedings, Open Access

32nd AAAI Conference on Artificial Intelligence, AAAI 2018 (2018) 216-223

DOI: 10.1609/aaai.v32i1.11297

Abstract

The goal of video summarization is to distill a raw video into a more compact form without losing much semantic information. However, previous methods mainly consider the diversity, representativeness, and interestingness of the obtained summary, and they seldom pay sufficient attention to the semantic information of the resulting frame set, especially long-range temporal semantics. To address this issue explicitly, we propose a novel technique that extracts the most semantically relevant video segments (i.e., those that remain valid over a long temporal duration) and assembles them into an informative summary. To this end, we develop a semantic attended video summarization network (SASUM), which consists of a frame selector and a video descriptor. It selects an appropriate number of video shots by minimizing the distance between the description sentence generated for the summarized video and the human-annotated text of the original video. Extensive experiments on two benchmark datasets show that our method outperforms previous methods.
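The selection objective described in the abstract can be illustrated with a toy sketch. The abstract gives no architectural details, so everything below is an assumption made for illustration: the per-shot captions stand in for the video descriptor, the exhaustive subset search stands in for the learned frame selector, and cosine distance over bag-of-words counts stands in for the text distance. The names `describe`, `select_shots`, and `budget` are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def bag_of_words(text):
    """Lower-case whitespace tokenization into word counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def describe(shot_captions, selected):
    """Toy stand-in for a video descriptor: join the selected shots' captions."""
    return " ".join(shot_captions[i] for i in selected)


def select_shots(shot_captions, annotation, budget):
    """Toy stand-in for a frame selector.

    Tries every subset of `budget` shots and keeps the one whose
    description is closest to the human annotation.
    """
    target = bag_of_words(annotation)
    best = None
    for subset in combinations(range(len(shot_captions)), budget):
        summary_text = describe(shot_captions, subset)
        dist = cosine_distance(bag_of_words(summary_text), target)
        if best is None or dist < best[1]:
            best = (subset, dist)
    return best


# Hypothetical data: one caption per shot and one annotation for the whole video.
shots = [
    "a man opens the door",
    "a dog runs in the park",
    "the sky is cloudy",
    "a man throws a ball to the dog",
]
annotation = "a man plays ball with a dog in the park"

best_subset, best_dist = select_shots(shots, annotation, budget=2)
print(best_subset, round(best_dist, 3))
```

Exhaustive search is only workable for a handful of shots. A learned selector, as the abstract implies, would instead be trained so that the description of its output moves closer to the annotation.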

Citation (APA)

Wei, H., Ni, B., Yan, Y., Yu, H., & Yang, X. (2018). Video summarization via semantic attended networks. In 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 (pp. 216–223). AAAI Press. https://doi.org/10.1609/aaai.v32i1.11297
