Learning unsupervised video summarization with semantic-consistent network

Abstract

The aim of video summarization is to condense a video into a concise form without losing its gist. In general, a summary whose semantics are similar to those of the original video can represent the original well. Unfortunately, most existing methods focus on the diversity and representativeness of the selected content, and few take the video's semantics into consideration. In addition, most semantics-based methods rely on manually written descriptions of the video, which leads to a biased model. To address these issues, we propose a novel semantic-consistent unsupervised framework, termed ScSUM, which extracts the essence of a video by maximizing semantic similarity and requires no manual description. Specifically, ScSUM consists of a frame selector and a video descriptor; the descriptor not only predicts the description of the summary but also produces the description of the original video, which serves as the training target. Our main goal is to minimize the distance between the summary and the original video in the semantic space. Experiments on two benchmark datasets validate the effectiveness of the proposed method and demonstrate that it achieves competitive performance.
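To make the objective concrete, below is a minimal PyTorch-style sketch of the training signal the abstract describes: a frame selector scores frames, a shared video descriptor embeds both the softly selected summary and the full video, and the loss minimizes the distance between the two embeddings in the semantic space. All module names, dimensions, and the choice of cosine distance are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FrameSelector(nn.Module):
        """Scores each frame; high scores mark frames kept in the summary."""
        def __init__(self, feat_dim=1024, hidden=256):
            super().__init__()
            self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
            self.head = nn.Sequential(nn.Linear(2 * hidden, 1), nn.Sigmoid())

        def forward(self, frames):                     # frames: (B, T, feat_dim)
            h, _ = self.rnn(frames)
            return self.head(h).squeeze(-1)            # importance scores in (0, 1)

    class VideoDescriptor(nn.Module):
        """Maps a frame sequence to a semantic embedding (its "description")."""
        def __init__(self, feat_dim=1024, sem_dim=512):
            super().__init__()
            self.rnn = nn.GRU(feat_dim, sem_dim, batch_first=True)

        def forward(self, frames):
            _, h = self.rnn(frames)
            return h[-1]                               # (B, sem_dim)

    def semantic_consistency_loss(frames, selector, descriptor):
        scores = selector(frames)
        summary = frames * scores.unsqueeze(-1)        # soft frame selection
        z_summary = descriptor(summary)                # description of the summary
        with torch.no_grad():                          # original video acts as the target
            z_video = descriptor(frames)
        return 1 - F.cosine_similarity(z_summary, z_video).mean()

    # Hypothetical usage: 2 videos, 120 frames each, 1024-dim frame features
    frames = torch.randn(2, 120, 1024)
    loss = semantic_consistency_loss(frames, FrameSelector(), VideoDescriptor())

Using the descriptor's output on the full video as a fixed target (no gradient) reflects the abstract's statement that the description of the original video is used as the target, so no manual annotation is needed.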

Citation (APA)

Zhao, Y., Hu, X., Liu, X., & Fan, C. (2020). Learning unsupervised video summarization with semantic-consistent network. In Communications in Computer and Information Science (Vol. 1265 CCIS, pp. 207–219). Springer. https://doi.org/10.1007/978-981-15-7670-6_18
