Video summarisation greatly improves the efficiency with which people browse videos and saves storage space. A good video summary should satisfy human visual interest and preserve the theme of the original video at the semantic level. Unlike many existing methods that consider only visual features when generating summaries, this study proposes a method that combines visual and semantic cues to extract important information for dynamic video summarisation. The authors introduce visual-verbal saliency consistency to incorporate semantic information and propose a novel attention-motion feature that, together with other visual features, fully represents visual interestingness. Based on importance scores computed for each frame by combining these features, they select an optimal subset of segments to generate an important and interesting summary. Evaluations on the SumMe and TVSum datasets show that the method generates high-quality video summaries.
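The segment-selection step mentioned above — choosing an optimal subset of segments given per-frame importance scores — is commonly cast as a 0/1 knapsack problem in work on SumMe and TVSum: maximise total segment score subject to a summary-length budget. The sketch below illustrates that standard formulation; the scores, lengths, and budget are hypothetical stand-ins, not values from the paper, and the paper's own feature extraction and scoring are not reproduced here.

```python
def select_segments(scores, lengths, budget):
    """Pick a subset of segments maximising total importance score
    subject to a total-length budget (0/1 knapsack, DP over length)."""
    n = len(scores)
    # dp[w] = best achievable score with total selected length <= w
    dp = [0.0] * (budget + 1)
    choice = [[False] * (budget + 1) for _ in range(n)]
    for i in range(n):
        # Iterate the length axis downwards so each segment is used at most once
        for w in range(budget, lengths[i] - 1, -1):
            cand = dp[w - lengths[i]] + scores[i]
            if cand > dp[w]:
                dp[w] = cand
                choice[i][w] = True
    # Backtrack to recover the indices of the selected segments
    selected, w = [], budget
    for i in range(n - 1, -1, -1):
        if choice[i][w]:
            selected.append(i)
            w -= lengths[i]
    return sorted(selected)

# Hypothetical per-segment scores and frame counts; in practice the budget
# is often set to about 15% of the original video's length.
print(select_segments([0.9, 0.2, 0.7, 0.4], [30, 50, 40, 20], 50))  # → [0, 3]
```

Here segments 0 and 3 together fit the 50-frame budget with the highest combined score (1.3), beating any single longer segment.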
CITATION STYLE
Xu, B., Liang, H., & Liang, R. (2020). Video summarisation with visual and semantic cues. IET Image Processing, 14(13), 3021–3027. https://doi.org/10.1049/iet-ipr.2019.1355