Multimodal summarization of user-generated videos

12 citations · 11 Mendeley readers

Abstract

The exponential growth of user-generated content has increased the need for efficient video summarization schemes. However, most existing approaches underestimate the value of aural features and are designed mainly for commercial/professional videos. In this work, we present an approach that uses both aural and visual features to create summaries of user-generated videos. Our approach produces dynamic video summaries, i.e., summaries comprising the most “important” parts of the original video, arranged so as to preserve their temporal order. We use supervised knowledge from both modalities to train a binary classifier that learns to recognize the important parts of a video. Moreover, we present a novel user-generated dataset containing videos from several categories. Every 1-second segment of each video in the dataset has been annotated by more than three annotators as important or not. We evaluate our approach using several classification strategies based on audio, visual, and fused features. Our experimental results illustrate the potential of the approach.
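To make the pipeline described above concrete, the sketch below shows one plausible way to fuse per-segment audio and visual features and train a binary importance classifier; the concatenation-based (early) fusion, the SVM classifier, and the feature dimensions are illustrative assumptions, not details taken from the paper.

# A minimal sketch (assumed, not the authors' code) of the pipeline described
# in the abstract: per-second audio and visual feature vectors are fused by
# concatenation, a binary classifier learns "important vs. not important",
# and the summary keeps the selected segments in temporal order.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fuse_features(audio_feats, visual_feats):
    # Early fusion: concatenate per-segment feature vectors.
    # audio_feats:  (n_segments, n_audio_dims), e.g. MFCC statistics per 1-sec segment
    # visual_feats: (n_segments, n_visual_dims), e.g. colour/motion statistics
    return np.hstack([audio_feats, visual_feats])

def train_importance_classifier(X, y):
    # y holds the annotator-derived label per 1-sec segment: 1 = important, 0 = not.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, y)
    return clf

def summarize(clf, X):
    # Indices of segments predicted important, returned in temporal order
    # so that the dynamic summary preserves the original ordering.
    return [i for i, p in enumerate(clf.predict(X)) if p == 1]

A late-fusion variant would instead train separate audio and visual classifiers and combine their per-segment scores; the paper evaluates audio-only, visual-only, and fused strategies, but the specific classifier and fusion scheme above are assumptions.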

Citation (APA)

Psallidas, T., Koromilas, P., Giannakopoulos, T., & Spyrou, E. (2021). Multimodal summarization of user-generated videos. Applied Sciences (Switzerland), 11(11). https://doi.org/10.3390/app11115260
