Learning to segment a video to clips based on scene and camera motion

Adarsh Kowdle; Tsuhan Chen

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012) 7574 LNCS(PART 3) 272-286

DOI: 10.1007/978-3-642-33712-3_20

11 Citations

26 Readers

Abstract

In this paper, we present a novel learning-based algorithm for temporal segmentation of a video into clips based on both camera and scene motion, in particular on combinations of static vs. dynamic camera and static vs. dynamic scene. Given a video, we first perform shot boundary detection to segment the video into shots. We enforce temporal continuity by constructing a Markov Random Field (MRF) over the frames of each video shot, with edges between consecutive frames, and cast the segmentation problem as a frame-level discrete labeling problem. Using manually labeled data, we learn classifiers that exploit cues from optical flow to provide evidence for the different labels, and infer the best labeling over the frames. We show the effectiveness of the approach using user videos and full-length movies. Using sixty full-length movies spanning 50 years, we show that grouping frames purely based on motion cues can aid computational applications such as recovering depth from a video, and can also reveal interesting trends in movies, which leads to novel applications in video analysis (time-stamping archive movies) and film studies. © 2012 Springer-Verlag.
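The frame-level labeling step described in the abstract can be illustrated with a small sketch. Because the MRF only has edges between consecutive frames, it is a chain, and the lowest-cost labeling of a chain can be found exactly by dynamic programming (Viterbi). The code below is not the authors' implementation: the function names `label_shot` and `to_clips`, the Potts-style `switch_cost` penalty, and the form of the per-frame costs are all assumptions made for illustration. The abstract does not state the actual energy terms or the inference method used.

```python
from typing import List, Sequence, Tuple

# The four camera/scene combinations named in the abstract.
# The ordering and the integer encoding are assumptions of this sketch.
LABELS = [
    "static camera, static scene",
    "static camera, dynamic scene",
    "dynamic camera, static scene",
    "dynamic camera, dynamic scene",
]


def label_shot(unary: Sequence[Sequence[float]], switch_cost: float = 1.0) -> List[int]:
    """Exact minimum-cost labeling of a chain MRF via dynamic programming.

    unary[t][k] is the cost of assigning label k to frame t, for example a
    negative log-probability from a per-frame classifier (assumed form).
    switch_cost is an assumed Potts penalty paid whenever two consecutive
    frames receive different labels.
    """
    n_frames = len(unary)
    if n_frames == 0:
        return []
    n_labels = len(unary[0])

    cost = list(unary[0])        # best cost of any labeling ending in label k
    back: List[List[int]] = []   # back[t - 1][k] = best previous label for frame t

    for t in range(1, n_frames):
        new_cost, pointers = [], []
        for k in range(n_labels):
            # Cheapest predecessor, paying switch_cost if the label changes.
            best_j = min(
                range(n_labels),
                key=lambda j: cost[j] + (0.0 if j == k else switch_cost),
            )
            pointers.append(best_j)
            new_cost.append(
                cost[best_j] + (0.0 if best_j == k else switch_cost) + unary[t][k]
            )
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final label.
    labels = [min(range(n_labels), key=lambda k: cost[k])]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return labels


def to_clips(labels: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Group runs of identical labels into (start_frame, end_frame, label) clips."""
    clips = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            clips.append((start, t - 1, labels[start]))
            start = t
    return clips
```

In this sketch, a larger `switch_cost` suppresses isolated single-frame label flips caused by noisy per-frame evidence, which is the role temporal continuity plays in the description above. Setting it to zero reduces the result to independent per-frame decisions.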

Cite

Kowdle, A., & Chen, T. (2012). Learning to segment a video to clips based on scene and camera motion. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7574 LNCS, pp. 272–286). https://doi.org/10.1007/978-3-642-33712-3_20
