First-person video summarization has emerged as an important problem in the computer vision and multimedia communities. In this paper, we present a graph-theoretic framework for summarizing first-person (egocentric) videos at the frame level. We first develop a new way of characterizing egocentric video frames by building a center-surround model based on spectral measures of dissimilarity between two graphs representing the center and the surrounding regions of a frame. The frames in a video are then represented by a weighted graph (video similarity graph) in a feature space comprising center-surround differences in entropy and optical flow values along with PHOG (Pyramidal HOG) features. The frames are finally clustered using an MST-based approach with a new measure of edge inadmissibility based on neighbourhood analysis. Frames closest to the centroid of each cluster are used to build the summary. Experimental comparisons on two standard datasets clearly indicate the advantage of our solution.
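The overall pipeline (cluster frames via an MST on the video similarity graph, then pick the frame nearest each cluster centroid as a keyframe) can be sketched as follows. This is a minimal illustration, not the paper's method: the paper's inadmissibility measure based on neighbourhood analysis is not reproduced here, and as a stand-in we simply cut MST edges whose weight exceeds the mean plus one standard deviation of all MST edge weights. Frame features (center-surround entropy/optical-flow differences plus PHOG) are assumed to be precomputed vectors.

```python
# Hedged sketch: MST-based frame clustering followed by keyframe selection.
# The edge-cutting rule (mean + std threshold) is an assumption standing in
# for the paper's neighbourhood-based inadmissibility measure.
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mst_edges(features):
    """Kruskal's algorithm on the complete frame-similarity graph."""
    n = len(features)
    edges = sorted((euclidean(features[i], features[j]), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))
    return tree

def mst_cluster_keyframes(features):
    tree = mst_edges(features)
    weights = [w for w, _, _ in tree]
    mean = sum(weights) / len(weights)
    std = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
    threshold = mean + std  # stand-in for the paper's inadmissibility test
    # Keep only admissible edges; connected components become clusters.
    n = len(features)
    adj = {i: [] for i in range(n)}
    for w, i, j in tree:
        if w <= threshold:
            adj[i].append(j)
            adj[j].append(i)
    seen, clusters = set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], []
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(comp)
    # Keyframe per cluster: the frame closest to the cluster centroid.
    dim = len(features[0])
    keyframes = []
    for comp in clusters:
        centroid = [sum(features[i][d] for i in comp) / len(comp)
                    for d in range(dim)]
        keyframes.append(min(comp,
                             key=lambda i: euclidean(features[i], centroid)))
    return sorted(keyframes)
```

With two well-separated groups of frame features, the long MST edge bridging them is cut and one keyframe per group is returned; the summary is the set of returned frame indices in temporal order.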
Citation:
Sahu, A., & Chowdhury, A. S. (2019). A Graph-Theoretic Framework for Summarizing First-Person Videos. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11510 LNCS, pp. 183–193). Springer Verlag. https://doi.org/10.1007/978-3-030-20081-7_18