Egocentric videos are usually long and contain a lot of redundancy, which makes summarization an essential task for such videos. In this work we target object-triggered egocentric video summarization, which aims to extract all occurrences of a query object in a given video in near real time. We propose a modular pipeline that first limits the redundant information and then detects objects using a Convolutional Neural Network and LSTM based approach. Following this, the video is represented as a dictionary that captures its semantic information. Matching a query object then reduces to an And-Or tree traversal followed by the DeepMatching algorithm for fine-grained matching. Frames containing the object that were missed at the pruning stage are recovered by running a tracker on the frames selected by the pipeline. The modular design allows any module to be replaced by a more efficient version. Performance tests of the overall pipeline on egocentric datasets, the EDUB dataset, and personally recorded videos give an average recall of 0.76.
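The coarse matching step described in the abstract can be pictured as follows. This is an illustrative sketch only, not the authors' implementation: the node structure, field names, and the per-frame label-set representation are all assumptions, and the fine-grained DeepMatching stage is only indicated by a comment.

```python
# Sketch (hypothetical, not the paper's code): filter candidate frames
# with an And-Or tree over per-frame detected object labels.
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Node:
    kind: str                          # "AND", "OR", or "LEAF"
    label: str = ""                    # object label, used by LEAF nodes
    children: List["Node"] = field(default_factory=list)

def matches(node: Node, labels: Set[str]) -> bool:
    """True if a frame's detected labels satisfy the And-Or tree."""
    if node.kind == "LEAF":
        return node.label in labels
    if node.kind == "AND":
        return all(matches(c, labels) for c in node.children)
    return any(matches(c, labels) for c in node.children)   # OR node

def candidate_frames(query: Node, video: List[Set[str]]) -> List[int]:
    """Indices of frames passing the coarse And-Or filter; a
    fine-grained matcher (e.g. DeepMatching) would run on these."""
    return [i for i, labels in enumerate(video) if matches(query, labels)]

# Example: frames matching "cup", or "hand" together with "phone".
query = Node("OR", children=[
    Node("LEAF", "cup"),
    Node("AND", children=[Node("LEAF", "hand"), Node("LEAF", "phone")]),
])
video = [{"cup"}, {"hand"}, {"hand", "phone"}, set()]
print(candidate_frames(query, video))   # -> [0, 2]
```

The tree traversal is cheap compared to pixel-level matching, which is presumably why it is used as a pruning stage before DeepMatching.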
Jain, S., Rameshan, R. M., & Nigam, A. (2017). Object triggered egocentric video summarization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10425 LNCS, pp. 428–439). Springer Verlag. https://doi.org/10.1007/978-3-319-64698-5_36