This work proposes the use of feature tracks for detecting high-level features (concepts) in video. Extending previous work on local interest point detection and description in images, feature tracks are defined as sets of local interest points that are found in different frames of a video shot and exhibit spatio-temporal and visual continuity, thus defining a trajectory in the 2D+Time space. These tracks jointly capture the spatial attributes of 2D local regions and their corresponding long-term motion. Extracting the feature tracks, then selecting and representing an appropriate subset of them, allows the generation of a Bag-of-Spatiotemporal-Words model for the shot, which helps capture the dynamics of the video content. Experimental evaluation of the proposed approach on two challenging datasets (TRECVID 2007, TRECVID 2010) shows that the selection, representation and use of such feature tracks improve the results of traditional keyframe-based concept detection techniques. © 2013 Springer Science+Business Media.
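The idea of a feature track and of a Bag-of-Spatiotemporal-Words histogram can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the interest point detector, the linking criteria, the track selection rule, or the track representation. The greedy frame-to-frame linking, the thresholds `max_shift`, `max_desc_dist` and `min_length`, the track vector (mean descriptor plus mean displacement), and all function names below are assumptions chosen for illustration only.

```python
from math import dist


def link_tracks(frames, max_shift=5.0, max_desc_dist=0.5, min_length=3):
    """Greedily link interest points across consecutive frames (illustrative).

    frames: list of frames; each frame is a list of (x, y, descriptor) tuples,
            where descriptor is a tuple of floats.
    Returns tracks as lists of (frame_index, x, y, descriptor).
    The thresholds are hypothetical and must be positive.
    """
    finished = []
    active = []  # tracks whose last point lies in the previous frame
    for t, points in enumerate(frames):
        unused = set(range(len(points)))
        next_active = []
        for track in active:
            _, px, py, pdesc = track[-1]
            best, best_cost = None, None
            for i in sorted(unused):
                x, y, desc = points[i]
                shift = dist((px, py), (x, y))  # spatio-temporal continuity
                dd = dist(pdesc, desc)          # visual continuity
                if shift <= max_shift and dd <= max_desc_dist:
                    cost = shift / max_shift + dd / max_desc_dist
                    if best is None or cost < best_cost:
                        best, best_cost = i, cost
            if best is None:
                finished.append(track)  # track ends at the previous frame
            else:
                unused.discard(best)
                x, y, desc = points[best]
                track.append((t, x, y, desc))
                next_active.append(track)
        # Every unmatched point starts a new candidate track.
        for i in sorted(unused):
            x, y, desc = points[i]
            next_active.append([(t, x, y, desc)])
        active = next_active
    finished.extend(active)
    # Assumed selection rule: keep only tracks that span enough frames.
    return [tr for tr in finished if len(tr) >= min_length]


def track_descriptor(track):
    """Assumed representation: mean appearance plus mean per-frame motion."""
    n = len(track)
    dim = len(track[0][3])
    mean_desc = [sum(p[3][k] for p in track) / n for k in range(dim)]
    dx = (track[-1][1] - track[0][1]) / (n - 1) if n > 1 else 0.0
    dy = (track[-1][2] - track[0][2]) / (n - 1) if n > 1 else 0.0
    return tuple(mean_desc) + (dx, dy)


def bag_of_words(track_vectors, codebook):
    """Count how many track vectors fall nearest to each codeword."""
    hist = [0] * len(codebook)
    for v in track_vectors:
        nearest = min(range(len(codebook)), key=lambda k: dist(v, codebook[k]))
        hist[nearest] += 1
    return hist
```

In this sketch, a shot is summarised by calling `link_tracks` on its per-frame interest points, mapping each surviving track through `track_descriptor`, and passing the resulting vectors to `bag_of_words` together with a codebook. The codebook would typically be learned by clustering track vectors from training shots; that step is omitted here.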
CITATION STYLE
Mezaris, V., Dimou, A., & Kompatsiaris, I. (2013). Local invariant feature tracks for high-level video feature extraction. In Lecture Notes in Electrical Engineering (Vol. 158 LNEE, pp. 165–180). https://doi.org/10.1007/978-1-4614-3831-1_10