Sympathy for the details: Dense trajectories and hybrid classification architectures for action recognition

30Citations
Citations of this article
43Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Action recognition in videos is a challenging task due to the complexity of the spatio-temporal patterns to model and the difficulty to acquire and learn on large quantities of video data. Deep learning, although a breakthrough for image classification and showing promise for videos, has still not clearly superseded action recognition methods using hand-crafted features, even when training on massive datasets. In this paper, we introduce hybrid video classification architectures based on carefully designed unsupervised representations of hand-crafted spatiotemporal features classified by supervised deep networks. As we show in our experiments on five popular benchmarks for action recognition, our hybrid model combines the best of both worlds: it is data efficient (trained on 150 to 10000 short clips) and yet improves significantly on the state of the art, including recent deep models trained on millions of manually labelled images and videos.

Cite

CITATION STYLE

APA

De Souza, C. R., Gaidon, A., Vig, E., & López, A. (2016). Sympathy for the details: Dense trajectories and hybrid classification architectures for action recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9911 LNCS, pp. 697–716). Springer Verlag. https://doi.org/10.1007/978-3-319-46478-7_43

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free