Unsupervised deep networks for temporal localization of human actions in streaming videos


Abstract

We propose a deep neural network that captures latent temporal features suitable for temporally localizing actions in streaming videos. The network uses unsupervised generative models, namely autoencoders and conditional restricted Boltzmann machines, to model the temporal structure present in an action. Human motion is non-linear in nature and therefore requires a continuous temporal model of motion, which is crucial for streaming videos. The generative ability helps predict features at future time steps, which indicates how far an action has progressed at any instant. To accommodate M action classes, we train an autoencoder to separate the action spaces and learn a generative model per action space. The final layer accumulates statistics from each model and estimates the action class and the percentage of completion within a segment of frames. Experimental results show that this network provides the predictive and recognition capability required for action localization in streaming videos.
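The paper itself provides no code; as a rough illustration of the per-action generative component described above, the following is a minimal NumPy sketch of a conditional restricted Boltzmann machine trained with one-step contrastive divergence. All names, dimensions, the real-valued-feature treatment, and the history window size are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CRBM:
    """Conditional RBM over per-frame features (assumed scaled to [0, 1]).

    Models frame v_t conditioned on a flattened window of past frames;
    hyperparameters here are illustrative only, not the paper's settings.
    """

    def __init__(self, n_vis, n_hid, n_hist, lr=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_vis, n_hid))   # visible-hidden weights
        self.A = 0.01 * self.rng.standard_normal((n_hist, n_vis))  # history -> visible (autoregressive)
        self.B = 0.01 * self.rng.standard_normal((n_hist, n_hid))  # history -> hidden
        self.b_v = np.zeros(n_vis)
        self.b_h = np.zeros(n_hid)
        self.lr = lr

    def _dynamic_biases(self, hist):
        # The history window shifts both the visible and hidden biases.
        return self.b_v + hist @ self.A, self.b_h + hist @ self.B

    def cd1(self, v, hist):
        """One contrastive-divergence step on a (frame, history) pair."""
        bv, bh = self._dynamic_biases(hist)
        ph0 = sigmoid(v @ self.W + bh)                  # data-driven hidden probabilities
        h0 = (self.rng.random(ph0.shape) < ph0) * 1.0   # sampled hidden states
        v1 = sigmoid(h0 @ self.W.T + bv)                # mean-field reconstruction
        ph1 = sigmoid(v1 @ self.W + bh)
        self.W += self.lr * (np.outer(v, ph0) - np.outer(v1, ph1))
        self.A += self.lr * np.outer(hist, v - v1)
        self.B += self.lr * np.outer(hist, ph0 - ph1)
        self.b_v += self.lr * (v - v1)
        self.b_h += self.lr * (ph0 - ph1)
        return float(np.mean((v - v1) ** 2))            # reconstruction error

    def predict(self, hist):
        """Predict the next frame's features from the history alone."""
        bv, bh = self._dynamic_biases(hist)
        ph = sigmoid(bh)                                # expected hiddens given history
        return sigmoid(ph @ self.W.T + bv)
```

In a streaming setting, one such model per action class could score each incoming frame via predict(); the class whose model yields the lowest prediction error is the running class estimate, and error statistics accumulated over a segment give a rough proxy for the percentage of completion.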

Citation (APA)

Nair, B. M. (2016). Unsupervised deep networks for temporal localization of human actions in streaming videos. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10073 LNCS, pp. 143–155). Springer Verlag. https://doi.org/10.1007/978-3-319-50832-0_15
