We propose a deep neural network that captures latent temporal features suitable for temporally localizing actions in streaming videos. The network uses unsupervised generative models, namely autoencoders and conditional restricted Boltzmann machines, to model the temporal structure present in an action. Human motion is non-linear in nature and therefore requires a continuous temporal representation of motion, which is crucial for streaming videos. The generative ability helps predict features at future time steps, which gives an indication of how far an action has progressed at any instant. To accommodate M action classes, we train an autoencoder to separate the action spaces and learn a generative model per action space. The final layer accumulates statistics from each model and estimates the action class and the percentage of completion over a segment of frames. Experimental results show that this network provides the predictive and recognition capability required for action localization in streaming videos.
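The conditional restricted Boltzmann machine mentioned above can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual model: the feature dimensions, model order, and the mean-field CD-1 training rule are assumptions chosen for clarity. A CRBM adds history-dependent ("dynamic") biases to a standard RBM, so the model can be driven by past frames alone to predict features at the next time step.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CRBM:
    """Sketch of a conditional RBM: the visible and hidden biases are
    shifted by linear functions of the past `order` visible frames."""

    def __init__(self, n_vis, n_hid, order, lr=0.01):
        self.W = rng.normal(0, 0.01, (n_vis, n_hid))          # visible-hidden weights
        self.A = rng.normal(0, 0.01, (n_vis * order, n_vis))  # past -> visible (autoregressive)
        self.B = rng.normal(0, 0.01, (n_vis * order, n_hid))  # past -> hidden
        self.b_v = np.zeros(n_vis)
        self.b_h = np.zeros(n_hid)
        self.order = order
        self.lr = lr

    def _dyn_biases(self, past):
        # Dynamic biases: static bias plus a linear function of history.
        return self.b_v + past @ self.A, self.b_h + past @ self.B

    def cd1(self, v, past):
        """One contrastive-divergence (CD-1) update on a batch (rows = samples)."""
        bv, bh = self._dyn_biases(past)
        h_prob = sigmoid(v @ self.W + bh)
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_recon = sigmoid(h_samp @ self.W.T + bv)     # mean-field reconstruction
        h_recon = sigmoid(v_recon @ self.W + bh)
        n = v.shape[0]
        self.W += self.lr * (v.T @ h_prob - v_recon.T @ h_recon) / n
        self.b_v += self.lr * (v - v_recon).mean(axis=0)
        self.b_h += self.lr * (h_prob - h_recon).mean(axis=0)
        self.A += self.lr * past.T @ (v - v_recon) / n
        self.B += self.lr * past.T @ (h_prob - h_recon) / n
        return float(np.mean((v - v_recon) ** 2))     # reconstruction error

    def predict(self, past):
        """Predict the next frame's features from history alone
        (one mean-field up-down pass starting from the dynamic visible bias)."""
        bv, bh = self._dyn_biases(past)
        v0 = sigmoid(bv)
        h = sigmoid(v0 @ self.W + bh)
        return sigmoid(h @ self.W.T + bv)
```

Comparing `predict(past)` against the actually observed frame gives the prediction-error signal that, per the abstract, indicates how close the streaming action is to completion under the generative model for each action class.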
Citation
Nair, B. M. (2016). Unsupervised deep networks for temporal localization of human actions in streaming videos. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10073 LNCS, pp. 143–155). Springer Verlag. https://doi.org/10.1007/978-3-319-50832-0_15