Novel CNN Architecture for Human Action Recognition

Abstract

Representing the visual content of video, in terms of both spatial and motion features, is essential to many video processing applications, including human action recognition. Existing works use two-stream 3D CNN architectures, with one stream for spatial and another for motion information, or recurrent models such as RNNs and LSTMs. Recent research has focused on exploiting the capability of 2D CNNs to represent video data without an RNN or LSTM. This paper therefore proposes a novel CNN architecture, called Actionet, that learns a representation of a video sequence from a single stream whose input carries both spatial and temporal information: an intensity image (spatial) is concatenated with its optical flow (motion) and encoded by the network to classify the actions in the video. Actionet was evaluated on the HMDB51, UCF101 and KTH datasets, achieving mean classification accuracies of 98.82%, 99.96% and 81%, respectively. The quality of the learnt representation was further assessed by using the learnt features to train multi-class SVM, KNN and ctree classifiers, which also yielded good results.
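The abstract's key idea is feeding spatial and motion cues to a single stream by concatenating an intensity image with its optical-flow field along the channel axis. The paper does not give implementation details, so the following is only a minimal sketch of that concatenation step using NumPy; the array shapes, the function name `build_single_stream_input`, and the toy flow field are assumptions for illustration, not the authors' actual preprocessing.

```python
import numpy as np

def build_single_stream_input(frame_gray, flow):
    """Stack a grayscale intensity frame (H, W) with a dense optical-flow
    field (H, W, 2) into one (H, W, 3) tensor, so a single-stream 2D CNN
    receives spatial and motion information together (sketch only)."""
    assert frame_gray.shape == flow.shape[:2], "frame and flow must align"
    intensity = frame_gray[..., np.newaxis].astype(np.float32)   # (H, W, 1)
    motion = flow.astype(np.float32)                             # (H, W, 2)
    return np.concatenate([intensity, motion], axis=-1)          # (H, W, 3)

# Toy example: a 4x4 frame with a uniform rightward flow of 1 px/frame.
frame = np.arange(16, dtype=np.float32).reshape(4, 4)
flow = np.zeros((4, 4, 2), dtype=np.float32)
flow[..., 0] = 1.0  # horizontal flow component
x = build_single_stream_input(frame, flow)
print(x.shape)  # (4, 4, 3)
```

In practice the flow field would come from a dense optical-flow estimator computed between consecutive frames; the stacked tensor then plays the role of an ordinary 3-channel image for the 2D CNN.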

Citation (APA)

Kalaivani, P., Mohamed Mansoor Roomi, S., Maheesha, M., & Subathraa, V. (2021). Novel CNN Architecture for Human Action Recognition. In Lecture Notes in Electrical Engineering (Vol. 700, pp. 2323–2333). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-15-8221-9_217
