Human action recognition using a multi-modal hybrid deep learning model

Abstract

Human action recognition is a challenging problem, especially in the presence of multiple actors and/or multiple scene views. In this paper, multi-modal integration and a hybrid deep learning architecture are combined in a unified action recognition model. The model incorporates two main modality types, 3D skeletons and images, which together capture the two main aspects of an action: body motion and body-part shape. Rather than merely fusing the two modalities, the proposed model integrates them by focusing on specific parts of the body, whose locations are known from the 3D skeleton data. The proposed model combines Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) deep learning architectures into a hybrid one, called MCL for (M)ulti-Modal (C)NN + (L)STM. MCL consists of two sub-models, CL1D and CL2D, which simultaneously extract the spatial and temporal patterns of the two input modality types; their decisions are combined to achieve better accuracy. To demonstrate the efficiency of the MCL model, its performance is evaluated on the large NTU-RGB+D dataset in two evaluation scenarios: cross-subject and cross-view. The obtained recognition rates, 74.2% in cross-subject and 81.4% in cross-view, are superior to the current state-of-the-art results.
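
To make the two-branch CNN+LSTM idea in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch: per-frame convolutional features feed a recurrent layer, with a 1D branch for skeleton joints and a 2D branch for body-part images, fused by averaging class scores. All layer sizes, the choice of PyTorch, and the helper names (beyond CL1D/CL2D, which the abstract mentions) are illustrative assumptions, not the authors' actual MCL configuration.

```python
# Hypothetical sketch of a CNN+LSTM hybrid with late score fusion.
# Sizes assume NTU-RGB+D defaults (25 joints, 60 action classes).
import torch
import torch.nn as nn

class CL1D(nn.Module):
    """Skeleton branch: 1D convolution over joint coordinates, then an LSTM."""
    def __init__(self, num_joints=25, num_classes=60, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(3 * num_joints, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                 # x: (batch, time, 3 * num_joints)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # per-frame features
        out, _ = self.lstm(h)
        return self.fc(out[:, -1])        # class scores from the last step

class CL2D(nn.Module):
    """Image branch: 2D convolution on body-part crops, then an LSTM."""
    def __init__(self, num_classes=60, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                 # x: (batch, time, 3, H, W)
        b, t = x.shape[:2]
        h = self.conv(x.flatten(0, 1)).flatten(1).view(b, t, -1)
        out, _ = self.lstm(h)
        return self.fc(out[:, -1])

def fuse(scores_1d, scores_2d):
    """Late fusion: average the two branches' softmax scores."""
    return (scores_1d.softmax(-1) + scores_2d.softmax(-1)) / 2
```

As a usage sketch, a skeleton clip of shape (batch, time, 75) goes through CL1D, the corresponding body-part image sequence of shape (batch, time, 3, H, W) through CL2D, and fuse() averages their class probabilities before taking the argmax.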

Citation (APA)

El-Ghaish, H. A., Hussein, M. E., & Shoukry, A. (2017). Human action recognition using a multi-modal hybrid deep learning model. In British Machine Vision Conference 2017, BMVC 2017. BMVA Press. https://doi.org/10.5244/c.31.84
