This paper proposes three novel feature extraction approaches for human activity recognition in videos. The proposed solutions are based on video coding concepts, including motion compensation and coding-based feature variables. We use these features with deep learning for model generation and classification, hence the abbreviation ViCo-MoCo-DL, which stands for Video Coding and Motion Compensation with Deep Learning. The solutions are fused by averaging their classification scores to predict the human activity in a video. In all proposed solutions, an input video is temporally segmented into 12 non-overlapping segments of equal size. In the first and second solutions, each segment is converted into one component of an RGB image, resulting in 4 RGB images per video. The conversion is carried out through motion estimation, motion compensation, and the accumulation of image prediction errors. In the first solution, the 4 generated RGB images are tiled into one large image, which is used to train a Convolutional Neural Network (CNN). In the second solution, each generated RGB image is fed into a pre-trained CNN for feature extraction. The resultant Feature Vectors (FVs) are arranged into a matrix and used to train a Long Short-Term Memory (LSTM) network. In the third solution, a customized High Efficiency Video Coding (HEVC) encoder is used to generate feature variables per frame. The resultant FVs of every 3 video segments are arranged into a matrix and numerically summarized into one FV; thus, each input video is represented by 4 FVs, which are used to train another LSTM network. Experimental results on three well-known datasets show the superior classification results of the proposed fused solution over existing work.
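To make the segmentation and tiling steps of the first solution concrete, the following is a minimal sketch, not the paper's implementation: it splits a grayscale clip into 12 temporal segments, collapses each segment into one image channel by accumulating frame-to-frame prediction errors (simple frame differencing stands in for the paper's motion estimation and compensation), maps every 3 segments to the R, G, B channels of one image, and tiles the 4 resulting RGB images into a single composite. Function names, the 2x2 tiling layout, and the differencing-based error accumulation are assumptions for illustration only.

```python
import numpy as np

def segment_to_channel(frames):
    """Collapse one temporal segment (T, H, W) into a single 8-bit channel by
    accumulating frame-to-frame prediction errors (plain differencing is used
    here as a stand-in for motion-compensated prediction errors)."""
    acc = np.zeros(frames.shape[1:], dtype=np.float64)
    for prev, curr in zip(frames[:-1], frames[1:]):
        acc += np.abs(curr.astype(np.float64) - prev.astype(np.float64))
    acc -= acc.min()
    if acc.max() > 0:
        acc = acc / acc.max()
    return (acc * 255).astype(np.uint8)

def video_to_tiled_image(video, n_segments=12):
    """Split a grayscale video (T, H, W) into 12 equal temporal segments,
    map every 3 consecutive segments to the R, G, B channels of one image,
    and tile the resulting 4 RGB images into a 2x2 composite."""
    seg_len = video.shape[0] // n_segments
    channels = [segment_to_channel(video[i * seg_len:(i + 1) * seg_len])
                for i in range(n_segments)]
    rgb_images = [np.stack(channels[i:i + 3], axis=-1)
                  for i in range(0, n_segments, 3)]
    top = np.concatenate(rgb_images[:2], axis=1)
    bottom = np.concatenate(rgb_images[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)  # composite image for CNN training

# Usage with a synthetic 120-frame, 112x112 grayscale clip.
video = np.random.randint(0, 256, size=(120, 112, 112), dtype=np.uint8)
tiled = video_to_tiled_image(video)
print(tiled.shape)  # (224, 224, 3)
```

Under this layout, the second solution would instead feed each of the 4 intermediate RGB images to a pre-trained CNN and stack the extracted feature vectors into a matrix for LSTM training.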
Shanableh, T. (2023). ViCo-MoCo-DL: Video Coding and Motion Compensation Solutions for Human Activity Recognition Using Deep Learning. IEEE Access, 11, 73971–73981. https://doi.org/10.1109/ACCESS.2023.3296252