In this paper, we introduce a hybrid fusion approach that efficiently combines the Kinect modalities at the feature, representation, and decision levels. Our contributions are three-fold: (i) we propose an efficient concatenation of complementary per-modality descriptors that uses the skeleton-joint modality as high-level information; (ii) we apply a multi-resolution analysis that combines local frame-wise decisions with global bag-of-visual-words (BoVW) ones, relying in this context on the scalability of the Fisher-vector representation to handle large-scale data and applying an additional concatenation of its outputs; (iii) we also propose an efficient score-merging scheme that generates multiple weighting coefficients to combine the strengths of different SVM classifiers for a given action label. Evaluated on the Cornell activity dataset, our approach obtains state-of-the-art performance.
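The decision-level part of the scheme, merging per-modality SVM scores with weighting coefficients, can be illustrated with a minimal sketch. The function name `fuse_scores`, the NumPy score matrices, and the fixed example weights are all illustrative assumptions; the abstract does not specify how the paper's multiple weighting coefficients are learned.

```python
import numpy as np

def fuse_scores(scores_per_modality, weights):
    """Late fusion sketch: weighted sum of per-modality classifier scores.

    scores_per_modality: list of (n_samples, n_classes) score arrays,
    one per Kinect modality (e.g. RGB, depth, skeleton joints).
    weights: one coefficient per modality, normalised here to sum to 1
    (hypothetical choice; the paper's weighting scheme may differ).
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    fused = sum(w * s for w, s in zip(weights, scores_per_modality))
    # Predicted action label = class with the highest fused score.
    return fused.argmax(axis=1)

# Toy example: two modalities, two samples, two action classes.
rgb_scores = np.array([[0.6, 0.4], [0.2, 0.8]])
depth_scores = np.array([[0.7, 0.3], [0.4, 0.6]])
predictions = fuse_scores([rgb_scores, depth_scores], [0.5, 0.5])
```

In practice the per-modality scores would come from the decision functions of separately trained SVM classifiers, and a distinct weight set could be kept per action label, as the abstract suggests.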
Seddik, B., Gazzah, S., & Amara, N. E. B. (2017). Hybrid multi-modal fusion for human action recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10317 LNCS, pp. 201–209). Springer Verlag. https://doi.org/10.1007/978-3-319-59876-5_23