Human action classification is an important task in computer vision, and the Bag-of-Words model is a widely used representation in action classification techniques. In this work we propose an approach based on a mid-level feature representation for human action description. First, an optimal vocabulary is created without fixing the number of visual words in advance, a known limitation of the K-means method. We then introduce a graph-based video representation built from the relationships between interest points, in order to take the spatial and temporal layout into account. Finally, a second visual vocabulary based on n-grams is used for classification, combining the representational power of graphs with the efficiency of the bag-of-words representation. The method was evaluated on the KTH dataset using STIP and MoSIFT descriptors and a multi-class SVM with a chi-square kernel. The experimental results show that our approach with the STIP descriptor outperforms the best state-of-the-art results, while results with the MoSIFT descriptor are comparable to them.
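As a rough illustration of the classification stage, the sketch below builds a normalized bag-of-words histogram from quantized descriptors and evaluates an exponential chi-square kernel, the kernel family named in the abstract. It assumes descriptors have already been quantized to visual-word indices; the function names and the `gamma` parameter are illustrative, not taken from the paper.

```python
import numpy as np

def bow_histogram(word_ids, vocab_size):
    # Count how often each visual word occurs, then L1-normalize,
    # so videos of different lengths yield comparable histograms.
    h = np.bincount(word_ids, minlength=vocab_size).astype(float)
    return h / max(h.sum(), 1.0)

def chi2_kernel(x, y, gamma=1.0):
    # Exponential chi-square kernel, commonly paired with BoW
    # histograms in SVM-based action classification.
    den = x + y
    safe_den = np.where(den > 0, den, 1.0)
    d = np.sum(np.where(den > 0, (x - y) ** 2 / safe_den, 0.0))
    return np.exp(-gamma * d)

# Example: two short sequences quantized against a 4-word vocabulary.
h1 = bow_histogram(np.array([0, 1, 1, 2]), vocab_size=4)
h2 = bow_histogram(np.array([0, 0, 3, 3]), vocab_size=4)
print(chi2_kernel(h1, h1))  # identical histograms -> kernel value 1.0
print(chi2_kernel(h1, h2))  # dissimilar histograms -> value below 1.0
```

In practice the resulting kernel matrix over all training histograms would be passed to a multi-class SVM with a precomputed kernel.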
Hernández-García, R., García-Reyes, E., Ramos-Cózar, J., & Guil, N. (2014). Human action classification using n-grams visual vocabulary. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8827, pp. 319–326). Springer Verlag. https://doi.org/10.1007/978-3-319-12568-8_39