Finding an effective way to represent human actions is still an open problem because it usually requires taking into account evidence extracted at various temporal resolutions. A conventional way of representing an action employs temporally ordered fine-grained movements, e.g., key poses or subtle motions. Many existing approaches model actions by directly learning the transitional relationships between those fine-grained features. However, action data may contain many similar observations with occasional and irregular changes, which makes commonly used fine-grained features less reliable. This paper presents a set of temporal pyramid features that enrich action representation with various levels of semantic granularity. For learning and inferring the proposed pyramid features, we adopt a discriminative model with latent variables to capture the hidden dynamics in each layer of the pyramid. Our method is evaluated on a Tai-Chi Chun dataset and a daily activities dataset, both of which we collected. Experimental results demonstrate that our approach achieves better performance than existing methods.
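To make the idea of a temporal pyramid concrete, the following is a minimal sketch of one generic construction: each higher layer halves the temporal resolution by averaging adjacent frame-level feature vectors. The abstract does not specify how the authors build their pyramid, so the pooling rule, the function names `mean_pool` and `temporal_pyramid`, and the `levels` parameter are all assumptions made for illustration. The latent-variable discriminative model that operates on each layer is not shown.

```python
from typing import List

Frame = List[float]  # one frame-level feature vector (hypothetical representation)


def mean_pool(frames: List[Frame]) -> Frame:
    """Average a non-empty list of equal-length feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def temporal_pyramid(frames: List[Frame], levels: int = 3) -> List[List[Frame]]:
    """Build a temporal pyramid over a feature sequence.

    Layer 0 is the original fine-grained sequence. Each further layer
    averages adjacent pairs of the previous layer, so it describes the
    action at a coarser temporal granularity. This pooling rule is an
    assumption, not the construction used in the paper.
    """
    if not frames:
        raise ValueError("need at least one frame")
    pyramid = [frames]
    current = frames
    for _ in range(1, levels):
        if len(current) == 1:
            break  # cannot coarsen a single segment any further
        # A trailing unpaired frame is pooled on its own.
        current = [mean_pool(current[i:i + 2]) for i in range(0, len(current), 2)]
        pyramid.append(current)
    return pyramid


if __name__ == "__main__":
    sequence = [[0.0], [2.0], [4.0], [6.0], [8.0]]
    for level, layer in enumerate(temporal_pyramid(sequence, levels=3)):
        print(level, layer)
```

In a sketch like this, the coarser layers smooth over occasional, irregular changes between otherwise similar frames, which is the kind of robustness the abstract attributes to combining several granularities.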
Lin, S. Y., Lin, Y. Y., Chen, C. S., & Hung, Y. P. (2017). Learning and inferring human actions with temporal pyramid features based on conditional random fields. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 2617–2621). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2017.7952630