In this paper, we describe how information obtained from multiple views using a network of cameras can be effectively combined to yield a reliable and fast human activity recognition system. First, we present a score-based fusion technique for combining information from multiple cameras that can handle arbitrary orientation of the subject with respect to the cameras and that does not rely on a symmetric deployment of the cameras. Second, we describe how longer, variable-duration, interleaved action sequences can be recognized in real time from continuously streaming multi-camera data. Our framework does not depend on any particular feature extraction technique, so the proposed system can easily be integrated on top of existing implementations of view-specific classifiers and feature descriptors. For implementation and testing of the proposed system, we use a computationally simple feature descriptor: locality-specific motion information extracted from the spatio-temporal shape of a human silhouette. This descriptor lends itself to an efficient distributed implementation while maintaining a high frame capture rate. We demonstrate the robustness of our algorithms by implementing them on a portable multi-camera video sensor network testbed and evaluating system performance under different camera network configurations.
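To make the score-based fusion idea concrete, the following Python sketch combines per-camera classifier score vectors into a single decision. The action label set, the function name, and the simple sum rule are illustrative assumptions, not the paper's exact fusion scheme; the paper's method may weight cameras differently.

```python
import numpy as np

# Hypothetical sketch of score-level fusion across cameras.
# Each camera's view-specific classifier emits one score per action class;
# the unweighted sum rule below is an illustrative assumption, not the
# authors' exact fusion technique.

ACTIONS = ["wave", "punch", "kick", "sit", "stand"]  # example label set

def fuse_camera_scores(per_camera_scores):
    """per_camera_scores: list of 1-D arrays, one score vector per camera.

    Returns the index of the action class with the highest fused score.
    Because soft scores (not hard per-camera labels) are combined, a
    camera with a poor view contributes weak scores rather than a wrong
    vote, so no symmetric camera deployment is required.
    """
    fused = np.sum(np.vstack(per_camera_scores), axis=0)
    return int(np.argmax(fused))

# Example: three cameras observing the same action from different angles.
scores = [
    np.array([0.10, 0.70, 0.10, 0.05, 0.05]),  # camera with a good view
    np.array([0.20, 0.40, 0.20, 0.10, 0.10]),  # oblique view
    np.array([0.25, 0.30, 0.25, 0.10, 0.10]),  # mostly occluded view
]
print(ACTIONS[fuse_camera_scores(scores)])  # -> "punch"
```

Fusing at the score level, rather than taking a majority vote over per-camera labels, is what allows the system to tolerate arbitrary subject orientation: whichever cameras happen to have informative views dominate the fused decision.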
Kavi, R., & Kulathumani, V. (2013). Real-time recognition of action sequences using a distributed video sensor network. Journal of Sensor and Actuator Networks, 2(3), 486–508. https://doi.org/10.3390/jsan2030486