Combining audio and image processing for understanding video content has several benefits compared to using each modality on its own. For the task of context and activity recognition in video sequences, it is important to exploit both data streams to gather relevant information. In this paper we describe a video context and activity recognition model. Our work extracts a range of audio and visual features, followed by feature reduction and information fusion. We show that combining audio-based with video-based decision making improves the quality of context and activity recognition in videos by 4% over audio data alone and 18% over image data alone. © Springer-Verlag Berlin Heidelberg 2006.
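The abstract describes fusing audio-based and video-based decisions. As an illustration only, a common way to do this is decision-level (late) fusion, where per-class posterior probabilities from each modality's classifier are combined by a weighted average; the weighting scheme and function names below are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def fuse_decisions(audio_probs, video_probs, audio_weight=0.5):
    """Hypothetical late-fusion sketch: weighted average of the per-class
    posterior probabilities from an audio and a video classifier, followed
    by an argmax over the fused scores."""
    audio_probs = np.asarray(audio_probs, dtype=float)
    video_probs = np.asarray(video_probs, dtype=float)
    fused = audio_weight * audio_probs + (1.0 - audio_weight) * video_probs
    return int(np.argmax(fused)), fused

# Example: three activity classes; audio favours class 1, video favours class 2.
label, fused = fuse_decisions([0.2, 0.5, 0.3], [0.1, 0.3, 0.6], audio_weight=0.4)
```

With these weights the video evidence dominates, so the fused decision picks class 2; in practice the weight would be tuned on validation data.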
CITATION STYLE
Lopes, J., & Singh, S. (2006). Audio and video feature fusion for activity recognition in unconstrained videos. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4224 LNCS, pp. 823–831). Springer Verlag. https://doi.org/10.1007/11875581_99