Video concept detection by audio-visual grouplets

Wei Jiang; Alexander C. Loui

Journal Article, Open Access

International Journal of Multimedia Information Retrieval (2012) 1(4) 223-238

DOI: 10.1007/s13735-012-0020-6

7 Citations

10 Readers

Abstract

We investigate general concept classification in unconstrained videos by joint audio-visual analysis. An audio-visual grouplet (AVG) representation is proposed based on analyzing the statistical temporal audio-visual interactions. Each AVG contains a set of audio and visual codewords that are grouped together according to their strong temporal correlations in videos, and each AVG carries unique audio-visual cues to represent the video content. By using entire AVGs as building elements, video concepts can be classified more robustly than with traditional vocabularies of discrete audio or visual codewords. Specifically, we conduct coarse-level foreground/background separation in both audio and visual channels, and discover four types of AVGs by exploring mixed-and-matched temporal audio-visual correlations among the following factors: visual foreground, visual background, audio foreground, and audio background. All of these types of AVGs provide discriminative audio-visual patterns for classifying various semantic concepts. To effectively use the AVGs for improved concept classification, a distance metric learning algorithm is further developed. Based on the AVG structure, the algorithm uses an iterative quadratic programming formulation to learn the optimal distances between data points according to the large-margin nearest-neighbor setting. Various types of grouplet-based distances can be computed using individual AVGs, and through our distance metric learning algorithm these grouplet-based distances can be aggregated for final classification. We extensively evaluate our method over the large-scale Columbia consumer video set. Experiments demonstrate that the AVG-based audio-visual representation can achieve consistent and significant performance improvements compared with other state-of-the-art approaches.
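The grouping idea in the abstract, that audio and visual codewords with strongly correlated temporal behavior are collected into one grouplet, can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the correlation statistic or the grouping procedure, so the sketch assumes plain Pearson correlation over per-frame occurrence counts and merges codewords through connected components. The codeword names, the toy counts, the `threshold` value, and the helpers `pearson` and `build_grouplets` are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def build_grouplets(series, threshold=0.8):
    """Group codewords whose occurrence series are strongly correlated over time.

    series: dict mapping a codeword name to its per-frame occurrence counts.
    Assumption: names start with "audio" or "visual" to mark the modality.
    Returns a list of sets, each containing at least one audio and one
    visual codeword.
    """
    names = sorted(series)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Link every pair of codewords whose correlation reaches the threshold.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(series[a], series[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)

    # Keep only groups that mix both modalities.
    return [
        g for g in groups.values()
        if any(n.startswith("audio") for n in g)
        and any(n.startswith("visual") for n in g)
    ]


# Toy per-frame counts (hypothetical): foreground codewords fire together,
# and background codewords fire together on the remaining frames.
series = {
    "audio_fg_0": [0, 1, 1, 0, 1, 0],
    "visual_fg_3": [0, 2, 2, 0, 2, 0],
    "audio_bg_1": [1, 0, 0, 1, 0, 1],
    "visual_bg_2": [3, 0, 0, 3, 0, 3],
}
grouplets = build_grouplets(series)
for g in grouplets:
    print(sorted(g))
```

On this toy input the sketch yields two grouplets, one pairing the foreground codewords and one pairing the background codewords. The four AVG types described in the abstract would additionally cover mixed combinations such as audio foreground with visual background.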

Cite

Jiang, W., & Loui, A. C. (2012). Video concept detection by audio-visual grouplets. International Journal of Multimedia Information Retrieval, 1(4), 223–238. https://doi.org/10.1007/s13735-012-0020-6
