Hierarchical late fusion for concept detection in videos

Abstract

Current research shows that detecting semantic concepts (e.g., animal, bus, person, dancing) in multimedia documents such as videos requires several types of complementary descriptors to achieve good results. In this work, we explore strategies for efficiently combining dozens of complementary content descriptors (or “experts”) through late fusion. We explore two fusion approaches that share a common structure: both start by clustering the experts, continue with an intra-cluster fusion, and finish with an inter-cluster fusion. We also experiment with other state-of-the-art methods. The first approach relies on a priori knowledge about the internals of each expert to group the available experts by similarity. The second approach automatically derives similarity measures between experts from their outputs, groups the experts using agglomerative clustering, and then combines the results of this fusion with those from other methods. Finally, we show that an additional performance boost can be obtained by also considering the context of multimedia elements.
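The three-stage structure described in the abstract (cluster the experts, fuse within each cluster, then fuse across clusters) can be illustrated with a minimal sketch. This is not the chapter's implementation. Everything below is an assumption made for illustration: the similarity measure (Pearson correlation of expert outputs), average linkage, the stopping threshold, plain averaging as the fusion rule at both levels, and all function names and toy scores.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equally long score lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va and vb else 0.0


def cluster_experts(scores, threshold):
    """Agglomerative clustering of experts based on their outputs.

    Assumed distance: 1 - Pearson correlation. Assumed linkage: average.
    Merging stops once the closest pair of clusters exceeds `threshold`.
    """
    clusters = [[i] for i in range(len(scores))]

    def linkage(c1, c2):
        total = sum(1.0 - pearson(scores[i], scores[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # combinations yields index pairs with a < b, so deleting b is safe.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def average(score_lists):
    """Element-wise mean of several score lists (assumed fusion rule)."""
    return [sum(col) / len(col) for col in zip(*score_lists)]


def hierarchical_late_fusion(scores, threshold=0.5):
    """Cluster experts, fuse inside each cluster, then fuse across clusters."""
    clusters = cluster_experts(scores, threshold)
    intra = [average([scores[i] for i in c]) for c in clusters]  # intra-cluster
    return clusters, average(intra)                              # inter-cluster


# Toy detection scores: one row per expert, one column per video shot.
scores = [
    [0.9, 0.1, 0.8, 0.2],  # expert 0
    [0.8, 0.2, 0.9, 0.1],  # expert 1, strongly correlated with expert 0
    [0.6, 0.4, 0.3, 0.5],  # expert 2, weakly related to the others
]
clusters, fused = hierarchical_late_fusion(scores)
print(clusters)
print(fused)
```

In this sketch, fusing within clusters first means a group of near-duplicate experts contributes one vote at the inter-cluster stage instead of dominating the final score. A weighted combination could be substituted at either stage.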

Cite

Strat, S. T., Benoit, A., Lambert, P., Bredin, H., & Quénot, G. (2014). Hierarchical late fusion for concept detection in videos. In Advances in Computer Vision and Pattern Recognition (Vol. 64, pp. 53–77). Springer London. https://doi.org/10.1007/978-3-319-05696-8_3
