Active Learning for Video Description with Cluster-Regularized Ensemble Ranking

Abstract

Automatic video captioning aims to train models that generate text descriptions for all segments in a video. The most effective approaches, however, require large amounts of manual annotation, which is slow and expensive. Active learning is a promising way to efficiently build a training set for video captioning while reducing the need to manually label uninformative examples. In this work we both explore a range of active learning approaches for automatic video captioning and show that a cluster-regularized ensemble strategy is the most efficient at gathering training sets for this task. We evaluate our approaches on the MSR-VTT and LSMDC datasets using both transformer- and LSTM-based captioning models, and show that our novel strategy achieves high performance while using up to 60% less training data than strong state-of-the-art baselines.
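The abstract does not spell out the selection procedure, but the core idea lends itself to a short illustration. Below is a minimal sketch, not the authors' implementation: the stand-in ensemble, the disagreement score, the per-cluster cap, and all parameter values are assumptions made purely for demonstration.

```python
# Sketch: cluster-regularized ensemble ranking for active learning.
# All modeling details here are illustrative assumptions, not the
# paper's exact formulation.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy unlabeled pool: one feature vector per video segment.
pool = rng.normal(size=(500, 64))

def ensemble_scores(features, n_models=4):
    """Stand-in for an ensemble of captioning models: each 'model'
    assigns a per-example uncertainty score (e.g., a proxy for the
    negative caption log-likelihood)."""
    scores = []
    for _ in range(n_models):
        w = rng.normal(size=features.shape[1])  # hypothetical model
        scores.append(features @ w)
    return np.stack(scores)  # shape: (n_models, n_examples)

scores = ensemble_scores(pool)
# Rank by ensemble disagreement: examples the members disagree on
# most are treated as the most informative to label next.
disagreement = scores.std(axis=0)

# Cluster regularization: cap how many selections come from any one
# cluster so the labeled batch stays diverse.
n_clusters, batch_size, per_cluster_cap = 20, 40, 3
clusters = KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=0).fit_predict(pool)

selected, taken = [], {c: 0 for c in range(n_clusters)}
for idx in np.argsort(-disagreement):  # most uncertain first
    c = clusters[idx]
    if taken[c] < per_cluster_cap:
        selected.append(idx)
        taken[c] += 1
    if len(selected) == batch_size:
        break

print(f"Selected {len(selected)} segments spanning "
      f"{len({clusters[i] for i in selected})} clusters")
```

Under these assumptions, the design intuition is that ensemble disagreement surfaces informative examples, while the per-cluster cap prevents any one region of the data from dominating the labeled batch.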

Citation (APA)

Chan, D. M., Vijayanarasimhan, S., Ross, D. A., & Canny, J. F. (2021). Active Learning for Video Description with Cluster-Regularized Ensemble Ranking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12626 LNCS, pp. 443–459). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-69541-5_27
