Hierarchical Vision-Language Alignment for Video Captioning

Abstract

Video captioning has seen promising advances in recent years, yet it remains a challenging task because it is hard to capture the semantic correspondences between visual content and language descriptions. Different granularities of language components (e.g., words, phrases, and sentences) correspond to different granularities of visual elements (e.g., objects, visual relations, and regions of interest). These correspondences can provide multi-level alignments and complementary information for transforming visual content into language descriptions. Therefore, we propose an Attention Guided Hierarchical Alignment (AGHA) approach for video captioning. In the proposed approach, hierarchical vision-language alignments, including object-word, relation-phrase, and region-sentence alignments, are extracted from a well-learned model suited to multiple vision-and-language tasks, and are then embedded into parallel encoder-decoder streams to provide multi-level semantic guidance and rich complementary information for description generation. In addition, multi-granularity visual features are exploited to obtain a coarse-to-fine understanding of complex video content, with an attention mechanism applied to extract comprehensive visual discrimination and enhance video captioning. Experimental results on the widely used MSVD dataset demonstrate that AGHA achieves promising improvements on popular evaluation metrics.
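The abstract describes parallel encoder-decoder streams guided by attention over multi-granularity visual features (objects, relations, regions). Below is a minimal PyTorch sketch of that general idea, not the authors' implementation: the stream names, feature dimensions, fusion scheme, and all class and variable names are assumptions made purely for illustration.

    # Illustrative sketch only: parallel attention streams over object-, relation-,
    # and region-level features feeding a single GRU caption decoder.
    import torch
    import torch.nn as nn


    class AttentionPool(nn.Module):
        """Additive attention that pools one feature set given the decoder state."""
        def __init__(self, feat_dim, hid_dim):
            super().__init__()
            self.proj = nn.Linear(feat_dim + hid_dim, hid_dim)
            self.score = nn.Linear(hid_dim, 1)

        def forward(self, feats, state):            # feats: (B, N, F), state: (B, H)
            expanded = state.unsqueeze(1).expand(-1, feats.size(1), -1)
            energy = self.score(torch.tanh(self.proj(torch.cat([feats, expanded], dim=-1))))
            alpha = torch.softmax(energy, dim=1)     # attention weights over the N elements
            return (alpha * feats).sum(dim=1)        # pooled context vector: (B, F)


    class MultiStreamCaptioner(nn.Module):
        """Parallel streams (e.g., object / relation / region) fused at each decode step."""
        def __init__(self, feat_dims, vocab_size, hid_dim=512, emb_dim=300):
            super().__init__()
            self.attn = nn.ModuleList([AttentionPool(d, hid_dim) for d in feat_dims])
            self.fuse = nn.Linear(sum(feat_dims), hid_dim)
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.decoder = nn.GRUCell(emb_dim + hid_dim, hid_dim)
            self.out = nn.Linear(hid_dim, vocab_size)

        def forward(self, streams, captions):        # streams: list of (B, N_i, F_i)
            B, T = captions.shape
            h = streams[0].new_zeros(B, self.decoder.hidden_size)
            logits = []
            for t in range(T):
                # Attend to each granularity independently, then fuse the contexts.
                ctx = torch.cat([attn(f, h) for attn, f in zip(self.attn, streams)], dim=-1)
                step_in = torch.cat([self.embed(captions[:, t]), self.fuse(ctx)], dim=-1)
                h = self.decoder(step_in, h)
                logits.append(self.out(h))
            return torch.stack(logits, dim=1)         # (B, T, vocab_size)


    # Toy usage with random tensors standing in for object/relation/region features.
    if __name__ == "__main__":
        model = MultiStreamCaptioner(feat_dims=[2048, 1024, 2048], vocab_size=1000)
        streams = [torch.randn(2, 10, 2048), torch.randn(2, 20, 1024), torch.randn(2, 5, 2048)]
        captions = torch.randint(0, 1000, (2, 12))
        print(model(streams, captions).shape)         # torch.Size([2, 12, 1000])

In this sketch each stream is attended independently at every decoding step and the pooled contexts are concatenated before driving the GRU decoder; the AGHA approach additionally embeds the extracted object-word, relation-phrase, and region-sentence alignments into the streams as semantic guidance, which is omitted here for brevity.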

Cite

APA

Zhang, J., & Peng, Y. (2019). Hierarchical Vision-Language Alignment for Video Captioning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11295 LNCS, pp. 42–54). Springer Verlag. https://doi.org/10.1007/978-3-030-05710-7_4
