Learning to describe video with weak supervision by exploiting negative sentential information

Haonan Yu; Jeffrey Mark Siskind

Conference Proceedings, Open Access

Proceedings of the National Conference on Artificial Intelligence (2015), Vol. 5, pp. 3855–3863

DOI: 10.1609/aaai.v29i1.9790

Abstract

Most previous work on video description trains individual parts of speech independently. From a linguistic point of view, it is more appealing to learn word models for all parts of speech simultaneously from whole sentences, a hypothesis suggested by some linguists for child language acquisition. In this paper, we learn to describe video by discriminatively training positive sentential labels against negative ones in a weakly supervised fashion: the meaning representations (i.e., HMMs) of individual words in these labels are learned from whole sentences without any correspondence annotation of what those words denote in the video. Textual descriptions are then generated for new video using the trained word models.
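As a rough illustration of what "discriminatively training positive sentential labels against negative ones" can look like, the following minimal sketch scores a video's feature sequence under per-word HMMs and computes the log-probability of the positive sentence relative to a set of negative sentences. It is not the authors' formulation: the discrete observation symbols, the scoring of a sentence as a sum of per-word HMM log-likelihoods, the softmax-style objective, and all names (`hmm_log_likelihood`, `sentence_score`, `discriminative_objective`, `word_models`) are assumptions made for illustration only.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(obs, log_init, log_trans, log_emit):
    """Forward algorithm in log space for a discrete-observation HMM.

    obs is a non-empty list of symbol indices; log_init[s], log_trans[p][s]
    and log_emit[s][o] hold log-probabilities.
    """
    n = len(log_init)
    alpha = [log_init[s] + log_emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[p] + log_trans[p][s] for p in range(n)])
            + log_emit[s][o]
            for s in range(n)
        ]
    return logsumexp(alpha)


def sentence_score(sentence, obs, word_models):
    """Assumed simplification: a sentence's score is the sum of its words' HMM scores."""
    return sum(hmm_log_likelihood(obs, *word_models[w]) for w in sentence)


def discriminative_objective(obs, positive, negatives, word_models):
    """Log-probability of the positive sentence among positive and negative sentences."""
    pos = sentence_score(positive, obs, word_models)
    scores = [pos] + [sentence_score(s, obs, word_models) for s in negatives]
    return pos - logsumexp(scores)


# Toy one-state word models over two hypothetical observation symbols (0 and 1).
word_models = {
    "approach": ([0.0], [[0.0]], [[math.log(0.9), math.log(0.1)]]),
    "leave": ([0.0], [[0.0]], [[math.log(0.1), math.log(0.9)]]),
}
toy_obs = [0, 0]
objective = discriminative_objective(toy_obs, ["approach"], [["leave"]], word_models)
```

Under these assumptions, training would adjust the word-model parameters to raise this objective across the training videos, so that each positive sentence outscores its negatives; no annotation of which video region a word refers to enters the computation.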

Cite

Yu, H., & Siskind, J. M. (2015). Learning to describe video with weak supervision by exploiting negative sentential information. In Proceedings of the National Conference on Artificial Intelligence (Vol. 5, pp. 3855–3863). AI Access Foundation. https://doi.org/10.1609/aaai.v29i1.9790
