Extracting salient keywords from instructional videos using joint text, audio and visual cues

Youngja Park; Ying Li

Conference Proceedings (Open Access)

HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Short Papers (2006) 109-112

DOI: 10.3115/1614049.1614077

4 Citations

76 Readers

Abstract

This paper presents a multi-modal feature-based system for extracting salient keywords from transcripts of instructional videos. Specifically, we propose to extract domain-specific keywords for videos by integrating various cues from linguistic and statistical knowledge, as well as derived sound classes and characteristic visual content types. The acquisition of such salient keywords will facilitate video indexing and browsing, and significantly improve the quality of current video search engines. Experiments on four government instructional videos show that 82% of the salient keywords appear in the top 50% of the highly ranked keywords. In addition, the audiovisual cues improve precision and recall by 1.1% and 1.5% respectively.

Cite

Park, Y., & Li, Y. (2006). Extracting salient keywords from instructional videos using joint text, audio and visual cues. In HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Short Papers (pp. 109–112). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1614049.1614077
