In this paper we address a new problem: event coreference resolution across television news videos. Based on the observation that the contents of multiple data modalities are complementary, we develop a novel approach that jointly encodes effective features from both closed captions and video key frames. Experimental results demonstrate that visual features provide a 7.2% absolute F-score gain over state-of-the-art text-based event extraction and coreference resolution.
Zhang, T., Li, H., Ji, H., & Chang, S. F. (2015). Cross-document event coreference resolution based on cross-media features. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 201–206). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1020