Abstract
In online learning systems, measuring the similarity between educational videos and exercises is a fundamental task with great application potential. In this paper, we explore measuring fine-grained similarity by leveraging multimodal information. The problem remains largely open due to several domain-specific characteristics. First, unlike general videos, educational videos contain not only graphics but also text and formulas, which have a fixed reading order, so both the spatial and the temporal information embedded in the frames should be modeled. Second, there are semantic associations between adjacent video segments; these associations affect the similarity, and different exercises usually focus on related context of different ranges. Third, fine-grained labeled data for training the model is scarce and costly to obtain. To tackle these challenges, we propose VENet, which measures similarity at both the video level and the segment level while exploiting only video-level labeled data. Extensive experimental results on real-world data demonstrate the effectiveness of VENet.
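The abstract's central technical idea is producing segment-level similarity scores while training only on video-level labels, a weakly supervised, multiple-instance-style setup. The sketch below illustrates that general idea only; it is not the paper's VENet architecture. It assumes pre-computed segment and exercise embeddings, and the attention-pooling layer and all names are hypothetical.

    import torch
    import torch.nn as nn

    class WeaklySupervisedSimilarity(nn.Module):
        """Illustrative sketch, NOT VENet: per-segment similarities are
        aggregated into a video-level score so the model can be trained
        with video-level labels alone."""

        def __init__(self, dim):
            super().__init__()
            # Hypothetical attention layer that scores each segment's relevance.
            self.attn = nn.Linear(dim, 1)

        def forward(self, seg_emb, ex_emb):
            # seg_emb: (num_segments, dim) segment embeddings of one video
            # ex_emb:  (dim,)              embedding of one exercise
            # Segment-level similarity: cosine between each segment and the exercise.
            seg_sim = torch.cosine_similarity(seg_emb, ex_emb.unsqueeze(0), dim=1)
            # Attention weights over segments, normalized with softmax.
            weights = torch.softmax(self.attn(seg_emb).squeeze(-1), dim=0)
            # Video-level score is the attention-weighted sum of segment scores.
            video_sim = (weights * seg_sim).sum()
            return video_sim, seg_sim

    # Usage: optimize against a video-level relevance label; at inference,
    # seg_sim can be read off directly for segment-level ranking.
    model = WeaklySupervisedSimilarity(dim=128)
    segments = torch.randn(10, 128)
    exercise = torch.randn(128)
    video_score, segment_scores = model(segments, exercise)
    loss = nn.functional.binary_cross_entropy(
        torch.sigmoid(video_score), torch.tensor(1.0))
    loss.backward()

Because only the aggregated video-level score receives supervision, the per-segment similarities emerge as a byproduct, which mirrors the video-level-to-segment-level transfer the abstract describes.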
Citation
Wang, X., Huang, W., Liu, Q., Yin, Y., Huang, Z., Wu, L., … Wang, X. (2020). Fine-Grained Similarity Measurement between Educational Videos and Exercises. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 331–339). Association for Computing Machinery, Inc. https://doi.org/10.1145/3394171.3413783