The recording and sharing of educational and lecture videos has increased in recent years. Among these recordings are a large number of math-oriented lectures and tutorials, which attract students of all levels. Many of the topics they cover are better explained using handwritten content on whiteboards or chalkboards, so many lecture videos feature the instructor writing on a surface. In this work, we propose a novel method for extraction and summarization of the handwritten content found in such videos. Our method is based on a fully convolutional network, FCN-LectureNet, which extracts the handwritten content from the video as binary images. These images are further analyzed to identify the unique and stable units of content and to produce a spatial-temporal index of handwritten content. A signal that approximates content deletion events is then built using information from the spatial-temporal index. The peaks of this signal are used to create temporal segments of the lecture, based on the notion that sub-topics change when large portions of content are deleted. Finally, we use these segments to create an extractive summary of the handwritten content based on key-frames. This summary facilitates content-based search and retrieval of these lecture videos. We also extend the AccessMath dataset to create LectureMath, a novel dataset for benchmarking lecture video summarization. Our experiments on both datasets show that our method can outperform existing methods, especially on the larger and more challenging dataset. Our code and data are publicly available.
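The segmentation idea described above (a deletion signal, its peaks, and one key-frame per segment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `ContentUnit` record, the function names, the pixel-area weighting of deletions, the `min_height` and `min_gap` parameters, and the choice of the last frame of each segment as its key-frame are all assumptions made for illustration, and the FCN-LectureNet binarization step is not modeled at all.

```python
from dataclasses import dataclass


@dataclass
class ContentUnit:
    """Hypothetical entry of a spatial-temporal index (not the paper's format)."""
    first_frame: int  # first frame where the unit is visible
    last_frame: int   # last frame where the unit is visible
    area: int         # size of the unit in pixels (assumed weighting)


def deletion_signal(units, n_frames):
    """Sum the area of the units that disappear at each frame."""
    signal = [0] * n_frames
    for unit in units:
        gone_at = unit.last_frame + 1
        # Units still visible when the video ends were never deleted.
        if gone_at < n_frames:
            signal[gone_at] += unit.area
    return signal


def pick_peaks(signal, min_height, min_gap):
    """Greedy peak picking: strongest frames first, at least min_gap apart."""
    candidates = sorted(
        (i for i, value in enumerate(signal) if value >= min_height),
        key=lambda i: -signal[i],
    )
    peaks = []
    for i in candidates:
        if all(abs(i - p) >= min_gap for p in peaks):
            peaks.append(i)
    return sorted(peaks)


def split_segments(peaks, n_frames):
    """Turn peak positions into half-open [start, end) frame ranges."""
    bounds = [0] + list(peaks) + [n_frames]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]


# Toy index: two large erasures (frames 40 and 80) and one small one (frame 60).
n_frames = 100
units = [
    ContentUnit(0, 39, 500),
    ContentUnit(10, 39, 400),
    ContentUnit(20, 59, 50),
    ContentUnit(45, 79, 600),
    ContentUnit(50, 99, 300),
]

signal = deletion_signal(units, n_frames)
peaks = pick_peaks(signal, min_height=200, min_gap=10)
segments = split_segments(peaks, n_frames)
# Assumed key-frame rule: the last frame before each segment boundary.
key_frames = [end - 1 for _, end in segments]

print(peaks)       # [40, 80]
print(segments)    # [(0, 40), (40, 80), (80, 100)]
print(key_frames)  # [39, 79, 99]
```

In this toy example the small erasure at frame 60 falls below `min_height` and does not open a new segment, which mirrors the notion that only deletions of large portions of content signal a sub-topic change.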
Davila, K., Xu, F., Setlur, S., & Govindaraju, V. (2021). FCN-LectureNet: Extractive Summarization of Whiteboard and Chalkboard Lecture Videos. IEEE Access, 9, 104469–104484. https://doi.org/10.1109/ACCESS.2021.3099427