Key frame extraction for text based video retrieval using Maximally Stable Extremal Regions

Abstract

This paper presents a new approach to text-based video content retrieval. The proposed scheme consists of three main processes: key frame extraction, text localization, and keyword matching. For key frame extraction, we propose a Maximally Stable Extremal Region (MSER) based feature designed to segment the video into shots with different text content. In the text localization process, the MSERs in each key frame are clustered into text lines based on their similarity in position, size, color, and stroke width. The Tesseract OCR engine is then used to recognize the text regions. To improve the recognition results, we feed the Tesseract engine four images obtained from different pre-processing methods. Finally, the target query keyword is matched against the OCR results using an approximate string search scheme. The experiments show that, with the MSER feature, videos can be segmented into an efficient number of shots, with better precision and recall than sum-of-absolute-differences and edge-based methods.
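The keyword matching step can be illustrated with a minimal sketch. The abstract does not say which approximate string search scheme the authors use, so the code below assumes a common choice: the smallest Levenshtein edit distance between the keyword and any substring of the OCR output. The function names and the `max_error_ratio` threshold are hypothetical and chosen only for illustration.

```python
def min_edit_distance_in_text(keyword: str, text: str) -> int:
    """Smallest Levenshtein distance between keyword and any substring of text."""
    keyword, text = keyword.lower(), text.lower()
    # prev[j] holds the best distance for the keyword prefix processed so far,
    # with the match ending at text position j. Row 0 is all zeros so that a
    # match may start anywhere in the text at no cost.
    prev = [0] * (len(text) + 1)
    for i, kc in enumerate(keyword, start=1):
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cost = 0 if kc == tc else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # keyword character missing from the text
                curr[j - 1] + 1,     # extra character in the text
            )
        prev = curr
    # The match may also end anywhere in the text.
    return min(prev)


def keyword_matches(keyword: str, ocr_text: str, max_error_ratio: float = 0.25) -> bool:
    """Assumed rule: accept if the errors stay below a fraction of the keyword length."""
    allowed_errors = int(len(keyword) * max_error_ratio)
    return min_edit_distance_in_text(keyword, ocr_text) <= allowed_errors
```

With this rule, a query for "retrieval" would still match OCR output such as "video retrleval system", where one character was misrecognized, while unrelated text is rejected. A length-relative threshold is one way to tolerate OCR errors without letting short keywords match almost anything.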

Cite

Wattanarachothai, W., & Patanukhom, K. (2015). Key frame extraction for text based video retrieval using Maximally Stable Extremal Regions. In Proceedings of the 2015 1st International Conference on Industrial Networks and Intelligent Systems, INISCom 2015 (pp. 29–37). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.4108/icst.iniscom.2015.258410
