A Deep Convolutional Deblurring and Detection Neural Network for Localizing Text in Videos

Abstract

Scene text in video is often degraded by various blurs, such as those caused by camera or text motion, which makes reliably extracting it for content-based video applications more difficult. In this paper, we propose a novel fully convolutional deep neural network for deblurring and detecting text in video. To cope with blurred video text, we design an effective deblurring subnetwork composed of multi-level convolutional blocks with both cross-block (long) and within-block (short) skip connections for progressively learning residual deblurred image details, together with a spatial attention mechanism that focuses on blurred regions; the subnetwork generates a sharper image for the current frame by fusing multiple adjacent frames. To then localize text in the frames, we enhance the EAST text detection model with deformable convolution layers and deconvolution layers, which better capture the widely varied appearances of video text. Experiments on a public scene text video dataset demonstrate the state-of-the-art performance of the proposed video text deblurring and detection model.
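The deblurring subnetwork described above can be illustrated with a minimal PyTorch sketch. The module names, channel widths, number of blocks, and five-frame fusion window below are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Predicts a per-pixel weight map so blurred regions receive more focus."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

    def forward(self, x):
        return x * torch.sigmoid(self.conv(x))  # (N, 1, H, W) gate in [0, 1]

class ResidualBlock(nn.Module):
    """Convolutional block with a within-block (short) skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # short skip

class DeblurSubnet(nn.Module):
    """Fuses adjacent frames and predicts residual sharp-image details."""
    def __init__(self, num_frames=5, channels=64, num_blocks=4):
        super().__init__()
        # Early fusion: stack the RGB frames along the channel axis.
        self.head = nn.Conv2d(3 * num_frames, channels, 3, padding=1)
        self.blocks = nn.ModuleList(
            [ResidualBlock(channels) for _ in range(num_blocks)])
        self.attn = SpatialAttention(channels)
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, frames):
        # frames: (N, num_frames, 3, H, W); the center frame is the target.
        n, t, c, h, w = frames.shape
        center = frames[:, t // 2]
        feat = self.head(frames.reshape(n, t * c, h, w))
        shallow = feat
        for block in self.blocks:
            feat = block(feat)
        feat = self.attn(feat + shallow)  # long (cross-block) skip
        return center + self.tail(feat)   # add learned residual details

For the detection stage, replacing a standard convolution in the EAST feature-merging branch with a deformable convolution could look roughly as follows; the offset-predicting layer and its placement are assumptions rather than the paper's exact design.

from torchvision.ops import DeformConv2d

class DeformConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # Two offsets (dx, dy) per kernel sampling position.
        self.offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.dconv = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        # The learned offsets let the sampling grid adapt to the varied
        # shapes and orientations of video text.
        return self.dconv(x, self.offset(x))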

Cite

APA

Wang, Y., Qian, Y., Shi, J., & Su, F. (2020). A Deep Convolutional Deblurring and Detection Neural Network for Localizing Text in Videos. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11962 LNCS, pp. 112–124). Springer. https://doi.org/10.1007/978-3-030-37734-2_10
