Because text in complex natural scenes is difficult to recognize, a scene text recognition method based on an encoder-decoder framework is proposed. The method treats natural scene text recognition as a sequence labeling problem, combining connectionist temporal classification (CTC) with an attention mechanism under the encoder-decoder framework. By exploiting the correlation between the image and the text sequence, it avoids the need for character segmentation. First, a convolutional neural network (CNN) generates an ordered feature sequence from the entire word image. Then, a bidirectional long short-term memory (Bi-LSTM) network encodes the generated feature sequence. Finally, an integrated CTC and attention module decodes the encoded sequence and outputs the text sequence. Experiments show that the method achieves noticeably higher recognition accuracy than the methods it was compared against.
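To illustrate how a CTC output layer can produce text without character segmentation, the following is a minimal sketch of standard best-path (greedy) CTC decoding in plain Python. It is not the integrated CTC and attention decoder described in the paper, whose details are not given here. The function name `ctc_greedy_decode`, the toy alphabet, the blank index, and the per-frame scores are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring class index for each frame of the feature sequence.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge runs of identical consecutive indices into a single index.
    collapsed = [index for index, _ in groupby(best_path)]
    # Remove blanks; what remains is the predicted character sequence.
    return "".join(alphabet[index] for index in collapsed if index != blank)


# Toy input: six frames scored over a four-symbol alphabet ("-" is the blank).
alphabet = ["-", "c", "a", "t"]
frame_scores = [
    [0.10, 0.70, 0.10, 0.10],  # "c"
    [0.10, 0.60, 0.20, 0.10],  # "c" again (repeat, merged)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # "a"
    [0.10, 0.10, 0.20, 0.60],  # "t"
    [0.10, 0.10, 0.10, 0.70],  # "t" again (repeat, merged)
]

decoded = ctc_greedy_decode(frame_scores, alphabet)
print(decoded)  # prints "cat"
```

Six frames map to three characters with no per-character bounding boxes: the blank symbol and the repeat-merging rule absorb the alignment between image columns and output letters. In a full system the frame scores would come from the CNN and Bi-LSTM encoder rather than being written by hand.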
Zuo, L. Q., Sun, H. M., Mao, Q. C., Qi, R., & Jia, R. S. (2019). Natural Scene Text Recognition Based on Encoder-Decoder Framework. IEEE Access, 7, 62616–62623. https://doi.org/10.1109/ACCESS.2019.2916616