With the breakthrough of deep learning, lip reading technologies are progressing extraordinarily rapidly. Chinese is well known as the most widely spoken language in the world. Unlike alphabetic languages, it involves more than 1,000 pronunciations (Pinyin) and nearly 90,000 pictographic characters (Hanzi), which makes lip reading of Chinese very challenging. In this paper, we implement visual-only Chinese lip reading of unconstrained sentences in a two-step end-to-end architecture (LipCH-Net), in which two deep neural network models perform Picture-to-Pinyin recognition (mouth motion pictures to pronunciations) and Pinyin-to-Hanzi recognition (pronunciations to texts) respectively, and are then jointly optimized to improve overall performance. In addition, two modules in the Pinyin-to-Hanzi model are pre-trained separately on large auxiliary data before sequence-to-sequence training, so that long sequence matches can be exploited to avoid ambiguity. We collect six months of daily news broadcasts from the China Central Television (CCTV) website and semi-automatically label them into a 20.95 GB dataset with 20,495 natural Chinese sentences. When trained on the CCTV dataset, LipCH-Net outperforms all state-of-the-art lip reading frameworks. According to the results, our scheme not only accelerates training and reduces overfitting, but also overcomes the syntactic ambiguity of Chinese, providing a baseline for future related work.
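To make the two-step idea concrete, the following is a minimal sketch of how a Pinyin sequence might be resolved into Hanzi by preferring longer matches. It is not the LipCH-Net model: the paper uses trained neural networks for both steps, whereas this sketch swaps in a plain dictionary lookup with greedy longest-match decoding. The names `LEXICON`, `pinyin_to_hanzi`, and `lip_read`, as well as the toy entries, are assumptions made up for illustration.

```python
# Hypothetical toy lexicon (illustration only): toneless pinyin tuples -> Hanzi.
# A real system would learn this mapping; here it is a hand-written lookup.
LEXICON = {
    ("shi",): "是",
    ("jie",): "接",
    ("shi", "jie"): "世界",
    ("xin",): "新",
    ("wen",): "问",
    ("xin", "wen"): "新闻",
}


def pinyin_to_hanzi(syllables, lexicon=LEXICON):
    """Greedy longest-match decoding of a pinyin syllable list into Hanzi.

    Longer spans are tried first, so ("shi", "jie") resolves to the word
    "世界" instead of the two single-syllable candidates "是" + "接".
    """
    max_len = max(len(key) for key in lexicon)
    out = []
    i = 0
    while i < len(syllables):
        # Try the longest span that fits, then progressively shorter ones.
        for span in range(min(max_len, len(syllables) - i), 0, -1):
            key = tuple(syllables[i:i + span])
            if key in lexicon:
                out.append(lexicon[key])
                i += span
                break
        else:
            # No entry covers this syllable: emit a placeholder and move on.
            out.append("?")
            i += 1
    return "".join(out)


def lip_read(frames, picture_to_pinyin):
    """Chain the two steps: frames -> pinyin (step 1) -> Hanzi (step 2).

    `picture_to_pinyin` is a stand-in callable for the visual model.
    """
    return pinyin_to_hanzi(picture_to_pinyin(frames))
```

For example, `pinyin_to_hanzi(["shi", "jie", "xin", "wen"])` yields `世界新闻`, whereas decoding each syllable in isolation would have to choose among homophones without context. This is only meant to show why longer matches reduce ambiguity; the joint optimization and pre-training described in the abstract have no counterpart in this sketch.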
Zhang, X., Gong, H., Dai, X., Yang, F., Liu, N., & Liu, M. (2019). Understanding pictograph with facial features: End-to-End sentence-level lip reading of Chinese. In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019 (pp. 9211–9218). AAAI Press. https://doi.org/10.1609/aaai.v33i01.33019211