DCSpell: A Detector-Corrector Framework for Chinese Spelling Error Correction

Abstract

Spelling Error Correction (SEC), which detects and corrects spelling errors in text, has a wide range of applications in human language understanding. Earlier solutions, including statistical methods and one-stage and two-stage machine learning methods, cannot build deeply bidirectional models, which significantly limits their learning ability. With the recent emergence of masked language models, transformer-based networks have achieved remarkable success in SEC. However, current transformer-based Chinese SEC algorithms are all end-to-end methods, which suffer from high false alarm rates because they correct every character of the sentence regardless of whether it is correct. This issue becomes even more severe when only a small fraction of the characters in the sentence are incorrect. To solve this problem, we propose a cloze-style detector-corrector framework (DCSpell) that first detects whether a character is erroneous before correcting it. Specifically, DCSpell employs the discriminator of ELECTRA as the Detector to locate incorrect characters. The Detector is trained with the sample-efficient replaced token detection pre-training task, and thus allows domain adaptation with a small amount of data. A transformer-based Corrector then finds the correct character for each detected position. It takes sentence pairs as input, which potentially incorporates knowledge of phonological and visual similarity. Confusion-set-based post-processing is used to further improve performance. Experiments show that, compared with state-of-the-art methods, DCSpell improves the F1 score by 15.7% on the SIGHAN dataset and by 6.6% on a dataset transcribed from a real-world acoustic speech corpus.
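
The detect-then-correct flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Detector and Corrector are replaced by toy stand-in functions rather than ELECTRA or a transformer, and every name here (`detect`, `build_sentence_pair`, `correct`, `run_pipeline`, `MASK`) is hypothetical. The exact sentence-pair format and the confusion-set rule are also assumptions: the sketch pairs the original sentence with a masked copy, and accepts a candidate only if it appears in the confusion set of the original character.

```python
from typing import Callable, Dict, List, Set, Tuple

MASK = "[MASK]"  # hypothetical mask symbol for the cloze-style input


def detect(sentence: str, is_suspicious: Callable[[str, int], bool]) -> List[int]:
    """Stage 1: return the positions the detector flags as erroneous."""
    return [i for i in range(len(sentence)) if is_suspicious(sentence, i)]


def build_sentence_pair(sentence: str, positions: List[int]) -> Tuple[List[str], List[str]]:
    """Assumed input format: the original characters plus a copy in which
    only the detected positions are masked."""
    flagged = set(positions)
    masked = [MASK if i in flagged else ch for i, ch in enumerate(sentence)]
    return list(sentence), masked


def correct(
    sentence: str,
    positions: List[int],
    propose: Callable[[List[str], List[str], int], List[str]],
    confusion_set: Dict[str, Set[str]],
) -> str:
    """Stage 2 plus post-processing: fill each detected position with the
    best-ranked candidate that the confusion set allows (assumed rule)."""
    original, masked = build_sentence_pair(sentence, positions)
    output = list(original)
    for i in positions:
        allowed = confusion_set.get(original[i], set())
        for candidate in propose(original, masked, i):  # ranked best-first
            if candidate in allowed:
                output[i] = candidate
                break  # if no candidate passes, the character stays unchanged
    return "".join(output)


def run_pipeline(sentence, is_suspicious, propose, confusion_set) -> str:
    """Only detected positions are ever rewritten; all others pass through."""
    positions = detect(sentence, is_suspicious)
    return correct(sentence, positions, propose, confusion_set)


# Toy stand-ins for the two models (illustrative only).
def toy_detector(sentence: str, i: int) -> bool:
    return sentence[i:i + 2] == "平果"  # flags "平" when followed by "果"


def toy_corrector(original: List[str], masked: List[str], i: int) -> List[str]:
    return ["萍", "苹"]  # ranked candidates for the masked position


toy_confusion_set = {"平": {"苹", "评"}}  # similar-sounding or similar-looking characters

result = run_pipeline("我喜欢吃平果", toy_detector, toy_corrector, toy_confusion_set)
print(result)
```

Because the Corrector is only consulted at detected positions, a sentence with no flagged characters is returned unchanged, which is the property the abstract credits with lowering the false alarm rate.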

Cite

Li, J., Wu, G., Yin, D., Wang, H., & Wang, Y. (2021). DCSpell: A Detector-Corrector Framework for Chinese Spelling Error Correction. In SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1870–1874). Association for Computing Machinery, Inc. https://doi.org/10.1145/3404835.3463050
