Visual and Phonological Feature Enhanced Siamese BERT for Chinese Spelling Error Correction

Abstract

Chinese Spelling Check (CSC) aims to detect and correct spelling errors in Chinese text. Most CSC models rely on human-defined confusion sets to narrow the search space and therefore cannot resolve errors that fall outside the confusion set. Moreover, most spelling errors in current benchmark datasets are character pairs with similar pronunciations; errors between similarly shaped characters, and errors that are neither visually nor phonologically related, are not covered. Furthermore, the automatically generated training data widely used in CSC tasks leads to label leakage and unfair comparisons between methods. In this work, we propose a siamese BERT enhanced with visual and phonological features to (1) correct spelling errors without using confusion sets; (2) integrate phonological and visual features for CSC through a glyph graph; and (3) improve performance on unseen spelling errors. To evaluate CSC methods fairly and comprehensively, we build a large-scale CSC dataset in which every error type has the same number of samples. The experimental results show that the proposed approach outperforms previous state-of-the-art methods on three benchmark datasets and on the new error-type balanced dataset.
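
The confusion-set limitation described in the abstract can be illustrated with a minimal sketch. This is not the authors' method or code: the toy `CONFUSION_SET` and the helper names `candidates` and `can_correct` are hypothetical, chosen only to show why a corrector that restricts its candidates to a confusion set cannot fix an error whose intended character is not listed.

```python
# Hypothetical toy confusion set (an assumption for illustration only):
# each character maps to the substitutes a corrector may propose for it.
CONFUSION_SET = {
    "在": {"再"},  # similar pronunciation
    "己": {"已"},  # similar shape
}


def candidates(char):
    """Return the characters the corrector may output for `char`.

    The character itself is always a candidate (the "no error" case).
    """
    return CONFUSION_SET.get(char, set()) | {char}


def can_correct(wrong, right):
    """True if the intended character is reachable from the observed one."""
    return right in candidates(wrong)


if __name__ == "__main__":
    print(can_correct("在", "再"))  # pair is listed in the confusion set
    print(can_correct("在", "已"))  # pair is not listed, so it is unreachable
```

Any error pair missing from the set is unreachable regardless of how strong the underlying language model is, which is the motivation the abstract gives for dropping confusion sets.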

Cite

Liu, Y., Guo, H., Wang, S., & Wang, T. (2022). Visual and Phonological Feature Enhanced Siamese BERT for Chinese Spelling Error Correction. Applied Sciences (Switzerland), 12(9). https://doi.org/10.3390/app12094578
