Chinese Spelling Error Detection and Correction Based on Language Model, Pronunciation, and Shape

82Citations
Citations of this article
95Readers
Mendeley users who have this article in their library.

Abstract

Spelling check is an important preprocessing task when dealing with user generated texts such as tweets and product comments. Compared with some western languages such as English, Chinese spelling check is more complex because there is no word delimiter in Chinese written texts and misspelled characters can only be determined in word level. Our system works as follows. First, we use character-level n-gram language models to detect potential misspelled characters with low probabilities below some predefined threshold. Second, for each potential incorrect character, we generate a candidate set based on pronunciation and shape similarities. Third, we filter some candidate corrections if the candidate cannot form a legal word with its neighbors according to a word dictionary. Finally, we find the best candidate with highest language model probability. If the probability is higher than a predefined threshold, then we replace the original character; or we consider the original character as correct and take no action. Our preliminary experiments shows that our simple method can achieve relatively high precision but low recall.

Cite

CITATION STYLE

APA

Yu, J., & Li, Z. (2014). Chinese Spelling Error Detection and Correction Based on Language Model, Pronunciation, and Shape. In CLP 2014 - 3rd CIPS-SIGHAN Joint Conference on Chinese Language Processing (pp. 220–223). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-6835

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free