Chinese Spelling Error Detection and Correction Based on Language Model, Pronunciation, and Shape

Junjie Yu; Zhenghua Li

Conference Proceedings, Open Access

CLP 2014 - 3rd CIPS-SIGHAN Joint Conference on Chinese Language Processing (2014) 220-223

DOI: 10.3115/v1/w14-6835

Abstract

Spelling check is an important preprocessing task when dealing with user-generated texts such as tweets and product comments. Compared with some Western languages such as English, Chinese spelling check is more complex because written Chinese has no word delimiters and a misspelled character can only be identified at the word level. Our system works as follows. First, we use character-level n-gram language models to detect potentially misspelled characters, namely those whose probabilities fall below a predefined threshold. Second, for each potentially incorrect character, we generate a candidate set based on pronunciation and shape similarities. Third, we filter out candidate corrections that cannot form a legal word with their neighbors according to a word dictionary. Finally, we select the candidate with the highest language model probability. If that probability is higher than a predefined threshold, we replace the original character; otherwise, we consider the original character correct and take no action. Our preliminary experiments show that this simple method achieves relatively high precision but low recall.
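
The four steps in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the confusion set, the dictionary, the add-one-smoothed bigram model, the way a character is scored (the average of the two bigrams that touch it), and both threshold values are assumptions chosen only to make the example small and runnable.

```python
import math
from collections import Counter

# Hypothetical toy resources; none of this is the paper's data.
CORPUS = ["我们今天去学校", "他在学校学习", "今天天气很好", "我们学习中文"]
# Characters assumed similar in pronunciation or shape.
CONFUSION = {"笑": ("校", "效"), "校": ("笑", "效")}
DICTIONARY = {"我们", "今天", "学校", "学习", "天气", "中文"}


def train_bigram(corpus):
    """Count character unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        chars = ["<s>"] + list(sentence) + ["</s>"]
        unigrams.update(chars)
        bigrams.update(zip(chars, chars[1:]))
    return unigrams, bigrams


UNIGRAMS, BIGRAMS = train_bigram(CORPUS)
VOCAB_SIZE = len(UNIGRAMS)


def bigram_logprob(prev, cur):
    """Add-one-smoothed log P(cur | prev); an assumed smoothing choice."""
    return math.log((BIGRAMS[(prev, cur)] + 1) / (UNIGRAMS[prev] + VOCAB_SIZE))


def char_score(chars, i):
    """Average log-probability of the two bigrams touching position i."""
    left = bigram_logprob(chars[i - 1], chars[i])
    right = bigram_logprob(chars[i], chars[i + 1])
    return (left + right) / 2


def forms_word(chars, i, candidate):
    """Step 3: keep a candidate only if it forms a dictionary word with a neighbor."""
    return (chars[i - 1] + candidate in DICTIONARY
            or candidate + chars[i + 1] in DICTIONARY)


def correct(sentence, detect_threshold, replace_threshold):
    chars = ["<s>"] + list(sentence) + ["</s>"]
    for i in range(1, len(chars) - 1):
        # Step 1: flag characters whose language model score is low.
        if char_score(chars, i) >= detect_threshold:
            continue
        # Steps 2 and 3: generate similar characters, then filter by dictionary.
        candidates = [c for c in CONFUSION.get(chars[i], ())
                      if forms_word(chars, i, c)]
        if not candidates:
            continue

        def score_with(candidate):
            return char_score(chars[:i] + [candidate] + chars[i + 1:], i)

        # Step 4: replace only if the best candidate clears the second threshold.
        best = max(candidates, key=score_with)
        if score_with(best) > replace_threshold:
            chars[i] = best
    return "".join(chars[1:-1])


print(correct("我们今天去学笑", detect_threshold=-2.8, replace_threshold=-2.5))
```

In this sketch a replacement has to pass three gates in sequence (membership in the confusion set, the dictionary filter, and the replacement threshold), so many true errors can go uncorrected. That is consistent with the high-precision, low-recall behavior the abstract reports, although the sketch does not reproduce the paper's experiments.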

Cite

Yu, J., & Li, Z. (2014). Chinese Spelling Error Detection and Correction Based on Language Model, Pronunciation, and Shape. In CLP 2014 - 3rd CIPS-SIGHAN Joint Conference on Chinese Language Processing (pp. 220–223). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-6835
