Spelling check is an important preprocessing task when dealing with user generated texts such as tweets and product comments. Compared with some western languages such as English, Chinese spelling check is more complex because there is no word delimiter in Chinese written texts and misspelled characters can only be determined in word level. Our system works as follows. First, we use character-level n-gram language models to detect potential misspelled characters with low probabilities below some predefined threshold. Second, for each potential incorrect character, we generate a candidate set based on pronunciation and shape similarities. Third, we filter some candidate corrections if the candidate cannot form a legal word with its neighbors according to a word dictionary. Finally, we find the best candidate with highest language model probability. If the probability is higher than a predefined threshold, then we replace the original character; or we consider the original character as correct and take no action. Our preliminary experiments shows that our simple method can achieve relatively high precision but low recall.
CITATION STYLE
Yu, J., & Li, Z. (2014). Chinese Spelling Error Detection and Correction Based on Language Model, Pronunciation, and Shape. In CLP 2014 - 3rd CIPS-SIGHAN Joint Conference on Chinese Language Processing (pp. 220–223). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-6835
Mendeley helps you to discover research relevant for your work.