Text induced spelling correction

Martin Reynaert

Conference Proceedings, Open Access

COLING 2004 - Proceedings of the 20th International Conference on Computational Linguistics (2004)

DOI: 10.3115/1220355.1220475

Abstract

We present TISC, a language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from a very large corpus of raw text, without supervision, and contains word unigrams and word bigrams. It is stored in a novel representation based on a purpose-built hashing function, which provides a fast and computationally tractable way of checking whether a particular word form likely constitutes a spelling error and of retrieving correction candidates. The system employs input context and lexicon evidence to automatically propose a limited number of ranked correction candidates when insufficient information for an unambiguous decision on a single correction is available. We describe the implemented prototype and evaluate it on English and Dutch text, containing real-world errors in more or less limited contexts. The results are compared with those of the isolated word spelling checking programs ISPELL and the MICROSOFT PROOFING TOOLS (MPT).
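
The lookup idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the hashing function, so the code below assumes an order-insensitive key (the sum of each character's code point raised to the fifth power); every function name, the lowercase ASCII alphabet, and the ranking rule are hypothetical choices made for illustration, not the paper's implementation. Because the key ignores character order, the lookup can also return anagrams of near-miss words, which a fuller system would have to filter.

```python
from collections import Counter, defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase ASCII only


def anagram_key(word):
    """Order-insensitive key: all anagrams of a word share it (assumed scheme)."""
    return sum(ord(ch) ** 5 for ch in word)


def build_lexicon(tokens):
    """Derive unigram and bigram counts from raw tokens, plus a key -> words index."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    by_key = defaultdict(set)
    for word in unigrams:
        by_key[anagram_key(word)].add(word)
    return unigrams, bigrams, by_key


def candidates(word, by_key):
    """Lexicon words whose key differs from the word's key by at most one character."""
    key = anagram_key(word)
    letter_values = [ord(ch) ** 5 for ch in ALPHABET]
    keys = {key}  # same characters in a different order
    for value in letter_values:
        keys.add(key + value)  # candidate has one extra character
    for ch in set(word):
        removed = key - ord(ch) ** 5
        keys.add(removed)  # candidate has one character fewer
        for value in letter_values:
            keys.add(removed + value)  # one character substituted
    found = set()
    for k in keys:
        found |= by_key.get(k, set())
    found.discard(word)
    return found


def correct(previous, word, lexicon, top=3):
    """Return the word if it is in the lexicon, else up to `top` ranked candidates."""
    unigrams, bigrams, by_key = lexicon
    if word in unigrams:
        return [word]
    # Rank by bigram evidence with the preceding word, then by unigram frequency.
    ranked = sorted(
        candidates(word, by_key),
        key=lambda c: (-bigrams[(previous, c)], -unigrams[c], c),
    )
    return ranked[:top]


lexicon = build_lexicon("the cat sat on the mat and the cat ate the rat".split())
print(correct("the", "czt", lexicon))
```

Deriving candidate keys arithmetically means the cost of a lookup depends on the word length and alphabet size rather than on the size of the lexicon, which is one plausible reading of the "fast and computationally tractable" retrieval the abstract mentions.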

Citation (APA)

Reynaert, M. (2004). Text induced spelling correction. In COLING 2004 - Proceedings of the 20th International Conference on Computational Linguistics. Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220355.1220475
