Hierarchical Character Tagger for Short Text Spelling Error Correction

Abstract

State-of-the-art approaches to the spelling error correction problem include Transformer-based Seq2Seq models, which require large training sets and suffer from slow inference time, and sequence labeling models based on Transformer encoders like BERT, which involve a token-level label space and therefore a large pre-defined vocabulary dictionary. In this paper, we present a Hierarchical Character Tagger model, or HCTagger, for short text spelling error correction. We use a pre-trained language model at the character level as a text encoder, and then predict character-level edits to transform the original text into its error-free form with a much smaller label space. For decoding, we propose a hierarchical multi-task approach to alleviate the issue of long-tail label distribution without introducing extra model parameters. Experiments on two public misspelling correction datasets demonstrate that HCTagger is an accurate and much faster approach than many existing models.
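
The sketch below illustrates the character-level edit-tagging idea described in the abstract: rather than generating the corrected text directly, each character of the noisy input receives an edit label (e.g., KEEP, DELETE, REPLACE_x, APPEND_x), which keeps the label space small compared with token-level vocabularies. This is not the authors' implementation; the tag names and the alignment heuristic used to derive labels are assumptions made for illustration only.

```python
# Minimal sketch of character-level edit tagging (assumed tag scheme,
# not the HCTagger paper's exact formulation).
import difflib

def char_edit_tags(noisy: str, clean: str):
    """Derive per-character edit labels that transform `noisy` into `clean`."""
    tags = []
    matcher = difflib.SequenceMatcher(None, noisy, clean)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            tags.extend(("KEEP", c) for c in noisy[i1:i2])
        elif op == "delete":
            tags.extend(("DELETE", c) for c in noisy[i1:i2])
        elif op == "replace":
            src, tgt = noisy[i1:i2], clean[j1:j2]
            for k, c in enumerate(src):
                # Pair source chars with target chars; extras are deleted.
                tags.append((f"REPLACE_{tgt[k]}", c) if k < len(tgt) else ("DELETE", c))
            for c in tgt[len(src):]:
                # Remaining target chars are appended to the last tagged char.
                if tags:
                    tag, ch = tags[-1]
                    tags[-1] = (tag + f"+APPEND_{c}", ch)
        elif op == "insert":
            # Attach insertions to the previous source character as APPEND tags
            # (insertions at position 0 are ignored in this simplified sketch).
            if tags:
                tag, ch = tags[-1]
                tags[-1] = (tag + "".join(f"+APPEND_{c}" for c in clean[j1:j2]), ch)
    return tags

if __name__ == "__main__":
    for tag, ch in char_edit_tags("helo wrold", "hello world"):
        print(f"{ch!r}: {tag}")
```

In a model such as the one described, labels of this kind would be predicted per input character by a classifier on top of a character-level pre-trained encoder, so the output vocabulary stays limited to a small set of edit operations.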

Citation (APA)

Gao, M., Xu, C., & Shi, P. (2021). Hierarchical Character Tagger for Short Text Spelling Error Correction. In W-NUT 2021 - 7th Workshop on Noisy User-Generated Text, Proceedings of the Conference (pp. 106–113). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.wnut-1.13
