Neural multi-task text normalization and sanitization with pointer-generator

8Citations
Citations of this article
63Readers
Mendeley users who have this article in their library.

Abstract

Text normalization and sanitization are intrinsic components of Natural Language Inferences. In Information Retrieval or Dialogue Generation, normalization of user queries or utterances enhances linguistic understanding by translating non-canonical text to its canonical form, on which many state-of-the-art language models are trained. On the other hand, text sanitization removes sensitive information to guarantee user privacy and anonymity. Existing approaches to normalization and sanitization mainly rely on hand-crafted heuristics and syntactic features of individual tokens while disregarding the linguistic context. Moreover, such context-unaware solutions cannot dynamically determine whether out-of-vocab tokens are misspelt or are entity names. In this work, we formulate text normalization and sanitization as a multi-task text generation approach and propose a neural pointer-generator network based on multihead attention. Its generator effectively captures linguistic context during normalization and sanitization while its pointer dynamically preserves the entities that are generally missing in the vocabulary. Experiments show that our generation approach outperforms both token-based text normalization and sanitization, while the pointer-generator improves the generator-only baseline in terms of BLEU4 score, and classical attentional pointer networks in terms of pointing accuracy.

Cite

CITATION STYLE

APA

Nguyen, V. H., & Sandro, C. (2020). Neural multi-task text normalization and sanitization with pointer-generator. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 37–47). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.nli-1.5

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free