Coding textual inputs boosts the accuracy of neural networks

2 citations · 73 Mendeley readers

Abstract

Natural Language Processing (NLP) tasks are usually performed word by word on textual inputs. We can use arbitrary symbols to represent the linguistic meaning of a word and use these symbols as inputs. As "alternatives" to a text representation, we introduce Soundex, Metaphone, NYSIIS, and logograms to NLP, and develop fixed-output-length coding and its extension using Huffman coding. Each coding maps words to different character/digit sequences and constructs a new vocabulary over the resulting codewords. We find that integrating these codewords with the text provides, through redundancy, more reliable inputs to neural-network-based NLP systems than text alone. Experiments demonstrate that our approach outperforms state-of-the-art models on machine translation, language modeling, and part-of-speech tagging. The source code is available at https://github.com/abdulrafae/coding_nmt.
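
To make the coding step concrete, the sketch below applies one of the encodings named in the abstract, Soundex, token by token to produce a codeword sequence that parallels the original sentence. This is an illustrative reimplementation, not the authors' released code: the `soundex` and `encode_sentence` names are ours, and how the codeword stream is actually combined with the text input (e.g., as an additional encoder input) follows the paper and its repository. Metaphone, NYSIIS, and the Huffman-based coding would slot into the same token-to-codeword pattern.

```python
def soundex(word: str) -> str:
    """American Soundex: keep the first letter, map remaining consonants to
    digits, collapse adjacent duplicates, and pad/truncate to 4 characters."""
    digit = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return word                      # leave numbers/punctuation unchanged
    head, prev, out = letters[0].upper(), digit.get(letters[0], ""), []
    for c in letters[1:]:
        code = digit.get(c, "")
        if c in "hw":                    # h/w do not separate equal codes
            continue
        if code and code != prev:
            out.append(code)
        prev = code                      # vowels reset prev (code == "")
    return (head + "".join(out) + "000")[:4]


def encode_sentence(tokens):
    """Produce the codeword sequence that parallels a tokenized sentence."""
    return [soundex(t) for t in tokens]


if __name__ == "__main__":
    tokens = "colour color theatre theater".split()
    print(list(zip(tokens, encode_sentence(tokens))))
    # [('colour', 'C460'), ('color', 'C460'), ('theatre', 'T360'), ('theater', 'T360')]
```

Note how spelling variants collapse onto the same codeword; this is the kind of redundancy the abstract describes when the code stream is fed to the network alongside the text.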

Cite

APA

Khan, A. R., Xu, J., & Sun, W. (2020). Coding textual inputs boosts the accuracy of neural networks. In EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 1350–1360). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.emnlp-main.104
