On character vs word embeddings as input for English sentence classification

Abstract

It has become common practice to use word embeddings, such as those generated by word2vec or GloVe, as inputs for natural language processing tasks. Such embeddings can aid generalisation by capturing statistical regularities in word usage and some semantic information. However, they require the construction of large dictionaries of high-dimensional vectors from very large amounts of text, and they have limited ability to handle out-of-vocabulary words or spelling mistakes. Some recent work has demonstrated that text classifiers using character-level input can achieve similar performance to those using word embeddings. Where character input replaces word-level input, it can yield smaller, less computationally intensive models, which helps when models need to be deployed on embedded devices. Character input can also help to address out-of-vocabulary words and spelling mistakes. It is thus of interest to know whether character embeddings can be used in place of word embeddings without harming performance. In this paper, we investigate the use of character embeddings vs word embeddings when classifying short texts such as sentences and questions. We find that the models using character embeddings perform just as well as those using word embeddings whilst being much smaller and taking less time to train. Additionally, we demonstrate that using character embeddings makes the models more robust to spelling errors.
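
The out-of-vocabulary point can be illustrated with a minimal sketch. It is not the paper's model or data: the two training sentences, the misspelled query, and the helper names `build_vocab`, `encode` and `unk_rate` are all assumptions made for illustration. It only shows that a misspelled word falls outside a word-level vocabulary while its characters are usually still covered by a character-level one.

```python
from typing import Dict, Iterable, List

UNK = "<unk>"  # hypothetical placeholder symbol for unseen tokens


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct token; id 0 is reserved for UNK."""
    vocab = {UNK: 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, falling back to the UNK id for unseen tokens."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]


def unk_rate(ids: List[int]) -> float:
    """Fraction of positions that were mapped to the UNK id."""
    return sum(1 for i in ids if i == 0) / len(ids) if ids else 0.0


# Toy training sentences (illustrative only, not from the paper).
train = ["what is the capital of france", "who wrote the novel"]

word_vocab = build_vocab(t.split() for t in train)  # one entry per word
char_vocab = build_vocab(list(t) for t in train)    # one entry per character

# A query containing two spelling mistakes.
query = "what is the capitol of frence"
word_ids = encode(query.split(), word_vocab)
char_ids = encode(list(query), char_vocab)

print("word-level UNK rate:", unk_rate(word_ids))
print("char-level UNK rate:", unk_rate(char_ids))
```

On a corpus this small the vocabulary sizes are not representative. The size advantage described in the abstract comes from realistic corpora, where the word vocabulary grows with the amount of text while the character set stays small and roughly fixed.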

Hammerton, J., Vintró, M., Kapetanakis, S., & Sama, M. (2018). On character vs word embeddings as input for English sentence classification. In Advances in Intelligent Systems and Computing (Vol. 868, pp. 550–566). Springer Verlag. https://doi.org/10.1007/978-3-030-01054-6_40
