Comparing the quality and speed of sentence classification with modern language models

Abstract

Since the advent of GloVe and Word2vec, the rapid development of language models (LMs) used to generate word embeddings has enabled the creation of better text classifier frameworks. The word vectors generated by newer LMs are no longer static but context-aware. However, the quality of results provided by state-of-the-art LMs comes at the price of speed. Our goal was to present a benchmark that provides insight into the speed-quality trade-off of a sentence classifier framework based on word embeddings provided by selected LMs. We used a recurrent neural network with gated recurrent units to create sentence-level vector representations from the word embeddings provided by an LM, and a single fully connected layer for classification. Benchmarking was performed on two sentence classification data sets: the Sixth Text REtrieval Conference (TREC6) set and a 1000-sentence data set of our own design. Our Monte Carlo cross-validated results on these two data sources demonstrated that the newest deep learning LMs provided improvements over GloVe and FastText in terms of weighted Matthews correlation coefficient (MCC) scores. We postulate that progress in LMs is more apparent when more difficult classification tasks are addressed.
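
The evaluation protocol named in the abstract (Monte Carlo cross-validation scored with MCC) can be illustrated with a minimal sketch. It is not the authors' code. It assumes plain repeated random train/test splits and the standard multi-class MCC; the exact weighting behind the "weighted" MCC is not given in the abstract and is not reproduced here. The names `multiclass_mcc`, `monte_carlo_splits`, `mean_mcc`, and `majority_baseline` are hypothetical, and `majority_baseline` is only a stand-in for the GRU-based classifier.

```python
import math
import random
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Standard multi-class Matthews correlation coefficient (unweighted)."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)  # how often each class truly occurs
    pred_counts = Counter(y_pred)  # how often each class is predicted
    labels = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(c * c for c in true_counts.values())
    cov_pp = n * n - sum(c * c for c in pred_counts.values())
    denom = math.sqrt(cov_tt * cov_pp)
    # By convention, return 0.0 when the score is undefined
    # (for example, when every prediction is the same class).
    return cov_tp / denom if denom else 0.0


def monte_carlo_splits(n_samples, n_repeats, test_fraction, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random shuffles."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]


def mean_mcc(sentences, labels, fit_predict, n_repeats=5, test_fraction=0.2):
    """Average MCC of `fit_predict` over Monte Carlo splits."""
    scores = []
    for train_idx, test_idx in monte_carlo_splits(
        len(sentences), n_repeats, test_fraction
    ):
        predictions = fit_predict(
            [sentences[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [sentences[i] for i in test_idx],
        )
        scores.append(multiclass_mcc([labels[i] for i in test_idx], predictions))
    return sum(scores) / len(scores)


def majority_baseline(train_x, train_y, test_x):
    """Hypothetical placeholder model: always predict the most common label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common] * len(test_x)
```

Swapping `majority_baseline` for a function that trains a real classifier and returns its predictions gives one averaged score per model, and timing each call alongside would expose the speed side of the trade-off. Unlike k-fold cross-validation, the random splits here may overlap between repeats.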

Cite

Fiok, K., Karwowski, W., Gutierrez, E., & Reza-Davahli, M. (2020). Comparing the quality and speed of sentence classification with modern language models. Applied Sciences (Switzerland), 10(10). https://doi.org/10.3390/APP10103386
