Comparing the quality and speed of sentence classification with modern language models

Krzysztof Fiok; Waldemar Karwowski; Edgar Gutierrez; Mohammad Reza-Davahli

Journal Article, Open Access

Applied Sciences (Switzerland) (2020) 10(10)

DOI: 10.3390/APP10103386

13 Citations

18 Readers

Abstract

After the advent of GloVe and word2vec, the dynamic development of language models (LMs) used to generate word embeddings has enabled the creation of better text classifier frameworks. With the vector representations of words generated by newer LMs, embeddings are no longer static but are context-aware. However, the quality of results provided by state-of-the-art LMs comes at the price of speed. Our goal was to present a benchmark to provide insight into the speed-quality trade-off of a sentence classifier framework based on word embeddings provided by selected LMs. We used a recurrent neural network with gated recurrent units to create sentence-level vector representations from word embeddings provided by an LM, and a single fully connected layer for classification. Benchmarking was performed on two sentence classification data sets: the Sixth Text REtrieval Conference (TREC6) set and a 1000-sentence data set of our design. Our Monte Carlo cross-validated results based on these two data sources demonstrated that the newest deep learning LMs provided improvements over GloVe and FastText in terms of weighted Matthews correlation coefficient (MCC) scores. We postulate that progress in LMs is more apparent when more difficult classification tasks are addressed.
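The evaluation protocol named in the abstract (Monte Carlo cross-validation scored with MCC) can be illustrated with a minimal sketch. It is not the authors' code. The function names, the split fraction, and the toy labels are assumptions made for illustration. The abstract does not define how its "weighted" MCC is computed, so the sketch uses the plain multiclass MCC instead, and a hypothetical majority-class baseline stands in for the GRU-based classifier.

```python
import math
import random
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Plain multiclass Matthews correlation coefficient.

    Assumption: this is the unweighted multiclass form, not necessarily
    the "weighted" variant reported in the paper.
    """
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(true_counts[k] ** 2 for k in labels)
    cov_pp = n * n - sum(pred_counts[k] ** 2 for k in labels)
    denom = math.sqrt(cov_tt * cov_pp)
    # By convention, return 0 when either side has only one class.
    return cov_tp / denom if denom else 0.0


def monte_carlo_splits(n_items, n_repeats, test_fraction, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits."""
    rng = random.Random(seed)
    n_test = max(1, round(n_items * test_fraction))
    for _ in range(n_repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]


def majority_baseline(train_labels):
    """Hypothetical stand-in classifier: always predicts the most common label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _sentence: most_common


# Toy data (invented for illustration): 20 sentences with 3 class labels.
sentences = [f"sentence {i}" for i in range(20)]
labels = [i % 3 for i in range(20)]

scores = []
for train_idx, test_idx in monte_carlo_splits(len(sentences), n_repeats=5,
                                              test_fraction=0.2):
    classify = majority_baseline([labels[i] for i in train_idx])
    predictions = [classify(sentences[i]) for i in test_idx]
    scores.append(multiclass_mcc([labels[i] for i in test_idx], predictions))

mean_mcc = sum(scores) / len(scores)
print(f"mean MCC over {len(scores)} random splits: {mean_mcc:.3f}")
```

Unlike k-fold cross-validation, the random splits may overlap between repeats. A constant predictor scores an MCC of 0 whatever its accuracy, which is one reason MCC suits data sets with imbalanced classes.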

Cite

Fiok, K., Karwowski, W., Gutierrez, E., & Reza-Davahli, M. (2020). Comparing the quality and speed of sentence classification with modern language models. Applied Sciences (Switzerland), 10(10). https://doi.org/10.3390/APP10103386
