Neural Networks Compression for Language Modeling

22 citations · 39 Mendeley readers

This article is free to access.

Abstract

In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs). Conventional RNNs, e.g., LSTM-based networks used in language modeling, are characterized by either high space complexity or substantial inference time. This problem is especially acute for mobile applications, where constant interaction with a remote server is impractical. Using the Penn Treebank (PTB) dataset, we compare pruning, quantization, low-rank factorization, and tensor train decomposition for LSTM networks in terms of model size and suitability for fast inference.
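To make one of the listed techniques concrete, the sketch below illustrates low-rank factorization of a single dense weight matrix via truncated SVD in NumPy. This is not the authors' implementation: the matrix shape, rank, and names are hypothetical; in an LSTM the same factorization would be applied to the input-to-hidden and hidden-to-hidden weight matrices.

import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Approximate W (m x n) as A @ B with A (m x rank) and B (rank x n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

# Hypothetical LSTM-sized weight matrix: 4 gates x hidden size 650 rows, 650 columns.
W = np.random.randn(4 * 650, 650).astype(np.float32)
A, B = low_rank_factorize(W, rank=64)

original_params = W.size
compressed_params = A.size + B.size
print(f"compression ratio: {original_params / compressed_params:.1f}x")
print(f"relative error: {np.linalg.norm(W - A @ B) / np.linalg.norm(W):.3f}")

For this hypothetical 2600 x 650 matrix and rank 64, the parameter count drops by roughly a factor of eight; the achievable trade-off between rank (or bit width, sparsity, and TT-ranks for the other techniques) and model quality is what the paper evaluates on PTB.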

CITATION STYLE

APA

Grachev, A. M., Ignatov, D. I., & Savchenko, A. V. (2017). Neural Networks Compression for Language Modeling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10597 LNCS, pp. 351–357). Springer Verlag. https://doi.org/10.1007/978-3-319-69900-4_44
