Neural Networks Compression for Language Modeling

22 citations · 39 Mendeley readers

This article is free to access.

Abstract

In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs). Conventional RNNs, e.g., LSTM-based networks used in language modeling, are characterized by either high space complexity or substantial inference time. This problem is especially acute for mobile applications, where constant interaction with a remote server is impractical. Using the Penn Treebank (PTB) dataset, we compare pruning, quantization, low-rank factorization, and tensor train decomposition for LSTM networks in terms of model size and suitability for fast inference.
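To make one of the listed techniques concrete, the sketch below illustrates low-rank factorization of a single dense weight matrix via truncated SVD in NumPy. This is not the authors' implementation: the matrix shape, rank, and names are hypothetical; in an LSTM the same factorization would be applied to the input-to-hidden and hidden-to-hidden weight matrices.

import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Approximate W (m x n) as A @ B with A (m x rank) and B (rank x n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

# Hypothetical LSTM-sized weight matrix: 4 gates x hidden size 650 rows, 650 columns.
W = np.random.randn(4 * 650, 650).astype(np.float32)
A, B = low_rank_factorize(W, rank=64)

original_params = W.size
compressed_params = A.size + B.size
print(f"compression ratio: {original_params / compressed_params:.1f}x")
print(f"relative error: {np.linalg.norm(W - A @ B) / np.linalg.norm(W):.3f}")

For this hypothetical 2600 x 650 matrix and rank 64, the parameter count drops by roughly a factor of eight; the achievable trade-off between rank (or bit width, sparsity, and TT-ranks for the other techniques) and model quality is what the paper evaluates on PTB.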

CITATION STYLE

APA

Grachev, A. M., Ignatov, D. I., & Savchenko, A. V. (2017). Neural Networks Compression for Language Modeling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10597 LNCS, pp. 351–357). Springer Verlag. https://doi.org/10.1007/978-3-319-69900-4_44
