Abstract
In the last several years, neural network models have significantly improved accuracy on a number of NLP tasks. However, one serious drawback that has impeded their adoption in production systems is the slow runtime speed of neural network models compared to alternative models, such as maximum entropy classifiers. Devlin et al. (2014) presented a simple technique for speeding up feed-forward embedding-based neural network models, in which the dot product between each word embedding and part of the first hidden layer is pre-computed offline. However, this technique cannot be applied to hidden layers beyond the first. In this paper, we explore a neural network architecture in which the embedding layer feeds into multiple hidden layers placed "next to" one another, so that each can be pre-computed independently. On a large-scale language modeling task, this architecture achieves a 10x speedup at runtime and a significant reduction in perplexity when compared to a standard multi-layer network.
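The pre-computation trick the abstract refers to can be illustrated with a minimal NumPy sketch. All sizes and names here (`V`, `D`, `H`, `C`, `E`, `W`) are hypothetical, and the activation and weight layout are assumptions for illustration, not the paper's exact configuration: because the first hidden layer's pre-activation is a sum of per-position dot products between one embedding and one weight slice, each such product can be tabulated offline for every vocabulary word, reducing the runtime first layer to table lookups and additions.

```python
import numpy as np

# Hypothetical sizes for a small sketch (not the paper's configuration).
V, D, H, C = 1000, 32, 64, 3  # vocab size, embedding dim, hidden dim, context length
rng = np.random.default_rng(0)

E = rng.standard_normal((V, D)) * 0.1     # embedding table
W = rng.standard_normal((C, D, H)) * 0.1  # first-layer weights, one slice per context position
b = np.zeros(H)

# Offline: pre-compute the dot product of every word embedding with each
# position's slice of the first-layer weights.
P = np.einsum('vd,cdh->cvh', E, W)  # shape (C, V, H)

def hidden_standard(context):
    # Standard computation: look up embeddings, multiply at runtime.
    return np.tanh(sum(E[w] @ W[i] for i, w in enumerate(context)) + b)

def hidden_precomputed(context):
    # Runtime reduces to C table lookups and additions.
    return np.tanh(sum(P[i, w] for i, w in enumerate(context)) + b)

ctx = [5, 42, 7]
assert np.allclose(hidden_standard(ctx), hidden_precomputed(ctx))
```

The architecture described in the paper exploits the fact that this tabulation only works for a layer fed directly by the embeddings: rather than stacking further hidden layers (whose inputs depend on the runtime context), additional hidden blocks are placed side by side, each fed by the embedding layer, so every block admits its own pre-computed table.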
Devlin, J., Quirk, C., & Menezes, A. (2015). Pre-computable multi-layer neural network language models. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 256–260). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1029