Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures

Citations: 20 · Mendeley readers: 80

Abstract

We demonstrate the serverless deployment of neural networks for model inferencing in NLP applications using Amazon’s Lambda service for feedforward evaluation and DynamoDB for storing word embeddings. Our architecture realizes a pay-per-request pricing model, requiring zero ongoing costs for maintaining server instances. All virtual machine management is handled behind the scenes by the cloud provider without any direct developer intervention. We describe a number of techniques that allow efficient use of serverless resources, and evaluations confirm that our design is both scalable and inexpensive.
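The architecture described in the abstract (a Lambda handler that looks up word embeddings stored in DynamoDB and runs a cheap feedforward evaluation) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the embedding table here is an in-memory dict standing in for DynamoDB, and the handler, weights, and lookup function names are all hypothetical.

```python
import numpy as np

# Hypothetical in-memory stand-in for the DynamoDB embeddings table.
# In an actual deployment, each lookup would instead be a boto3 call, e.g.
#   table.get_item(Key={"word": token})["Item"]["vector"]
EMBEDDINGS = {
    "serverless": np.array([0.1, 0.2, 0.3]),
    "deployment": np.array([0.0, 0.5, 0.1]),
}
UNK = np.zeros(3)  # unknown tokens map to a zero vector

def lookup(token):
    """Fetch one word embedding (stand-in for a per-token DynamoDB read)."""
    return EMBEDDINGS.get(token, UNK)

def feedforward(x, W, b):
    """One dense layer with ReLU: the kind of lightweight evaluation
    that fits comfortably inside a short-lived Lambda invocation."""
    return np.maximum(W @ x + b, 0.0)

def handler(event, context=None):
    """Lambda-style entry point: average the token embeddings,
    run the feedforward network, and return the scores as JSON-safe lists."""
    tokens = event["text"].split()
    x = np.mean([lookup(t) for t in tokens], axis=0)
    # Toy weights; a real function would load trained weights from its
    # deployment bundle so no server has to stay up between requests.
    W = np.full((2, 3), 0.5)
    b = np.zeros(2)
    return {"scores": feedforward(x, W, b).tolist()}
```

Because the function holds no state between requests, the cloud provider can scale instances up and down freely, which is what makes the pay-per-request pricing model possible.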

Citation (APA)

Tu, Z., Li, M., & Lin, J. (2018). Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures. In NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Demonstrations Session (pp. 6–10). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/n18-5002
