Large pre-trained language models based on the transformer architecture have drastically changed the natural language processing (NLP) landscape. However, deploying these models for on-device applications on constrained hardware such as smartwatches is completely impractical due to their size and inference cost. As an alternative to transformer-based architectures, recent work on efficient NLP has shown that weight-efficient models can attain competitive performance for simple tasks, such as slot filling and intent classification, with model sizes on the order of one megabyte. This work introduces the pNLP-Mixer architecture, an embedding-free MLP-Mixer model for on-device NLP that achieves high weight-efficiency thanks to a novel projection layer. We evaluate a pNLP-Mixer model of only one megabyte in size on two multi-lingual semantic parsing datasets, MTOP and multi-ATIS. Our quantized model achieves 99.4% and 97.8% of the performance of mBERT on MTOP and multi-ATIS, respectively, while using 170x fewer parameters. Our model consistently beats the state of the art among tiny models (pQRNN), which is twice as large, by a margin of up to 7.8% on MTOP.
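The abstract does not spell out the model internals, but the overall shape it describes (an embedding-free projection front-end feeding standard MLP-Mixer blocks that produce per-token predictions for tasks like slot filling) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration only: the class names (MixerBlock, TinyMixerTagger), the dimensions, and the stand-in projection layer are hypothetical and do not reproduce the paper's actual projection or hyper-parameters.

```python
import torch
import torch.nn as nn

# Hypothetical hyper-parameters; the actual values are not given in the abstract.
SEQ_LEN = 64      # number of token positions
PROJ_DIM = 256    # width of the projected token features
HIDDEN_DIM = 256  # hidden width of the mixer MLPs


class MixerBlock(nn.Module):
    """One standard MLP-Mixer block: token mixing followed by feature mixing."""

    def __init__(self, seq_len: int, dim: int, hidden: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Token-mixing MLP operates across the sequence dimension.
        self.token_mlp = nn.Sequential(
            nn.Linear(seq_len, hidden), nn.GELU(), nn.Linear(hidden, seq_len)
        )
        self.norm2 = nn.LayerNorm(dim)
        # Feature-mixing MLP operates across the channel dimension.
        self.feature_mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        y = self.norm1(x).transpose(1, 2)           # (batch, dim, seq_len)
        x = x + self.token_mlp(y).transpose(1, 2)   # mix information across tokens
        x = x + self.feature_mlp(self.norm2(x))     # mix information across features
        return x


class TinyMixerTagger(nn.Module):
    """Hypothetical embedding-free tagger: projected features -> mixer blocks -> labels."""

    def __init__(self, num_labels: int, num_blocks: int = 2):
        super().__init__()
        # Stand-in for the paper's projection layer: per-token features are
        # assumed to be computed outside the model (e.g. hashed token features)
        # and only linearly mapped to the model width, so no embedding table is stored.
        self.project = nn.Linear(PROJ_DIM, PROJ_DIM, bias=False)
        self.blocks = nn.Sequential(
            *[MixerBlock(SEQ_LEN, PROJ_DIM, HIDDEN_DIM) for _ in range(num_blocks)]
        )
        self.head = nn.Linear(PROJ_DIM, num_labels)

    def forward(self, token_features: torch.Tensor) -> torch.Tensor:
        # token_features: (batch, SEQ_LEN, PROJ_DIM)
        x = self.project(token_features)
        x = self.blocks(x)
        return self.head(x)  # per-token logits, e.g. for slot filling


if __name__ == "__main__":
    model = TinyMixerTagger(num_labels=10)
    dummy = torch.randn(1, SEQ_LEN, PROJ_DIM)
    print(model(dummy).shape)  # torch.Size([1, 64, 10])
```

Because the only learned components are small dense layers and layer norms, the parameter count stays in the low millions or below, which is consistent with the megabyte-scale models the abstract targets once quantization is applied.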
Fusco, F., Staar, P., Pascual, D., & Antognini, D. (2023). pNLP-Mixer: an Efficient all-MLP Architecture for Language. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 5, pp. 53–60). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-industry.6