Abstract
Transformer-based language models such as BERT provide significant accuracy improvements on a multitude of natural language processing (NLP) tasks. However, their hefty computational and memory demands make them challenging to deploy to resource-constrained edge platforms with strict latency requirements. We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimizations for multi-task NLP. EdgeBERT employs entropy-based early exit predication in order to perform dynamic voltage-frequency scaling (DVFS), at a sentence granularity, for minimal energy consumption while adhering to a prescribed target latency. Computation and memory footprint overheads are further alleviated by employing a calibrated combination of adaptive attention span, selective network pruning, and floating-point quantization. Furthermore, in order to maximize the synergistic benefits of these algorithms in always-on and intermediate edge computing settings, we specialize a 12nm scalable hardware accelerator system, integrating a fast-switching low-dropout voltage regulator (LDO), an all-digital phase-locked loop (ADPLL), as well as high-density embedded non-volatile memories (eNVMs) wherein the sparse floating-point bit encodings of the shared multi-task parameters are carefully stored. Altogether, latency-aware multi-task NLP inference acceleration on the EdgeBERT hardware system achieves up to 7×, 2.5×, and 53× lower energy compared to the conventional inference without early stopping, the latency-unbounded early exit approach, and CUDA adaptations on an Nvidia Jetson Tegra X2 mobile GPU, respectively.
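The interplay between entropy-based early exit and sentence-level DVFS described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the entropy threshold, the per-layer cycle count, and the frequency table are all hypothetical, and the sketch assumes the exit layer is already known (or predicted) before the clock frequency is chosen, a step the abstract does not detail.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def early_exit_layer(per_layer_logits, threshold):
    """Return (1-based exit layer, predicted class).

    Exits at the first layer whose classifier output has entropy below
    `threshold` (hypothetical value); otherwise runs all layers.
    """
    if not per_layer_logits:
        raise ValueError("need at least one layer of logits")
    for i, logits in enumerate(per_layer_logits, start=1):
        probs = softmax(logits)
        if entropy(probs) < threshold:
            break
    return i, probs.index(max(probs))

def lowest_frequency_meeting_latency(layers, cycles_per_layer,
                                     target_latency_s, freqs_hz):
    """Pick the slowest clock that still finishes `layers` layers in time.

    Assumes a fixed cycle count per layer (an illustrative simplification).
    Falls back to the fastest clock if no setting meets the target.
    """
    for f in sorted(freqs_hz):
        if layers * cycles_per_layer / f <= target_latency_s:
            return f
    return max(freqs_hz)

# Hypothetical per-layer classifier logits for one sentence (4 layers, 2 classes).
logits = [[0.1, 0.0], [0.5, 0.0], [5.0, 0.0], [6.0, 0.0]]
exit_layer, label = early_exit_layer(logits, threshold=0.3)

freqs = [100_000_000, 250_000_000, 500_000_000, 1_000_000_000]
f_early = lowest_frequency_meeting_latency(exit_layer, 1_000_000, 0.007, freqs)
f_full = lowest_frequency_meeting_latency(len(logits), 1_000_000, 0.007, freqs)
```

In this toy setting the sentence exits one layer early, so a lower clock frequency (and, on real hardware, a correspondingly lower supply voltage) suffices to meet the same latency target than would be needed to run every layer.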
Tambe, T., Hooper, C., Pentecost, L., Jia, T., Yang, E. Y., Donato, M., … Wei, G. Y. (2021). EdgeBERT: Sentence-level energy optimizations for latency-aware multi-task NLP inference. In Proceedings of the Annual International Symposium on Microarchitecture, MICRO (pp. 830–844). IEEE Computer Society. https://doi.org/10.1145/3466752.3480095