Deep learning models have demonstrated their effectiveness in capturing complex relationships between input features and target outputs across many application domains. However, these models often come with considerable memory and computational demands, posing challenges for deployment on resource-constrained edge devices. Knowledge distillation is a prominent technique for transferring the expertise of a powerful but heavyweight teacher model to a more efficient, lightweight student model. Because ensemble methods have shown notable improvements in model generalization and have achieved state-of-the-art performance on various machine learning tasks, we adopt ensemble techniques to distill knowledge from BERT into multiple lightweight student models. Our approach uses lean spatial and sequential architectures, including a CNN, an LSTM, and their fusion, so that each student processes the data from a distinct perspective. Instead of contextual word representations, which require more space in natural language processing applications, we exploit a single static, pre-trained, low-dimensional word embedding space shared among the student models. Empirical studies on sentiment classification show that our model outperforms not only existing techniques but also the teacher model.
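For intuition, the following is a minimal PyTorch sketch of the kind of setup the abstract describes: two lightweight students (an LSTM and a CNN) share one frozen, low-dimensional static embedding table and are trained against precomputed BERT teacher logits with a standard soft/hard blended distillation loss, then ensembled at inference. The class names, hyperparameters, loss blend, and fusion-by-averaging are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of ensemble distillation with a shared static embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMStudent(nn.Module):
    """Sequential-view student operating on the shared static embeddings."""
    def __init__(self, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, emb):                      # emb: (batch, seq_len, embed_dim)
        _, (h, _) = self.lstm(emb)
        h = torch.cat([h[-2], h[-1]], dim=-1)    # concat final forward/backward states
        return self.fc(h)                        # class logits


class CNNStudent(nn.Module):
    """Spatial-view student: 1-D convolutions over the same embeddings."""
    def __init__(self, embed_dim=100, num_filters=100, kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.convs = nn.ModuleList([nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, emb):
        x = emb.transpose(1, 2)                  # (batch, embed_dim, seq_len)
        feats = [F.relu(conv(x)).max(dim=-1).values for conv in self.convs]
        return self.fc(torch.cat(feats, dim=-1))


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KL term against the teacher, blended with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard


# One shared, frozen, low-dimensional embedding table used by every student
# (in practice it would be initialized from a static pre-trained space such as GloVe).
vocab_size, embed_dim = 20000, 100
shared_embedding = nn.Embedding(vocab_size, embed_dim)
shared_embedding.weight.requires_grad = False

students = [LSTMStudent(embed_dim), CNNStudent(embed_dim)]

# Toy batch: token ids, gold labels, and precomputed BERT teacher logits.
token_ids = torch.randint(0, vocab_size, (8, 32))
labels = torch.randint(0, 2, (8,))
teacher_logits = torch.randn(8, 2)

emb = shared_embedding(token_ids)
student_logits = [s(emb) for s in students]
loss = sum(distillation_loss(lg, teacher_logits, labels) for lg in student_logits)

# At inference, the ensemble prediction averages the students' class probabilities.
ensemble_probs = torch.stack([F.softmax(lg, dim=-1) for lg in student_logits]).mean(dim=0)
print(loss.item(), ensemble_probs.shape)
```

Because every student reads the same frozen embedding table, the memory cost of the ensemble grows only with the small task-specific heads, which is the space advantage the abstract attributes to avoiding contextual word representations.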
Citation:
Lin, C. S., Tsai, C. N., Jwo, J. S., Lee, C. H., & Wang, X. (2024). Heterogeneous Student Knowledge Distillation from BERT Using a Lightweight Ensemble Framework. IEEE Access, 12, 33079–33088. https://doi.org/10.1109/ACCESS.2024.3372568