PILE: Pairwise Iterative Logits Ensemble for Multi-Teacher Labeled Distillation

Abstract

Pre-trained language models have become a crucial component of ranking systems and have recently achieved impressive results. To maintain high performance while keeping computation efficient, knowledge distillation is widely used. In this paper, we focus on two key questions in knowledge distillation for ranking models: 1) how to ensemble knowledge from multiple teachers; 2) how to utilize the label information of the data during distillation. We propose a unified algorithm called Pairwise Iterative Logits Ensemble (PILE) to tackle these two questions simultaneously. PILE ensembles multi-teacher logits under the supervision of label information in an iterative way and achieves competitive performance in both offline and online experiments. The proposed method has been deployed in a real-world commercial search system.
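
The abstract describes PILE only at a high level. Purely as an illustration (not the authors' published algorithm), the sketch below shows one way multi-teacher logits might be ensembled under label supervision: teachers are merged two at a time, with each member of a pair weighted by how well its logits fit the gold labels, until a single set of ensembled logits remains as the distillation target. The function names (label_fitness, pile_ensemble_sketch), the fitness measure, and the merge order are all assumptions made for this sketch.

    import numpy as np

    def softmax(z, axis=-1):
        # Numerically stable softmax over the class dimension.
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def label_fitness(logits, labels):
        # Mean log-probability assigned to the gold labels; higher means the
        # teacher's logits agree better with the labels. This fitness measure
        # is an assumption, not taken from the paper.
        probs = softmax(logits)
        return float(np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))

    def pile_ensemble_sketch(teacher_logits, labels):
        # Merge teachers two at a time: repeatedly take the two best-fitting
        # teachers and replace them with a label-weighted average of their
        # logits, until a single ensembled logit matrix remains.
        pool = list(teacher_logits)
        while len(pool) > 1:
            pool.sort(key=lambda z: label_fitness(z, labels))
            a, b = pool.pop(), pool.pop()  # two teachers with the highest fitness
            wa = np.exp(label_fitness(a, labels))  # geometric-mean gold-label prob.
            wb = np.exp(label_fitness(b, labels))
            pool.append((wa * a + wb * b) / (wa + wb))
        return pool[0]  # soft target for distilling the student ranker

    # Toy usage: 4 teachers scoring 16 labeled (query, document-pair) examples
    # with 2-way logits indicating which document of the pair is preferred.
    rng = np.random.default_rng(0)
    teachers = [rng.normal(size=(16, 2)) for _ in range(4)]
    labels = rng.integers(0, 2, size=16)
    target = pile_ensemble_sketch(teachers, labels)

In the deployed system the ensembled logits would presumably be combined with the ground-truth labels in the student's distillation loss; the weighting and merge schedule above stand in for whatever scheme the paper actually uses.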

Citation (APA)

Cai, L., Zhang, L., Ma, D., Fan, J., Shi, D., Wu, Y., … Yin, D. (2022). PILE: Pairwise Iterative Logits Ensemble for Multi-Teacher Labeled Distillation. In EMNLP 2022 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track (pp. 597–605). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-industry.60
