Query Distillation: BERT-based Distillation for Ensemble Ranking

5 citations · 64 Mendeley readers

Abstract

Recent years have witnessed substantial progress in the development of neural ranking networks, but also an increasingly heavy computational burden due to growing numbers of parameters and the adoption of model ensembles. Knowledge Distillation (KD) is a common solution for balancing effectiveness and efficiency. However, applying KD to ranking problems is not straightforward. Ranking Distillation (RD) has been proposed to address this issue, but has only been shown to be effective on recommendation tasks. We present a novel two-stage distillation method for ranking problems that allows a smaller student model to be trained while benefiting from the better performance of the teacher model, providing better control over inference latency and computational cost. We design a novel BERT-based ranking model structure for list-wise ranking to serve as our student model. All ranking candidates are fed to the BERT model simultaneously, so that the self-attention mechanism enables joint inference over the document list. Our experiments confirm the advantages of our method, not only in inference latency but also in ranking quality, even compared to the original teacher model.
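To make the two ideas in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of (a) a list-wise BERT student that packs a query and all of its candidate documents into a single input sequence, so self-attention can compare candidates against each other, and (b) a list-wise distillation loss that matches the student's ranking distribution to the teacher's softened scores. The pooling choice (scoring each document from the [SEP] token that closes it), the temperature, and all names are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch only: not the authors' exact model. Assumes the
# HuggingFace `transformers` library and `bert-base-uncased` weights.
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
score_head = torch.nn.Linear(bert.config.hidden_size, 1)

def listwise_scores(query, docs):
    """Score all candidate docs for one query in a single BERT pass.

    Input layout: [CLS] query [SEP] doc_1 [SEP] doc_2 [SEP] ... doc_k [SEP]
    Feeding the whole list at once lets self-attention attend across
    documents, enabling joint (list-wise) inference.
    """
    text = query + " [SEP] " + " [SEP] ".join(docs)
    enc = tokenizer(text, return_tensors="pt",
                    truncation=True, max_length=512)
    hidden = bert(**enc).last_hidden_state          # (1, seq_len, hidden)

    # Illustrative pooling: read each document's representation from the
    # [SEP] token that closes it; the first [SEP] belongs to the query.
    sep_positions = (enc["input_ids"][0] == tokenizer.sep_token_id) \
        .nonzero(as_tuple=True)[0]
    doc_vecs = hidden[0, sep_positions[1:], :]      # one vector per doc
    return score_head(doc_vecs).squeeze(-1)         # (num_docs,)

def listwise_distill_loss(student_scores, teacher_scores, temperature=2.0):
    """List-wise KD: KL divergence between softened score distributions
    over the same candidate list (temperature is an assumed knob)."""
    t = F.softmax(teacher_scores / temperature, dim=-1)
    s = F.log_softmax(student_scores / temperature, dim=-1)
    return F.kl_div(s, t, reduction="sum") * temperature ** 2
```

As a usage example, `listwise_scores("best gpu for deep learning", [doc_a, doc_b, doc_c])` returns three scores in one forward pass, and the student can be trained by combining `listwise_distill_loss` against a (possibly ensemble) teacher's scores with a standard ranking loss on the gold labels.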

Citation (APA)
Zhang, W., Liu, J., Wen, Z., Wang, Y., & de Melo, G. (2020). Query Distillation: BERT-based Distillation for Ensemble Ranking. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Industry Track (pp. 33–43). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-industry.4
