Query Distillation: BERT-based Distillation for Ensemble Ranking

5 citations · 64 Mendeley readers

Abstract

Recent years have witnessed substantial progress in the development of neural ranking networks, but also an increasingly heavy computational burden due to growing numbers of parameters and the adoption of model ensembles. Knowledge Distillation (KD) is a common solution for balancing effectiveness and efficiency. However, applying KD to ranking problems is not straightforward. Ranking Distillation (RD) has been proposed to address this issue, but has only been shown to be effective on recommendation tasks. We present a novel two-stage distillation method for ranking problems that allows a smaller student model to be trained while benefiting from the better performance of the teacher model, providing better control over inference latency and computational cost. We design a novel BERT-based ranking model structure for list-wise ranking to serve as our student model. All ranking candidates are fed to the BERT model simultaneously, so that the self-attention mechanism enables joint inference over the document list. Our experiments confirm the advantages of our method, not only in inference latency but also in ranking quality, even compared to the original teacher model.
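To make the two ideas in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of (a) a list-wise BERT student that packs a query and all of its candidate documents into a single input sequence, so self-attention can compare candidates against each other, and (b) a list-wise distillation loss that matches the student's ranking distribution to the teacher's softened scores. The pooling choice (scoring each document from the [SEP] token that closes it), the temperature, and all names are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch only: not the authors' exact model. Assumes the
# HuggingFace `transformers` library and `bert-base-uncased` weights.
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
score_head = torch.nn.Linear(bert.config.hidden_size, 1)

def listwise_scores(query, docs):
    """Score all candidate docs for one query in a single BERT pass.

    Input layout: [CLS] query [SEP] doc_1 [SEP] doc_2 [SEP] ... doc_k [SEP]
    Feeding the whole list at once lets self-attention attend across
    documents, enabling joint (list-wise) inference.
    """
    text = query + " [SEP] " + " [SEP] ".join(docs)
    enc = tokenizer(text, return_tensors="pt",
                    truncation=True, max_length=512)
    hidden = bert(**enc).last_hidden_state          # (1, seq_len, hidden)

    # Illustrative pooling: read each document's representation from the
    # [SEP] token that closes it; the first [SEP] belongs to the query.
    sep_positions = (enc["input_ids"][0] == tokenizer.sep_token_id) \
        .nonzero(as_tuple=True)[0]
    doc_vecs = hidden[0, sep_positions[1:], :]      # one vector per doc
    return score_head(doc_vecs).squeeze(-1)         # (num_docs,)

def listwise_distill_loss(student_scores, teacher_scores, temperature=2.0):
    """List-wise KD: KL divergence between softened score distributions
    over the same candidate list (temperature is an assumed knob)."""
    t = F.softmax(teacher_scores / temperature, dim=-1)
    s = F.log_softmax(student_scores / temperature, dim=-1)
    return F.kl_div(s, t, reduction="sum") * temperature ** 2
```

As a usage example, `listwise_scores("best gpu for deep learning", [doc_a, doc_b, doc_c])` returns three scores in one forward pass, and the student can be trained by combining `listwise_distill_loss` against a (possibly ensemble) teacher's scores with a standard ranking loss on the gold labels.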

Citation (APA)
Zhang, W., Liu, J., Wen, Z., Wang, Y., & de Melo, G. (2020). Query Distillation: BERT-based Distillation for Ensemble Ranking. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Industry Track (pp. 33–43). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-industry.4
