In this paper we present a click-through rate (CTR) prediction model for product advertising at Amazon. CTR prediction is challenging because the model must a) learn from both text and numeric features, b) maintain low latency at inference time, and c) adapt to temporal shifts in the advertisement distribution. Our proposed model, DCAF-BERT, is a novel lightweight, cache-friendly factorized model consisting of twin BERT-like text encoders with a late-fusion mechanism for tabular and numeric features. The factorization allows compartmentalised retraining, which enables the model to adapt easily to distribution shifts. The twin encoders are carefully trained to leverage historical CTR data, using a large pre-trained language model and cross-architecture knowledge distillation (KD). We empirically identify the combination of pretraining, distillation, and fine-tuning strategies for teacher and student that yields a 1.7% ROC-AUC lift over the previous best model offline. In an online experiment we show that our compartmentalised refresh strategy boosts the CTR of DCAF-BERT by 3.6% on average over the baseline model, consistently across a month.
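The abstract describes the architecture only at a high level; the following is a minimal PyTorch sketch of what a factorized twin-encoder CTR model with late fusion and a distillation objective could look like. All names and hyperparameters here (TwinTowerCTR, TextEncoder, the fusion widths, alpha) are illustrative assumptions, not the paper's released code or exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """Stand-in for a small, distilled BERT-like text encoder."""
    def __init__(self, vocab_size=30522, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        return h.mean(dim=1)  # pooled text embedding

class TwinTowerCTR(nn.Module):
    """Factorized model: two independent text towers plus a small fusion head."""
    def __init__(self, dim=128, num_numeric=8):
        super().__init__()
        self.query_tower = TextEncoder(dim=dim)  # embeddings can be cached offline
        self.ad_tower = TextEncoder(dim=dim)     # embeddings can be cached offline
        # Late fusion: text embeddings meet the tabular/numeric features only
        # in this small head, which can be retrained on its own.
        self.fusion = nn.Sequential(
            nn.Linear(2 * dim + num_numeric, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, query_ids, ad_ids, numeric_feats):
        q = self.query_tower(query_ids)
        a = self.ad_tower(ad_ids)
        logits = self.fusion(torch.cat([q, a, numeric_feats], dim=-1))
        return logits  # pre-sigmoid CTR logit

def kd_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Illustrative cross-architecture KD objective: match the teacher's soft
    CTR predictions alongside the hard click labels (alpha is an assumption)."""
    soft = F.binary_cross_entropy_with_logits(student_logits,
                                              torch.sigmoid(teacher_logits))
    hard = F.binary_cross_entropy_with_logits(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

model = TwinTowerCTR()
logits = model(torch.randint(0, 30522, (4, 16)),   # query tokens
               torch.randint(0, 30522, (4, 32)),   # ad tokens
               torch.randn(4, 8))                  # numeric features
print(torch.sigmoid(logits).shape)  # torch.Size([4, 1])
```

Because the two towers never interact before the fusion head, their embeddings can be precomputed and cached, and the small head can be refreshed independently when the ad distribution drifts, which is consistent with the cache-friendliness and compartmentalised retraining the abstract describes.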
Citation:
Muhamed, A., Singh, J., Zheng, S., Keivanloo, I., Perera, S., Mracek, J., … Chilimbi, T. (2022). DCAF-BERT: A Distilled Cachable Adaptable Factorized Model For Improved Ads CTR Prediction. In WWW 2022 - Companion Proceedings of the Web Conference 2022 (pp. 110–115). Association for Computing Machinery, Inc. https://doi.org/10.1145/3487553.3524206