In this paper we present a click-through rate (CTR) prediction model for product advertising at Amazon. CTR prediction is challenging because the model must a) learn from both text and numeric features, b) maintain low latency at inference time, and c) adapt to temporal shifts in the advertisement distribution. Our proposed model, DCAF-BERT, is a novel lightweight, cache-friendly factorized model consisting of twin BERT-like text encoders with a late-fusion mechanism for tabular and numeric features. The factorization allows compartmentalised retraining, which enables the model to adapt easily to distribution shifts. The twin encoders are carefully trained to leverage historical CTR data, using a large pre-trained language model and cross-architecture knowledge distillation (KD). We empirically identify the combination of pretraining, distillation, and fine-tuning strategies for teacher and student that yields a 1.7% ROC-AUC lift over the previous best model offline. In an online experiment we show that our compartmentalised refresh strategy boosts the CTR of DCAF-BERT by 3.6% on average over the baseline model, consistently across a month.
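The abstract describes the architecture only at a high level; the following is a minimal PyTorch sketch of what a factorized twin-encoder CTR model with late fusion and a distillation objective could look like. All names and hyperparameters here (TwinTowerCTR, TextEncoder, the fusion widths, alpha) are illustrative assumptions, not the paper's released code or exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """Stand-in for a small, distilled BERT-like text encoder."""
    def __init__(self, vocab_size=30522, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        return h.mean(dim=1)  # pooled text embedding

class TwinTowerCTR(nn.Module):
    """Factorized model: two independent text towers plus a small fusion head."""
    def __init__(self, dim=128, num_numeric=8):
        super().__init__()
        self.query_tower = TextEncoder(dim=dim)  # embeddings can be cached offline
        self.ad_tower = TextEncoder(dim=dim)     # embeddings can be cached offline
        # Late fusion: text embeddings meet the tabular/numeric features only
        # in this small head, which can be retrained on its own.
        self.fusion = nn.Sequential(
            nn.Linear(2 * dim + num_numeric, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, query_ids, ad_ids, numeric_feats):
        q = self.query_tower(query_ids)
        a = self.ad_tower(ad_ids)
        logits = self.fusion(torch.cat([q, a, numeric_feats], dim=-1))
        return logits  # pre-sigmoid CTR logit

def kd_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Illustrative cross-architecture KD objective: match the teacher's soft
    CTR predictions alongside the hard click labels (alpha is an assumption)."""
    soft = F.binary_cross_entropy_with_logits(student_logits,
                                              torch.sigmoid(teacher_logits))
    hard = F.binary_cross_entropy_with_logits(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

model = TwinTowerCTR()
logits = model(torch.randint(0, 30522, (4, 16)),   # query tokens
               torch.randint(0, 30522, (4, 32)),   # ad tokens
               torch.randn(4, 8))                  # numeric features
print(torch.sigmoid(logits).shape)  # torch.Size([4, 1])
```

Because the two towers never interact before the fusion head, their embeddings can be precomputed and cached, and the small head can be refreshed independently when the ad distribution drifts, which is consistent with the cache-friendliness and compartmentalised retraining the abstract describes.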
Citation:
Muhamed, A., Singh, J., Zheng, S., Keivanloo, I., Perera, S., Mracek, J., … Chilimbi, T. (2022). DCAF-BERT: A Distilled Cachable Adaptable Factorized Model For Improved Ads CTR Prediction. In WWW 2022 - Companion Proceedings of the Web Conference 2022 (pp. 110–115). Association for Computing Machinery, Inc. https://doi.org/10.1145/3487553.3524206