RecSysOps: Best practices for operating a large-scale recommender system

Mohammad Saberian; Justin Basilico

Conference Proceedings, Open Access

RecSys 2021 - 15th ACM Conference on Recommender Systems (2021) 590-591

DOI: 10.1145/3460231.3474620

Abstract

Ensuring the health of a modern large-scale recommendation system is a very challenging problem. To address this, we need to put in place proper logging and sophisticated exploration policies, develop ML-interpretability tools, or even train new ML models to predict or detect issues in the main production model. In this talk, we shine a light on this less-discussed but important area and share some of the best practices, called RecSysOps, that we've learned while operating our increasingly complex recommender systems at Netflix. RecSysOps is a set of best practices for identifying issues and gaps, as well as diagnosing and resolving them, in a large-scale machine-learned recommender system. RecSysOps helped us to 1) reduce production issues, 2) increase recommendation quality by identifying areas of improvement, and 3) bring new innovations to our members faster by enabling us to spend more of our time on innovation and less on debugging and firefighting.

Cite

Saberian, M., & Basilico, J. (2021). RecSysOps: Best practices for operating a large-scale recommender system. In RecSys 2021 - 15th ACM Conference on Recommender Systems (pp. 590–591). Association for Computing Machinery, Inc. https://doi.org/10.1145/3460231.3474620
