Unlike traditional machine learning tasks and benchmarks, real-world problems often come with enormous output spaces, ranging from hundreds of thousands of diseases in medical diagnosis to millions of items and billions of websites in product and web search engines. Unfortunately, conventional machine learning tools and libraries cannot tackle such large output spaces both efficiently and accurately. To address this issue, PECOS (Prediction for Enormous and Correlated Output Spaces) [11] is a state-of-the-art, open-source machine learning library. It not only provides high-level, user-friendly interfaces for both linear and deep learning models but also offers considerable flexibility for solving diverse machine learning problems. Specifically, PECOS simplifies the complex task of semantic indexing, which organizes an enormous output space of correlated labels, and thereby makes both model training and prediction orders of magnitude more efficient. As a powerful and practical framework, PECOS has already been adopted in various large-scale real-world products, such as semantic search at Amazon [1]. It has also achieved state-of-the-art results on public extreme multi-label classification (XMC) benchmarks [2, 11, 12] and in various downstream applications [3, 7, 9].

In this tutorial, we will introduce several key functions and features of the PECOS library. Through real-world examples in product recommendation and natural language processing, attendees will learn how to efficiently train large-scale machine learning models for enormous output spaces and how to obtain predictions in under 1 millisecond for an input with a million labels. We will also demonstrate how the assorted built-in utilities in PECOS make it flexible enough to handle diverse machine learning problems and data formats.
By the end of the tutorial, we believe attendees will be able to readily apply these concepts to their own projects and address a range of machine learning problems with enormous output spaces.
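To make the idea of semantic indexing more concrete, here is a toy sketch in plain Python. It does not use PECOS or reproduce its API. It assumes that each label already has an embedding vector and that scores are plain dot products. Similar labels are grouped recursively into a tree, and prediction runs a beam search down that tree, so only a few clusters and the labels inside the surviving leaves are ever scored. The median split on the highest-variance coordinate is a simplification swapped in for the clustering (for example, hierarchical k-means) that tree-based XMC methods typically use. `build_tree`, `beam_search`, and `Node` are hypothetical names chosen for illustration, not PECOS functions.

```python
import heapq
from statistics import fmean, pvariance


def dot(u, v):
    """Relevance score between a query and a label (or cluster) vector."""
    return sum(a * b for a, b in zip(u, v))


class Node:
    """A cluster of labels; leaves hold at most `max_leaf` labels."""

    def __init__(self, label_ids, vecs):
        self.label_ids = label_ids
        # Cluster representative: mean of its labels' embedding vectors.
        self.center = [fmean(col) for col in zip(*(vecs[i] for i in label_ids))]
        self.children = []


def build_tree(label_ids, vecs, max_leaf=2):
    """Hypothetical indexer: recursively halve the labels along their
    highest-variance coordinate (a stand-in for hierarchical clustering)."""
    node = Node(label_ids, vecs)
    if len(label_ids) > max_leaf:
        dims = range(len(vecs[0]))
        d = max(dims, key=lambda j: pvariance([vecs[i][j] for i in label_ids]))
        ordered = sorted(label_ids, key=lambda i: vecs[i][d])
        mid = len(ordered) // 2
        node.children = [build_tree(ordered[:mid], vecs, max_leaf),
                         build_tree(ordered[mid:], vecs, max_leaf)]
    return node


def beam_search(root, query, vecs, beam=2, topk=3):
    """Descend the tree keeping only the `beam` best-scoring clusters per level,
    then rank the labels found in the reached leaves."""
    frontier, leaves = [root], []
    while frontier:
        children = []
        for node in frontier:
            if node.children:
                children.extend(node.children)
            else:
                leaves.append(node)
        # Prune: only `beam` clusters survive to the next level.
        frontier = heapq.nlargest(beam, children, key=lambda n: dot(query, n.center))
    candidates = {i for leaf in leaves for i in leaf.label_ids}
    return heapq.nlargest(topk, candidates, key=lambda i: dot(query, vecs[i]))


if __name__ == "__main__":
    # Eight toy label embeddings in 2-D.
    vecs = [[10, 0], [9, 1], [0, 3], [1, 2], [-10, 0], [-9, -1], [0, -3], [-1, -2]]
    root = build_tree(list(range(len(vecs))), vecs, max_leaf=2)
    print(beam_search(root, [1, 0.1], vecs, beam=1, topk=3))  # [0, 1]
```

With a beam width of 1, the search above scores only two cluster centroids per level and then the two labels in the single surviving leaf, instead of all eight labels. More generally, with branching factor B and beam width b, each level scores at most b·B centroids, so the cost grows with the tree depth (roughly log_B of the number of labels) rather than with the label count itself. The price is that the search is approximate: a relevant label in a pruned cluster is never scored, and widening the beam trades speed for recall.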
Yu, H. F., Zhang, J., Chang, W. C., Jiang, J. Y., Li, W., & Hsieh, C. J. (2022). PECOS: Prediction for Enormous and Correlated Output Spaces. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 4848–4849). Association for Computing Machinery. https://doi.org/10.1145/3534678.3542629