GrokNet: Unified Computer Vision Model Trunk and Embeddings for Commerce

Sean Bell; Yiqun Liu; Sami Alsheikh; Yina Tang; Edward Pizzi; M. Henning; Karun Singh; Omkar Parkhi; Fedor Borisyuk

Conference Proceedings, Open Access

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2020) 2608-2616

DOI: 10.1145/3394486.3403311

Abstract

In this paper, we present GrokNet, a deployed image recognition system for commerce applications. GrokNet leverages a multi-task learning approach to train a single computer vision trunk. We achieve a 2.1x improvement in exact product match accuracy when compared to the previous state-of-the-art Facebook product recognition system. We achieve this by training on 7 datasets across several commerce verticals, using 80 categorical loss functions and 3 embedding losses. We share our experience of combining diverse sources with wide-ranging label semantics and image statistics, including learning from human annotations, user-generated tags, and noisy search engine interaction data. GrokNet has demonstrated gains in production applications and operates at Facebook scale.
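The abstract describes one shared trunk trained with many categorical losses plus a few embedding losses. As a rough illustration of that idea only, the following minimal sketch combines per-head cross-entropy losses over a single shared feature vector. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the weighted-sum combination, the rule of skipping heads for which an example has no label, and the toy pairwise loss standing in for the embedding losses.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example given raw class scores (log-sum-exp form)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def linear(weights, x):
    """Bias-free dense layer: one output score per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def pairwise_embedding_loss(emb_a, emb_b, same, margin=0.5):
    """Toy contrastive loss on cosine distance; a stand-in, not the paper's losses."""
    a, b = l2_normalize(emb_a), l2_normalize(emb_b)
    dist = 1.0 - sum(x * y for x, y in zip(a, b))
    return dist if same else max(0.0, margin - dist)


def multitask_loss(trunk_features, heads, labels, task_weights):
    """Weighted sum of categorical losses, all computed from the same trunk output.

    Assumption: an example only carries labels for some tasks, so heads
    without a label for this example contribute nothing.
    """
    total = 0.0
    for name, weights in heads.items():
        if name not in labels:
            continue
        logits = linear(weights, trunk_features)
        total += task_weights[name] * softmax_cross_entropy(logits, labels[name])
    return total


# Hypothetical two-head setup over a 2-dimensional trunk output.
features = [1.0, 0.0]
heads = {
    "category": [[2.0, 0.0], [0.0, 2.0]],
    "color": [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],
}
task_weights = {"category": 1.0, "color": 0.5}

loss_one_task = multitask_loss(features, heads, {"category": 0}, task_weights)
loss_two_tasks = multitask_loss(
    features, heads, {"category": 0, "color": 1}, task_weights
)
```

In this sketch, adding a label for a second head can only increase the total, since each cross-entropy term is non-negative. A real system would batch the computation and backpropagate the combined loss through the trunk, which is omitted here.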

Citation (APA)

Bell, S., Liu, Y., Alsheikh, S., Tang, Y., Pizzi, E., Henning, M., … Borisyuk, F. (2020). GrokNet: Unified Computer Vision Model Trunk and Embeddings for Commerce. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2608–2616). Association for Computing Machinery. https://doi.org/10.1145/3394486.3403311