Cross-Encoder Data Annotation for Bi-Encoder Based Product Matching

Abstract

Matching a seller-listed item to the appropriate product is an important step for an e-commerce platform. With recent advances in deep learning, various encoder-based approaches have been proposed for this task. When textual data for two products are available, cross-encoder approaches encode them jointly, while bi-encoder approaches encode them separately. Because cross-encoders are computationally heavy, bi-encoder-based approaches are common practice for this challenge. In this paper, we propose cross-encoder data annotation: a technique that uses a cross-encoder model to annotate training data for bi-encoder models, or to refine human-annotated training data. This technique enables us to build a robust model without annotating newly collected training data, or to further improve model performance when annotated training data is available. We evaluate cross-encoder data annotation on the product matching task using a real-world e-commerce dataset containing 104 million products. Experimental results show that cross-encoder data annotation improves accuracy by 4% absolute when no annotation is available for the training data, and by 2% absolute when annotated training data is available.
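To make the difference between the two architectures and the annotation step concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: `bi_encode`, `bi_score`, `cross_score`, `annotate`, the 0.5 threshold, and the rule for refining human labels are all hypothetical, and simple token-overlap functions stand in for trained transformer encoders. The point it illustrates is structural: the bi-encoder scores two texts from independently computed vectors, while the cross-encoder sees both texts at once, and its decisions become training labels for the bi-encoder.

```python
import math
from collections import Counter


def _tokens(text):
    return text.lower().split()


def bi_encode(text):
    # Hypothetical bi-encoder tower: each text is encoded on its own
    # into a bag-of-words vector (a stand-in for a dense embedding).
    return Counter(_tokens(text))


def bi_score(a, b):
    # Cosine similarity between two independently computed vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cross_score(listing, product):
    # Hypothetical cross-encoder: it reads both texts together, so it can
    # apply cross-text rules. Here, if both titles contain tokens with
    # digits (sizes, model numbers) and those tokens disagree, it rejects
    # the pair; otherwise it returns Jaccard token overlap.
    a, b = set(_tokens(listing)), set(_tokens(product))
    da = {t for t in a if any(c.isdigit() for c in t)}
    db = {t for t in b if any(c.isdigit() for c in t)}
    if da and db and da != db:
        return 0.0
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def annotate(pairs, threshold=0.5, human_labels=None):
    """Label (listing, product) pairs with the cross-encoder.

    Without human labels, the cross-encoder prediction is the label.
    With human labels, a pair is kept only when the cross-encoder agrees
    (an assumed refinement rule, not necessarily the paper's).
    """
    labeled = []
    for i, (listing, product) in enumerate(pairs):
        pred = int(cross_score(listing, product) >= threshold)
        if human_labels is None or human_labels[i] == pred:
            labeled.append((listing, product, pred))
    return labeled


pairs = [
    ("apple iphone 13 128gb blue", "Apple iPhone 13 128GB Blue"),
    ("apple iphone 13 128gb blue", "Apple iPhone 13 256GB Blue"),
    ("usb c charging cable 2m", "Wireless Mouse Black"),
]

print([label for _, _, label in annotate(pairs)])  # → [1, 0, 0]
print(len(annotate(pairs, human_labels=[1, 1, 0])))  # → 2
```

Note the second pair: its bi-encoder cosine similarity is 0.8 because the titles share four of five tokens, yet the toy cross-encoder, which compares both titles jointly, rejects it on the storage mismatch. In this sketch, labels like these would be the training targets for the cheaper bi-encoder.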

Citation (APA)

Chiu, J., & Shinzato, K. (2022). Cross-Encoder Data Annotation for Bi-Encoder Based Product Matching. In EMNLP 2022 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track (pp. 171–178). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-industry.16