Item categorization (IC) is a core natural language processing (NLP) task in e-commerce. Because IC is a special case of text classification, fine-tuning pre-trained models such as BERT has become the mainstream solution. To further improve IC performance, other product metadata, e.g., product images, have been used. Although multimodal IC (MIC) systems achieve higher performance, expanding from text processing to more resource-demanding images has a large engineering impact and hinders the deployment of such dual-input MIC systems. In this paper, we propose a new way of using product images to improve a text-only IC model: leveraging cross-modal signals between products’ titles and their associated images to adapt BERT models in a self-supervised learning (SSL) manner. Our experiments on three genres of the public Amazon product dataset show that the proposed method yields higher prediction accuracy and macro-F1 than using the original BERT alone. Moreover, the proposed method retains the existing text-only IC inference implementation and offers a resource advantage over deploying a dual-input MIC system.
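As a rough illustration of the cross-modal contrastive adaptation named in the paper's title, the sketch below pairs a BERT title encoder with an image encoder under a CLIP-style symmetric InfoNCE objective; the image backbone (ResNet-50), projection size, temperature, and class names here are illustrative assumptions, not the authors' reported configuration.

```python
# Hypothetical sketch: adapt a BERT title encoder with cross-modal
# contrastive (InfoNCE) signals from product images. Names and
# hyper-parameters are assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F
from torch import nn
from transformers import AutoModel
from torchvision.models import resnet50


class CrossModalAdapter(nn.Module):
    def __init__(self, text_model="bert-base-uncased", dim=256):
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained(text_model)
        self.image_encoder = resnet50(weights=None)      # any image backbone works
        self.image_encoder.fc = nn.Identity()            # keep pooled 2048-d features
        self.text_proj = nn.Linear(self.text_encoder.config.hidden_size, dim)
        self.image_proj = nn.Linear(2048, dim)
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # ~ln(1/0.07)

    def forward(self, input_ids, attention_mask, images):
        # [CLS] embedding of the product title
        t = self.text_encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state[:, 0]
        v = self.image_encoder(images)
        t = F.normalize(self.text_proj(t), dim=-1)
        v = F.normalize(self.image_proj(v), dim=-1)
        # Symmetric InfoNCE: matched (title, image) pairs are positives,
        # all other in-batch pairs serve as negatives.
        logits = self.logit_scale.exp() * t @ v.t()
        labels = torch.arange(logits.size(0), device=logits.device)
        return (F.cross_entropy(logits, labels) +
                F.cross_entropy(logits.t(), labels)) / 2
```

After this SSL adaptation step, only the adapted text encoder would be fine-tuned for item categorization and deployed, which is how a text-only inference path can be preserved.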
Chen, L., & Chou, H. W. (2022). Utilizing Cross-Modal Contrastive Learning to Improve Item Categorization BERT Model. In ECNLP 2022 - 5th Workshop on e-Commerce and NLP, Proceedings of the Workshop (pp. 217–223). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.ecnlp-1.25