Abstract
For good advertising effect, the images in an ad should be highly relevant to the ad title. Images are normally selected from a gallery based on their relevance scores with the ad's title, so a reliable text-image matching model is necessary. The state-of-the-art text-image matching model, cross-modal BERT, understands only the visual content of the image, which is sub-optimal when an image description is also available. In this work, we present MixBERT, an ad-image relevance scoring model that matches the ad title against both the image description and the visual content. MixBERT adopts a two-stream architecture: it adaptively selects useful information from the noisy image description and suppresses the noise that impedes effective matching. To capture the details in the visual content of the image, a set of local convolutional features is used as the initial image representation. Moreover, to enhance the model's perception of key entities, which are important in advertising, we upgrade the masked language modeling of vanilla BERT to masked key entity modeling. Offline and online experiments demonstrate its effectiveness.
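The masked key entity modeling idea can be illustrated with a minimal sketch. This is an assumption about the general mechanism, not the paper's implementation: instead of masking random subword tokens as in vanilla BERT's masked language modeling, whole key-entity spans (e.g. brand or product names supplied by an external tagger, hypothetical here) are masked as units, and only the masked positions become prediction targets.

```python
import random

MASK = "[MASK]"

def mask_key_entities(tokens, entity_spans, mask_prob=0.5, seed=0):
    """Sketch of masked key entity modeling (assumed mechanism).

    Each (start, end) entity span is replaced by [MASK] tokens with
    probability `mask_prob`, masking the whole entity as one unit.
    Labels hold the original tokens at masked positions; None marks
    positions excluded from the loss (akin to an ignore index).
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for start, end in entity_spans:
        if rng.random() < mask_prob:
            for i in range(start, end):
                labels[i] = tokens[i]   # predict the original token
                masked[i] = MASK        # hide the whole entity
    return masked, labels
```

For example, with the ad title tokens `["buy", "nike", "air", "shoes", "today"]` and the key-entity span `(1, 3)` covering "nike air", the entire entity is masked together, forcing the model to reconstruct it from context.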
Citation
Yu, T., Li, X., Xie, J., Yin, R., Xu, Q., & Li, P. (2021). MixBERT for Image-Ad Relevance Scoring in Advertising. In International Conference on Information and Knowledge Management, Proceedings (pp. 3597–3602). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482143