Together Yet Apart: Multimodal Representation Learning for Personalised Visual Art Recommendation

Abstract

With the advent of digital media, the availability of art content has greatly expanded, making it increasingly challenging for individuals to discover and curate works that align with their personal preferences and taste. The task of providing accurate and personalized Visual Art (VA) recommendations is thus a complex one, requiring a deep understanding of the intricate interplay of multiple modalities such as images, textual descriptions, or other metadata. In this paper, we study the nuances of the modalities involved in the VA domain (image and text) and how they can be effectively harnessed to provide a truly personalized art experience to users. In particular, we develop four fusion-based multimodal VA recommendation pipelines and conduct a large-scale user-centric evaluation. Our results indicate that early fusion (i.e., joint multimodal learning of visual and textual features) is preferred over late fusion of ranked paintings from unimodal models (state-of-the-art baselines), but only if the latent representation space of the multimodal painting embeddings is entangled. Our findings open a new perspective for better representation learning in the VA RecSys domain.
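To make the distinction between the two fusion strategies concrete, below is a minimal sketch (not the authors' pipelines) of how early fusion of multimodal item embeddings differs from late fusion of unimodal rankings. The encoders, embedding dimensions, mean-pooled user profile, cosine scoring, and the reciprocal-rank-fusion merge are all illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of early vs. late fusion for visual-art recommendation.
# All names, dimensions, and the scoring/merging choices are assumptions; the
# paper's actual pipelines, encoders, and fusion methods differ.
import numpy as np

rng = np.random.default_rng(0)
n_paintings, d_img, d_txt = 100, 512, 384

# Stand-ins for unimodal painting embeddings (e.g., from a vision encoder and
# a text encoder over painting descriptions).
img_emb = rng.normal(size=(n_paintings, d_img))
txt_emb = rng.normal(size=(n_paintings, d_txt))

def cosine_scores(user_vec, item_mat):
    """Cosine similarity between one user profile vector and all items."""
    user = user_vec / np.linalg.norm(user_vec)
    items = item_mat / np.linalg.norm(item_mat, axis=1, keepdims=True)
    return items @ user

liked = [3, 17, 42]  # hypothetical paintings the user has liked

# Early fusion: build a single multimodal item representation first
# (here by concatenation), then score and rank in that joint space.
fused_items = np.concatenate([img_emb, txt_emb], axis=1)
user_profile = fused_items[liked].mean(axis=0)
early_ranking = np.argsort(-cosine_scores(user_profile, fused_items))

# Late fusion: rank with each unimodal model separately, then merge the two
# ranked lists (reciprocal rank fusion is used here as one simple choice).
img_ranking = np.argsort(-cosine_scores(img_emb[liked].mean(axis=0), img_emb))
txt_ranking = np.argsort(-cosine_scores(txt_emb[liked].mean(axis=0), txt_emb))

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists by summing 1 / (k + rank) for each item."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

late_ranking = reciprocal_rank_fusion([img_ranking, txt_ranking])

print("Early-fusion top 5:", early_ranking[:5].tolist())
print("Late-fusion  top 5:", late_ranking[:5])
```

The key contrast is where the modalities meet: early fusion ranks in a single joint embedding space, whereas late fusion only combines the per-modality ranked lists after the fact.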

Citation (APA)

Yilma, B. A., & Leiva, L. A. (2023). Together Yet Apart: Multimodal Representation Learning for Personalised Visual Art Recommendation. In UMAP 2023 - Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization (pp. 204–214). Association for Computing Machinery, Inc. https://doi.org/10.1145/3565472.3592964
