Multi-Modal Dialog State Tracking for Interactive Fashion Recommendation


Abstract

Multi-modal interactive recommendation is a task in which users receive visual recommendations and express natural-language feedback about the recommended items over multiple interaction turns. However, such multi-modal dialog sequences (i.e., turns consisting of the system's visual recommendations and the user's natural-language feedback) make it challenging to correctly incorporate the users' preferences across multiple turns. Indeed, the existing formulations of interactive recommender systems, which rely on recurrent neural network-based (i.e., RNN-based) or transformer-based models, are unable to capture the multi-modal sequential dependencies between the textual feedback and the visual recommendations. To alleviate this multi-modal sequential dependency issue, we propose a novel multi-modal recurrent attention network (MMRAN) model that effectively incorporates the users' preferences over long visual dialog sequences of the users' natural-language feedback and the system's visual recommendations. Specifically, we leverage a gated recurrent network (GRN) with a feedback gate to separately process the textual and visual representations of the natural-language feedback and the visual recommendations into hidden states (i.e., representations of the past interactions) for multi-modal sequence combination. In addition, we apply a multi-head attention network (MAN) to refine the hidden states generated by the GRN and to further enhance the model's ability to perform dynamic state tracking. Following previous work, we conduct extensive experiments on the Fashion IQ Dresses, Shirts, and Tops & Tees datasets to assess the effectiveness of our proposed model, using a vision-language transformer-based user simulator as a surrogate for real human users. Our results show that our proposed MMRAN model significantly outperforms several existing state-of-the-art baseline models.
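
To make the pipeline described in the abstract more concrete, below is a minimal PyTorch sketch of the overall MMRAN flow. It is not the authors' implementation: standard nn.GRU layers stand in for the paper's GRN with a feedback gate (whose gating equations the abstract does not give), nn.MultiheadAttention stands in for the MAN refinement step, and all dimensions, the fusion layer, and the pre-extracted per-turn features are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MMRANSketch(nn.Module):
    """Illustrative sketch of the pipeline described in the abstract:
    gated recurrent processing of per-turn textual and visual features,
    followed by multi-head attention that refines the hidden states.
    Standard GRUs stand in for the paper's GRN with a feedback gate."""

    def __init__(self, txt_dim=512, img_dim=512, hidden_dim=256, num_heads=4):
        super().__init__()
        # One recurrent network per modality: the abstract describes
        # processing textual and visual representations separately.
        self.txt_gru = nn.GRU(txt_dim, hidden_dim, batch_first=True)
        self.img_gru = nn.GRU(img_dim, hidden_dim, batch_first=True)
        # Fuse the two hidden-state streams into one multi-modal sequence
        # (the "multi-modal sequence combination" step; fusion by
        # concatenation + projection is an assumption here).
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)
        # Multi-head attention network (MAN) refining the hidden states.
        self.man = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, txt_feats, img_feats):
        # txt_feats: (batch, turns, txt_dim) per-turn natural-language feedback
        # img_feats: (batch, turns, img_dim) per-turn visual recommendations
        txt_h, _ = self.txt_gru(txt_feats)   # (batch, turns, hidden_dim)
        img_h, _ = self.img_gru(img_feats)   # (batch, turns, hidden_dim)
        states = self.fuse(torch.cat([txt_h, img_h], dim=-1))
        # Self-attention over the dialog turns refines the tracked states.
        refined, _ = self.man(states, states, states)
        return refined                        # (batch, turns, hidden_dim)

# Example: 4 dialog turns of pre-extracted 512-d features for a batch of 2.
model = MMRANSketch()
out = model(torch.randn(2, 4, 512), torch.randn(2, 4, 512))
print(out.shape)  # torch.Size([2, 4, 256])
```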

Citation (APA)

Wu, Y., Macdonald, C., & Ounis, I. (2022). Multi-Modal Dialog State Tracking for Interactive Fashion Recommendation. In RecSys 2022 - Proceedings of the 16th ACM Conference on Recommender Systems (pp. 124–133). Association for Computing Machinery. https://doi.org/10.1145/3523227.3546774
