Multimodal hierarchical reinforcement learning policy for task-oriented visual dialog

Abstract

Creating an intelligent conversational system that understands vision and language is one of the ultimate goals in Artificial Intelligence (AI) (Winograd, 1972). Extensive research has focused on vision-to-language generation; however, little work has addressed combining these two modalities in a goal-driven dialog context. We propose a multimodal hierarchical reinforcement learning framework that dynamically integrates vision and language for task-oriented visual dialog. The framework jointly learns the multimodal dialog state representation and the hierarchical dialog policy to improve both dialog task success and efficiency. We also propose a new technique, state adaptation, to integrate context awareness into the dialog state representation. We evaluate the proposed framework and the state adaptation technique in an image guessing game and achieve promising results.
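
To make the abstract's components concrete, the sketch below illustrates one plausible reading of the architecture: image and language features are fused into a single dialog state, a context-dependent gate stands in for the state adaptation technique, and a two-level softmax policy (high-level dialog option, low-level action) stands in for the hierarchical dialog policy. This is a minimal illustration under those assumptions, not the authors' implementation; all names here (fuse_state, adapt_state, HierarchicalPolicy) are hypothetical.

# Minimal, hypothetical sketch of a hierarchical policy over a fused
# multimodal dialog state. Not the paper's implementation.
import numpy as np

def fuse_state(image_feat: np.ndarray, text_feat: np.ndarray) -> np.ndarray:
    """Concatenate image and language features into one dialog state."""
    return np.concatenate([image_feat, text_feat])

def adapt_state(state: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Stand-in for 'state adaptation': gate the state with dialog context.
    The paper's actual technique may differ; this is element-wise gating."""
    gate = 1.0 / (1.0 + np.exp(-context))   # sigmoid gate from context
    return state * gate

class HierarchicalPolicy:
    """Two-level policy: a high-level policy picks a dialog option
    (e.g. ask-a-question vs. make-a-guess); a low-level policy picks the
    concrete action within that option. Both are linear softmax policies."""
    def __init__(self, state_dim, n_options, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W_high = rng.normal(scale=0.1, size=(n_options, state_dim))
        self.W_low = rng.normal(scale=0.1, size=(n_options, n_actions, state_dim))

    @staticmethod
    def _softmax(z):
        z = z - z.max()           # stabilize before exponentiating
        e = np.exp(z)
        return e / e.sum()

    def act(self, state, rng):
        p_opt = self._softmax(self.W_high @ state)
        option = rng.choice(len(p_opt), p=p_opt)      # high-level choice
        p_act = self._softmax(self.W_low[option] @ state)
        action = rng.choice(len(p_act), p=p_act)      # low-level choice
        return option, action

# Usage: fuse per-turn features, adapt with context, sample an action.
rng = np.random.default_rng(1)
img, txt, ctx = rng.normal(size=16), rng.normal(size=16), rng.normal(size=32)
state = adapt_state(fuse_state(img, txt), ctx)
policy = HierarchicalPolicy(state_dim=32, n_options=2, n_actions=5)
print(policy.act(state, rng))

In a full system, both levels would be trained with reinforcement learning against task success and dialog length; the linear policies above merely show how the hierarchy factors the decision.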

Citation (APA)

Zhang, J., Zhao, T., & Yu, Z. (2018). Multimodal hierarchical reinforcement learning policy for task-oriented visual dialog. In SIGDIAL 2018 - 19th Annual Meeting of the Special Interest Group on Discourse and Dialogue - Proceedings of the Conference (pp. 140–150). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-5015
