Visual conversation has recently emerged as a research area within visually grounded language modeling. It requires an intelligent agent to maintain a natural-language conversation with humans about visual content. Its main difference from traditional visual question answering is that the agent must infer the answer not only by grounding the question in the image, but also from the context of the conversation history. In this paper we propose a novel multimodal attention architecture that enables the conversation agent to focus on relevant parts of the conversation history and on specific image regions when inferring the answer. We evaluate our model on the VisDial dataset and demonstrate that it performs better than the current state of the art.
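The abstract describes the architecture only at a high level. The snippet below is a minimal sketch of the general idea, assuming question-conditioned soft attention applied separately to dialog-history embeddings and image-region features, followed by concatenation-based fusion; all module names, dimensions, and the fusion choice are illustrative assumptions and not the authors' implementation.

```python
# Minimal sketch (not the authors' model): question-conditioned soft attention
# over dialog-history embeddings and image-region features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultimodalAttention(nn.Module):
    def __init__(self, q_dim=512, h_dim=512, v_dim=2048, att_dim=256):
        super().__init__()
        # Separate projections into an attention space for history and image regions.
        self.hist_proj = nn.Linear(q_dim + h_dim, att_dim)
        self.hist_score = nn.Linear(att_dim, 1)
        self.img_proj = nn.Linear(q_dim + v_dim, att_dim)
        self.img_score = nn.Linear(att_dim, 1)

    def attend(self, query, keys, proj, score):
        # query: (B, Dq); keys: (B, N, Dk) -> weighted sum of keys, shape (B, Dk)
        q = query.unsqueeze(1).expand(-1, keys.size(1), -1)
        e = score(torch.tanh(proj(torch.cat([q, keys], dim=-1)))).squeeze(-1)  # (B, N)
        alpha = F.softmax(e, dim=-1)                                           # attention weights
        return torch.bmm(alpha.unsqueeze(1), keys).squeeze(1)

    def forward(self, q_emb, hist_emb, img_feats):
        # q_emb: question embedding (B, Dq)
        # hist_emb: per-round history embeddings (B, T, Dh)
        # img_feats: image-region features (B, R, Dv)
        attended_hist = self.attend(q_emb, hist_emb, self.hist_proj, self.hist_score)
        attended_img = self.attend(q_emb, img_feats, self.img_proj, self.img_score)
        # Fuse the question with the attended history and attended image regions;
        # a decoder would score candidate answers from this joint representation.
        return torch.cat([q_emb, attended_hist, attended_img], dim=-1)
```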
CITATION STYLE
Kodra, L., & Meçe, E. K. (2018). Multimodal attention agents in visual conversation. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 17, pp. 584–596). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-75928-9_52