Knowledge bases (KBs) are essential for building practical dialogue systems. Recent years have seen rapidly growing interest in integrating KBs into dialogue systems, but existing approaches mostly handle KBs of a single modality, typically text. As today's KBs grow rich in multimodal information such as images, audio, and video, this limitation greatly hinders the development of dialogue systems. In this paper, we focus on task-oriented dialogue systems and address the limitation by proposing a novel model that integrates external multimodal KB reasoning with pre-trained language models. We further enhance the model with a novel multi-granularity fusion mechanism that captures multi-grained semantics in the dialogue history. To validate the effectiveness of the proposed model, we collect MMDialKB, a new large-scale (14K) dialogue dataset built upon a multimodal KB. Both automatic and human evaluation results on MMDialKB demonstrate the superiority of our proposed framework over strong baselines.
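The abstract does not specify the architecture, but to make the multi-granularity fusion idea concrete, here is a minimal PyTorch sketch under our own assumptions: fine-grained token-level encodings and coarse-grained utterance-level encodings of the dialogue history are combined via cross-attention and a learned gate. All class names, dimensions, and design choices below are hypothetical illustrations, not the authors' implementation.

```python
# A minimal sketch (NOT the UniMF implementation) of a multi-granularity
# fusion mechanism: utterance-level summaries attend over token-level
# states, and a learned gate blends the two granularities.
import torch
import torch.nn as nn


class MultiGranularityFusion(nn.Module):
    def __init__(self, hidden_dim: int = 256):
        super().__init__()
        # Cross-attention: coarse (utterance) queries over fine (token) keys/values.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        # Gate deciding how much of each granularity to keep, per dimension.
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, token_states: torch.Tensor,
                utter_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, n_tokens, hidden)     -- fine-grained semantics
        # utter_states: (batch, n_utterances, hidden) -- coarse-grained semantics
        attended, _ = self.attn(utter_states, token_states, token_states)
        gate = torch.sigmoid(self.gate(torch.cat([utter_states, attended], dim=-1)))
        return gate * utter_states + (1 - gate) * attended


if __name__ == "__main__":
    fusion = MultiGranularityFusion(hidden_dim=256)
    tokens = torch.randn(2, 50, 256)     # e.g. 50 subword tokens
    utterances = torch.randn(2, 6, 256)  # e.g. 6 utterances of history
    print(fusion(tokens, utterances).shape)  # torch.Size([2, 6, 256])
```

In practice, the token- and utterance-level inputs would come from a pre-trained language model encoder (tokens) and pooled per-utterance summaries; the fused representation could then condition response generation and multimodal KB reasoning.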
Citation:
Yang, S., Zhang, R., Erfani, S., & Lau, J. H. (2021). UniMF: A Unified Framework to Incorporate Multimodal Knowledge Bases into End-to-End Task-Oriented Dialogue Systems. In IJCAI International Joint Conference on Artificial Intelligence (pp. 3978–3984). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2021/548