MM-GATBT: Enriching Multimodal Representation Using Graph Attention Network

Abstract

Recent advances in Natural Language Processing (NLP) have come mainly from applying self-attention mechanisms to single or multiple modalities. Although this approach has brought significant improvements in many downstream tasks, it fails to capture the interactions between different entities. We therefore propose MM-GATBT, a multimodal graph representation learning model that captures not only the relational semantics within one modality but also the interactions between different modalities. Specifically, the proposed method constructs image-based node embeddings that contain the relational semantics of entities. Our empirical results show that MM-GATBT achieves state-of-the-art results among all published papers on the MM-IMDb dataset.
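The abstract relies on graph attention to let each node embedding absorb relational context from neighbouring entities. A minimal sketch of one single-head graph attention layer in the commonly used formulation follows, written in plain Python. It is not the authors' MM-GATBT implementation. The following are assumptions: node features stand in for image-based embeddings, the toy graph and the movie IDs (`m1`, `m2`, `m3`) are hypothetical, and so is the function name `gat_layer`. In a real model the projection `W` and the attention vector `a` would be learned rather than fixed by hand.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    # Linear projection z = W x (one output value per row of W).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def elu(x: float) -> float:
    return x if x > 0 else math.exp(x) - 1.0


def gat_layer(features: Dict[str, Vector],
              edges: Dict[str, List[str]],
              W: Matrix,
              a: Vector) -> Dict[str, Vector]:
    """Hypothetical single-head graph attention layer (illustrative sketch only).

    features: node id -> input embedding (assumed here to be image-based)
    edges:    node id -> neighbour ids (self-loops are added automatically)
    W:        projection matrix of shape (F_out, F_in)
    a:        attention vector of length 2 * F_out
    """
    # 1. Project every node embedding into the output space.
    z = {n: matvec(W, h) for n, h in features.items()}
    out: Dict[str, Vector] = {}
    for i, zi in z.items():
        # Each node attends to itself and to its neighbours.
        neigh = [i] + [j for j in edges.get(i, []) if j != i]
        # 2. Unnormalised score e_ij = LeakyReLU(a^T [z_i || z_j]).
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, zi + z[j])))
                  for j in neigh]
        # 3. Softmax over the neighbourhood, shifted by the max for stability.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # 4. Attention-weighted sum of projected neighbours, then ELU.
        agg = [sum(alpha * z[j][k] for alpha, j in zip(alphas, neigh))
               for k in range(len(zi))]
        out[i] = [elu(v) for v in agg]
    return out


if __name__ == "__main__":
    # Toy graph: three hypothetical movies linked by an undirected relation.
    feats = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [1.0, 1.0]}
    adj = {"m1": ["m2"], "m2": ["m1", "m3"], "m3": ["m2"]}
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # With a zero attention vector every neighbour receives equal weight.
    print(gat_layer(feats, adj, identity, [0.0] * 4))
```

With a zero attention vector the layer reduces to averaging each node with its neighbours (for example, `m1` becomes `[0.5, 0.5]`). A non-zero `a` lets the layer weight some neighbours more heavily than others, which is the property graph attention adds over plain neighbourhood averaging. Self-loops are included so that a node with no edges still keeps its own projected features.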

Citation (APA)

Seo, S. B., Nam, H., & Delgosha, P. (2022). MM-GATBT: Enriching Multimodal Representation Using Graph Attention Network. In NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop (pp. 106–112). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.naacl-srw.14