i-Code: An Integrative and Composable Multimodal Learning Framework

22Citations
Citations of this article
61Readers
Mendeley users who have this article in their library.

Abstract

Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel merge- and co-attention mechanisms to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five multimodal understanding tasks and single-modality benchmarks, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.

Cite

CITATION STYLE

APA

Yang, Z., Fang, Y., Zhu, C., Pryzant, R., Chen, D., Shi, Y., … Huang, X. (2023). i-Code: An Integrative and Composable Multimodal Learning Framework. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 (Vol. 37, pp. 10880–10890). AAAI Press. https://doi.org/10.1609/aaai.v37i9.26290

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free