Multimodal Deep Learning


Abstract

Multimodal deep learning has gained significant attention and shown great promise in various domains, including medical, manufacturing, Internet of Things (IoT), remote sensing, and urban big data. This chapter provides an overview of neural network-based fusion techniques in multimodal deep learning. The advantages of deep learning over conventional shallow learning methods are discussed, highlighting its ability to learn both inter- and intra-modality representations with minimal preprocessing and implicit dimensionality reduction. The chapter explores different fusion methods, including early fusion, late fusion, and intermediate fusion, and discusses their capabilities and limitations. It also examines various objectives used in late fusion, such as reconstruction error, correlation-based objectives, and semantic alignment. The challenge of avoiding negative transfer in multimodal learning is addressed, and regularization objectives and training approaches are explored. Overall, this chapter serves as a comprehensive guide to multimodal deep learning and its fusion techniques, offering insights into their applications and potential for future research.
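To make the fusion strategies named above concrete, here is a minimal sketch, assuming two hypothetical modalities represented by NumPy feature matrices and simple linear models (stand-ins for the neural networks the chapter discusses): early fusion concatenates the modality features before one shared model, while late fusion runs a separate model per modality and combines their outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical modalities (e.g. image and text embeddings) for 4 samples.
x_img = rng.normal(size=(4, 8))  # image features, dim 8
x_txt = rng.normal(size=(4, 5))  # text features, dim 5

# Early fusion: concatenate raw feature vectors, then apply one shared
# model (here just a random linear map) to the joint representation.
w_joint = rng.normal(size=(8 + 5, 3))
early_logits = np.concatenate([x_img, x_txt], axis=1) @ w_joint

# Late fusion: one model per modality; combine their outputs afterwards,
# here by averaging the per-modality logits.
w_img = rng.normal(size=(8, 3))
w_txt = rng.normal(size=(5, 3))
late_logits = 0.5 * (x_img @ w_img + x_txt @ w_txt)

print(early_logits.shape, late_logits.shape)  # both (4, 3)
```

Intermediate fusion, by contrast, would merge learned hidden representations partway through the networks rather than raw inputs or final outputs; it sits between these two extremes.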

APA

Shaban, A., & Yousefi, S. (2024). Multimodal Deep Learning. In Springer Optimization and Its Applications (Vol. 211, pp. 209–219). Springer. https://doi.org/10.1007/978-3-031-53092-0_10
