Multimodal Deep Learning

Amirreza Shaban; Safoora Yousefi

Book Chapter

Springer, (2024), 209-219

DOI: 10.1007/978-3-031-53092-0_10

5 Citations

2.9k Readers

Abstract

Multimodal deep learning has gained significant attention and shown great promise in various domains, including medicine, manufacturing, the Internet of Things (IoT), remote sensing, and urban big data. This chapter provides an overview of neural network-based fusion techniques in multimodal deep learning. The advantages of deep learning over conventional shallow learning methods are discussed, highlighting its ability to learn both inter- and intra-modality representations with minimal preprocessing and implicit dimensionality reduction. The chapter explores different fusion methods, including early fusion, late fusion, and intermediate fusion, and discusses their capabilities and limitations. It also examines various objectives used in late fusion, such as reconstruction error, correlation-based objectives, and semantic alignment. The challenge of avoiding negative transfer in multimodal learning is addressed, and regularization objectives and training approaches are explored. Overall, this chapter serves as a comprehensive guide to multimodal deep learning and its fusion techniques, offering insights into their applications and potential for future research.

Citation (APA)

Shaban, A., & Yousefi, S. (2024). Multimodal Deep Learning. In Springer Optimization and Its Applications (Vol. 211, pp. 209–219). Springer. https://doi.org/10.1007/978-3-031-53092-0_10
