Abstract
Multimodal deep learning has attracted significant attention and shown promise in domains including medicine, manufacturing, the Internet of Things (IoT), remote sensing, and urban big data. This chapter provides an overview of neural network-based fusion techniques in multimodal deep learning. It discusses the advantages of deep learning over conventional shallow learning methods, highlighting its ability to learn both inter- and intra-modality representations with minimal preprocessing and implicit dimensionality reduction. The chapter explores early, late, and intermediate fusion and discusses the capabilities and limitations of each. It also examines objectives used in late fusion, such as reconstruction error, correlation-based objectives, and semantic alignment. Finally, it addresses the challenge of avoiding negative transfer in multimodal learning and explores regularization objectives and training approaches. Overall, this chapter serves as a comprehensive guide to multimodal deep learning and its fusion techniques, offering insights into their applications and potential for future research.
Shaban, A., & Yousefi, S. (2024). Multimodal Deep Learning. In Springer Optimization and Its Applications (Vol. 211, pp. 209–219). Springer. https://doi.org/10.1007/978-3-031-53092-0_10