Multi-modal machine learning has become a prominent multi-disciplinary research area owing to its success on complex real-world problems. Empirically, multi-branch fusion models tend to produce better results when the branches of the model are highly diverse. However, this empirical observation alone neither guarantees the fusion model's best performance nor has sufficient theoretical support. Based on an analysis of several of the most popular fusion methods, we present a theoretical estimate of a fusion model's performance in terms of each branch model's performance and the distance between branches. The theorem is validated empirically by numerical experiments. Using the theorem, we further present a branch model selection framework that identifies candidate branches for fusion models so as to achieve optimal multi-modal performance. The framework's effectiveness is demonstrated on various datasets by showing how effectively it selects combinations of branch models that attain superior performance.
Qu, S., Kang, Y., & Lee, J. (2021). Efficient Multi-Modal Fusion with Diversity Analysis. In MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia (pp. 2663–2670). Association for Computing Machinery, Inc. https://doi.org/10.1145/3474085.3475188