Multi-modal deep learning has achieved great success in many applications. Previous works are mostly based on auto-encoder networks or paired networks; however, these methods generally enforce the consensus principle only on the output layers and usually require deep structures. In this paper, we propose a novel Cascade Deep Multi-Modal network structure (CDMM), which builds deep multi-modal networks in a cascade fashion by maximizing the correlation between each pair of homogeneous hidden layers. In CDMM, we simultaneously train two nonlinear mappings layer by layer, enforcing consistency between the output features of the different modalities at each homogeneous layer; moreover, the representation learning ability is progressively enhanced by incorporating the raw feature representation into every layer. Finally, experiments on five real-world datasets validate the effectiveness of our method.
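The cascade scheme described above can be illustrated with a minimal sketch (not the authors' implementation): closed-form linear CCA stands in for the per-layer correlation objective between the two modalities, and the raw features are re-appended at every level, as the abstract describes. All function names here are hypothetical.

```python
import numpy as np

def cca_projections(X, Y, k, reg=1e-3):
    """Closed-form linear CCA: returns projections Wx, Wy whose k paired
    directions maximize correlation between X @ Wx and Y @ Wy."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten each view, then SVD the cross-covariance of the whitened views.
    Ex = np.linalg.inv(np.linalg.cholesky(Cxx))
    Ey = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, _, Vt = np.linalg.svd(Ex @ Cxy @ Ey.T)
    return Ex.T @ U[:, :k], Ey.T @ Vt[:k].T

def cascade_cdmm(X, Y, n_layers=3, k=4):
    """Cascade of correlation-maximizing layers; each layer's input is the
    previous hidden output concatenated with the raw features."""
    Hx, Hy = X, Y
    for _ in range(n_layers):
        Wx, Wy = cca_projections(Hx, Hy, k)
        Hx = np.concatenate([np.tanh(Hx @ Wx), X], axis=1)
        Hy = np.concatenate([np.tanh(Hy @ Wy), Y], axis=1)
    return Hx, Hy
```

With two views that share a latent factor, the leading projected dimensions of the two modalities become highly correlated, while re-appending the raw features keeps information that the correlation objective alone would discard.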
Yang, Y., Wu, Y. F., Zhan, D. C., & Jiang, Y. (2018). Deep multi-modal learning with cascade consensus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11013 LNAI, pp. 64–72). Springer Verlag. https://doi.org/10.1007/978-3-319-97310-4_8