Abstract
A large number of multimedia documents containing both text and images have appeared on the internet; hence cross-modal retrieval, in which the modality of a query differs from that of the retrieved results, is becoming an interesting search paradigm. In this paper, a multimodal multiclass boosting framework (MMB) is proposed to capture intra-modal semantic information and inter-modal semantic correlation. Unlike traditional boosting methods, which are confined to two classes or a single modality, MMB can handle multimodal data simultaneously. An empirical risk that takes both intra-modal and inter-modal losses into account is designed and then minimized by gradient descent in multidimensional functional spaces. More specifically, the optimization problem is solved in turn for each modality. A semantic space is naturally obtained by applying the sigmoid function to the quasi-margins. Extensive experiments on the Wiki and NUS-WIDE datasets show that our method significantly outperforms existing approaches to cross-modal retrieval.
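The last step the abstract describes — mapping boosted quasi-margin scores into a shared semantic space via the sigmoid, then retrieving across modalities in that space — can be sketched as follows. This is a minimal illustration under assumptions, not the paper's actual MMB algorithm: the margin matrices, the normalization to a class distribution, and the use of cosine similarity for ranking are all hypothetical choices made here for clarity.

```python
import numpy as np

def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def semantic_space(margins):
    """Map quasi-margin scores (n_samples, n_classes) from boosted
    learners of one modality into a semantic space: squash each score
    with the sigmoid, then normalize rows to class distributions."""
    probs = sigmoid(margins)
    return probs / probs.sum(axis=1, keepdims=True)

def cross_modal_retrieve(query_sem, db_sem, k=3):
    """Rank items of the other modality by cosine similarity between
    semantic vectors; return indices of the top-k matches."""
    q = query_sem / np.linalg.norm(query_sem)
    db = db_sem / np.linalg.norm(db_sem, axis=1, keepdims=True)
    sims = db @ q
    return np.argsort(-sims)[:k]

# Toy example: a text query retrieving images via the shared space.
text_margins = np.array([[2.0, -1.0, 0.5]])          # one text query
image_margins = np.random.randn(10, 3)               # ten database images
query_sem = semantic_space(text_margins)[0]
db_sem = semantic_space(image_margins)
top = cross_modal_retrieve(query_sem, db_sem, k=3)
```

Because both modalities are projected into the same class-distribution space, comparing a text query against image entries reduces to ordinary vector similarity.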
Citation
Wang, S., Pan, P., Lu, Y., & Jiang, S. (2015). Multiclass boosting framework for multimodal data analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8936, pp. 560–571). Springer Verlag. https://doi.org/10.1007/978-3-319-14442-9_60