In many real-world applications (e.g., social media applications), data usually consist of diverse input modalities that originate from various heterogeneous sources. Learning a similarity measure for such data is of great importance for a vast number of applications such as classification, clustering, and retrieval. Defining an appropriate distance metric between data points with multiple modalities is a key challenge that has a great impact on the performance of many multimedia applications. Existing approaches to multi-modal distance metric learning offer only point estimates of the distance matrix and/or latent features, and can therefore be unreliable when the number of training examples is small. In this paper we present a novel Bayesian framework for learning distance functions on multi-modal data through the Beta process, by which we embed data of different modalities into a single latent space. Moreover, using the flexible Beta process model, we can infer the dimensionality of the latent space from the training data itself. We also develop a novel Variational Bayes (VB) algorithm that computes the posterior distribution of the parameters and imposes the similarity/dissimilarity constraints directly on that posterior. We apply our framework to text/image data and present empirical results on retrieval and classification to demonstrate the effectiveness of the proposed model.
Babagholami-Mohamadabadi, B., Roostaiyan, S. M., Zarghami, A., & Baghshah, M. S. (2015). Multi-modal distance metric learning: A Bayesian non-parametric approach. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8927, pp. 63–77). Springer Verlag. https://doi.org/10.1007/978-3-319-16199-0_5