Multimodal Image Retrieval

  • Romberg S
  • Lienhart R
  • Hörster E

Abstract

In this work, we extend the standard single-layer probabilistic Latent Semantic Analysis (pLSA) (Hofmann in Mach Learn 42(1–2):177–196, 2001) to multiple layers. As multiple layers should naturally handle multiple modalities and a hierarchy of abstractions, we denote this new approach multilayer multimodal probabilistic Latent Semantic Analysis (mm-pLSA). We derive the training and inference rules for the smallest possible non-degenerate mm-pLSA model: a model with two leaf-pLSAs and a single top-level pLSA node merging the two leaf-pLSAs. We evaluate this approach on two pairs of different modalities: SIFT features and image annotations (tags), as well as the combination of SIFT and HOG features. We also propose a fast and strictly stepwise forward procedure to initialize the bottom-up mm-pLSA model, which in turn can then be post-optimized by the general mm-pLSA learning algorithm. The proposed approach is evaluated in a query-by-example retrieval task in which various variants of our mm-pLSA system are compared to systems relying on a single modality and to other ad hoc combinations of feature histograms. We further describe possible pitfalls of mm-pLSA training and analyze the resulting model, yielding an intuitive explanation of its behaviour.
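The abstract builds on the standard single-layer pLSA that mm-pLSA stacks into a hierarchy. As a rough illustration (not the authors' code), a minimal NumPy sketch of the well-known pLSA EM updates, with the E-step P(z|d,w) proportional to P(z|d)P(w|z) and M-steps re-estimating P(w|z) and P(z|d) from expected counts, might look like the following. All names, the dense array layout, and the toy usage are illustrative assumptions.

    import numpy as np

    def plsa(counts, n_topics, n_iter=100, seed=0, eps=1e-12):
        """EM training for standard single-layer pLSA (Hofmann, 2001).

        counts: (n_docs, n_words) term-count matrix n(d, w).
        Returns p_z_d with shape (n_docs, n_topics) and
        p_w_z with shape (n_topics, n_words).
        """
        rng = np.random.default_rng(seed)
        n_docs, n_words = counts.shape

        # Random initialization of the conditional distributions.
        p_z_d = rng.random((n_docs, n_topics))
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)   # P(z|d)
        p_w_z = rng.random((n_topics, n_words))
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)   # P(w|z)

        for _ in range(n_iter):
            # E-step: P(z|d,w) proportional to P(z|d) P(w|z),
            # stored densely with shape (n_docs, n_topics, n_words).
            joint = p_z_d[:, :, None] * p_w_z[None, :, :]
            p_z_dw = joint / (joint.sum(axis=1, keepdims=True) + eps)

            # M-step: expected counts n(d,w) P(z|d,w).
            weighted = counts[:, None, :] * p_z_dw
            p_w_z = weighted.sum(axis=0)
            p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
            p_z_d = weighted.sum(axis=2)
            p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

        return p_z_d, p_w_z

    # Toy usage: 3 documents over a 5-word vocabulary (hypothetical data).
    counts = np.array([[4, 2, 0, 0, 1],
                       [0, 1, 3, 4, 0],
                       [2, 2, 1, 1, 1]], dtype=float)
    p_z_d, p_w_z = plsa(counts, n_topics=2, n_iter=50)

In the stepwise forward initialization described in the abstract, one such leaf pLSA per modality (e.g. one over SIFT visual-word histograms and one over tag histograms) would presumably be trained first, with the per-document topic posteriors P(z|d) of both leaves then serving as the pseudo-observations of the top-level pLSA node before joint post-optimization by the general mm-pLSA learning algorithm.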

Citation (APA)

Romberg, S., Lienhart, R., & Hörster, E. (2012). Multimodal Image Retrieval. International Journal of Multimedia Information Retrieval, 1(1), 31–44. https://doi.org/10.1007/s13735-012-0006-4
