In this work, we extend the standard single-layer probabilistic Latent Semantic Analysis (pLSA) (Hofmann in Mach Learn 42(1–2):177–196, 2001) to multiple layers. As multiple layers should naturally handle multiple modalities and a hierarchy of abstractions, we denote this new approach multilayer multimodal probabilistic Latent Semantic Analysis (mm-pLSA). We derive the training and inference rules for the smallest possible non-degenerate mm-pLSA model: a model with two leaf-pLSAs and a single top-level pLSA node merging the two leaf-pLSAs. We evaluate this approach on two pairs of different modalities: SIFT features and image annotations (tags), as well as the combination of SIFT and HOG features. We also propose a fast and strictly stepwise forward procedure to initialize the bottom-up mm-pLSA model, which can then be post-optimized by the general mm-pLSA learning algorithm. The proposed approach is evaluated in a query-by-example retrieval task in which various variants of our mm-pLSA system are compared to systems relying on a single modality and to other ad hoc combinations of feature histograms. We further describe possible pitfalls of mm-pLSA training and analyze the resulting model, yielding an intuitive explanation of its behaviour.
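For orientation, the single-layer pLSA the abstract builds on decomposes each document's word distribution as P(w|d) = sum_z P(w|z) P(z|d) over latent topics z, fitted with EM. The sketch below shows this building block; the function name `plsa_em`, its hyperparameters, and the toy data are illustrative assumptions, not the authors' implementation, which is only summarized in the abstract.

```python
import numpy as np

def plsa_em(N, K, n_iter=50, seed=0, eps=1e-12):
    """Single-layer pLSA fitted with EM (the building block stacked by mm-pLSA).

    N : (D, W) count matrix, N[d, w] = count of word w in document d
        (fractional pseudo-counts are fine for the EM updates).
    K : number of latent topics z.
    Returns P(w|z) as a (K, W) array and P(z|d) as a (D, K) array.
    """
    rng = np.random.default_rng(seed)
    D, W = N.shape
    # Random initialization of the conditionals, each row normalized.
    p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) proportional to P(w|z) P(z|d).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]          # (D, K, W)
        p_z_dw = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: renormalize the expected counts n(d,w) * P(z|d,w).
        nz = N[:, None, :] * p_z_dw                            # (D, K, W)
        p_w_z = nz.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = nz.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_w_z, p_z_d
```

One plausible reading of the stepwise forward initialization mentioned in the abstract: fit one leaf pLSA per modality, then treat the concatenated leaf topic activations P(z|d), scaled to pseudo-counts, as the observations of the top-level pLSA. The topic counts and scaling constant below are arbitrary placeholders, not values from the paper.

```python
# Hypothetical usage on toy data (shapes only; the real inputs would be
# SIFT visual-word and tag count matrices per image).
rng = np.random.default_rng(1)
N_sift = rng.integers(0, 5, size=(200, 500))   # 200 images, 500 visual words
N_tags = rng.integers(0, 3, size=(200, 120))   # 200 images, 120 tag words

p_w_z_vis, p_z_d_vis = plsa_em(N_sift, K=64)   # leaf pLSA, visual modality
p_w_z_txt, p_z_d_txt = plsa_em(N_tags, K=32)   # leaf pLSA, tag modality

# Top-level pLSA over the merged leaf topics; the factor 100 that turns
# activations into pseudo-counts is an assumption for this sketch.
N_top = np.hstack([p_z_d_vis, p_z_d_txt]) * 100.0
p_z_ztop, p_ztop_d = plsa_em(N_top, K=16)
```

In this reading, `p_ztop_d` gives each image a compact multimodal topic signature that could serve as the index vector in the query-by-example retrieval task the abstract describes.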
Romberg, S., Lienhart, R., & Hörster, E. (2012). Multimodal Image Retrieval. International Journal of Multimedia Information Retrieval, 1(1), 31–44. https://doi.org/10.1007/s13735-012-0006-4