Image annotation has been a challenging problem due to the well-known semantic gap between two heterogeneous information modalities, i.e., the visual modality, which refers to low-level visual features, and the semantic modality, which refers to high-level human concepts. To bridge the semantic gap, we present an extension of latent Dirichlet allocation (LDA), denoted class-specific Gaussian-multinomial latent Dirichlet allocation (csGM-LDA), in an effort to simulate the human visual perception system. An analysis of previous supervised LDA models shows that the topics discovered by generative LDA models are driven by general image regularities rather than the semantic regularities required for image annotation. To address this, csGM-LDA introduces class supervision at the level of visual features into multimodal topic modeling. The csGM-LDA model combines the labeling strength of topic supervision with the flexibility of topic discovery, and the modeling problem can be solved effectively by a variational expectation-maximization (EM) algorithm. Moreover, since natural images typically yield enormous volumes of high-dimensional data in annotation applications, an efficient descriptor based on a Laplacian-regularized uncorrelated tensor representation is proposed to explicitly exploit the manifold structure of the high-order image space. Experimental results on two standard annotation datasets demonstrate the effectiveness of the proposed method in comparison with several state-of-the-art annotation methods.
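For orientation, a minimal sketch of a Gaussian-multinomial LDA generative process with class-specific visual components is given below. The abstract does not spell out the model, so this follows the standard GM-LDA template (Gaussian topics for visual features, multinomial topics for annotation words); the class-conditioned Gaussians and all symbols ($\theta_d$, $z_{dn}$, $y_{dm}$, $\mu$, $\Sigma$, $\beta$, $c_d$) are illustrative assumptions, not the paper's exact formulation.

% Illustrative generative process for image d with class label c_d
% (a sketch in the spirit of csGM-LDA, not the paper's derivation)
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha)
  && \text{per-image topic proportions} \\
z_{dn} &\sim \mathrm{Mult}(\theta_d), \quad
v_{dn} \mid z_{dn}{=}k \sim \mathcal{N}\!\left(\mu_{c_d,k},\, \Sigma_{c_d,k}\right)
  && \text{visual feature of region } n \text{ (class-specific Gaussian)} \\
y_{dm} &\sim \mathrm{Mult}(\theta_d), \quad
w_{dm} \mid y_{dm}{=}k \sim \mathrm{Mult}(\beta_k)
  && \text{annotation word } m
\end{align*}

Under such a model, a variational EM algorithm would alternate between variational updates of the per-image posteriors over $\theta_d$, $z_{dn}$, and $y_{dm}$ (E-step) and point estimates of the topic parameters $\{\mu, \Sigma, \beta\}$ (M-step); this again mirrors standard variational inference for LDA-family models rather than the paper's specific updates.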
Qian, Z., Zhong, P., & Wang, R. (2015). Class-specific Gaussian-multinomial latent Dirichlet allocation for image annotation. EURASIP Journal on Advances in Signal Processing, 2015(1). https://doi.org/10.1186/s13634-015-0224-z