Joint image and text representation for aesthetics analysis

Abstract

Image aesthetics assessment is essential to multimedia applications such as image retrieval and personalized image search and recommendation. Relying primarily on visual information and manually supplied ratings, previous studies in this area have not adequately utilized higher-level semantic information. We incorporate additional textual phrases from user comments to jointly represent image aesthetics using a multimodal Deep Boltzmann Machine. Given an image, and without requiring any associated user comments, the proposed algorithm automatically infers the joint representation and predicts the aesthetics category of the image. We construct the AVA-Comments dataset to systematically evaluate the performance of the proposed algorithm. Experimental results indicate that, compared with using only visual features, the proposed joint representation improves the performance of aesthetics assessment on the benchmark AVA dataset.

Cite

Zhou, Y., Lu, X., Zhang, J., & Wang, J. Z. (2016). Joint image and text representation for aesthetics analysis. In MM 2016 - Proceedings of the 2016 ACM Multimedia Conference (pp. 262–266). Association for Computing Machinery, Inc. https://doi.org/10.1145/2964284.2967223
