A Multimodal Variational Encoder-Decoder Framework for Micro-video Popularity Prediction


Abstract

Predicting the popularity of a micro-video is a challenging task due to a number of uncertain factors that affect the popularity distribution, such as the diversity of video content, varying user interests, and complex online interactions. In this paper, we propose a multimodal variational encoder-decoder (MMVED) framework that models these uncertain factors as randomness in the mapping from multimodal features to popularity. Specifically, the MMVED first encodes features from multiple modalities in the observation space into latent representations and learns their probability distributions via variational inference, so that only the relevant features of each input modality are extracted into the latent representations. The modality-specific hidden representations are then fused through Bayesian reasoning, so that the complementary information from all modalities is fully utilized. Finally, a temporal decoder, implemented as a recurrent neural network, predicts the popularity sequence of a given micro-video. Experiments conducted on a real-world dataset demonstrate the effectiveness of the proposed model on the micro-video popularity prediction task.
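The Bayesian fusion step described above can be illustrated with a small sketch. A common way to combine modality-specific Gaussian posteriors by Bayesian reasoning is a product-of-experts rule: precisions add, and the fused mean is the precision-weighted average of the modality means. The function names below and the choice of the product-of-experts formulation are our assumptions for illustration, not necessarily the paper's exact parameterization.

```python
import numpy as np

def fuse_gaussians(mus, logvars):
    """Product-of-experts fusion of diagonal Gaussian posteriors.

    Each modality m contributes N(mu_m, var_m). The fused Gaussian has
    precision = sum of modality precisions, and mean = precision-weighted
    average of the modality means. (Illustrative sketch; the paper's
    fusion may differ in details.)
    """
    precisions = [np.exp(-lv) for lv in logvars]   # 1 / var per modality
    total_prec = np.sum(precisions, axis=0)
    fused_var = 1.0 / total_prec
    fused_mu = fused_var * np.sum(
        [p * m for p, m in zip(precisions, mus)], axis=0
    )
    return fused_mu, np.log(fused_var)

def sample_latent(mu, logvar, rng):
    """Reparameterized sample z = mu + sigma * eps, as used in
    variational inference to keep sampling differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps
```

For example, fusing two identical unit-variance Gaussians leaves the mean unchanged and halves the variance, reflecting the increased confidence from two agreeing modalities.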

Citation (APA)

Xie, J., Zhu, Y., Zhang, Z., Peng, J., Yi, J., Hu, Y., … Chen, Z. (2020). A Multimodal Variational Encoder-Decoder Framework for Micro-video Popularity Prediction. In The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020 (pp. 2542–2548). Association for Computing Machinery, Inc. https://doi.org/10.1145/3366423.3380004
