Predicting the popularity of a micro-video is challenging because many factors influence how it spreads, such as the diversity of video content and user interests and the complexity of online interactions. In this paper, we propose a multimodal variational encoder-decoder (MMVED) framework that models these uncertain factors as randomness in the mapping from multimodal features to popularity. Specifically, MMVED first encodes features from multiple modalities in the observation space into latent representations and learns their probability distributions through variational inference, so that only the relevant features of the input modalities are extracted into the latent representations. The modality-specific hidden representations are then fused through Bayesian reasoning so that the complementary information from all modalities is fully utilized. Finally, a temporal decoder, implemented as a recurrent neural network, predicts the popularity sequence of a given micro-video. Experiments on a real-world dataset demonstrate the effectiveness of the proposed model on the micro-video popularity prediction task.
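The fusion step can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so this sketch assumes each modality's latent posterior is a diagonal Gaussian and combines them as a precision-weighted product of Gaussians, one common way to realize Bayesian fusion. The function name `fuse_gaussians`, the modality names, and the numbers are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

# (mean, variance) per latent dimension; a diagonal Gaussian is assumed.
Gaussian = Tuple[List[float], List[float]]


def fuse_gaussians(experts: Sequence[Gaussian]) -> Gaussian:
    """Fuse diagonal Gaussians by multiplying their densities.

    The product of Gaussians is again Gaussian: precisions (1 / variance)
    add up, and the fused mean is the precision-weighted average of the
    individual means. This is an assumed fusion rule, not the paper's code.
    """
    if not experts:
        raise ValueError("at least one modality is required")
    dim = len(experts[0][0])
    fused_mean: List[float] = []
    fused_var: List[float] = []
    for d in range(dim):
        precisions = [1.0 / var[d] for _, var in experts]
        total_precision = sum(precisions)
        weighted_sum = sum(p * mean[d] for p, (mean, _) in zip(precisions, experts))
        fused_mean.append(weighted_sum / total_precision)
        fused_var.append(1.0 / total_precision)
    return fused_mean, fused_var


# Hypothetical two-dimensional latent posteriors for two modalities.
visual = ([0.0, 2.0], [1.0, 4.0])
acoustic = ([2.0, 0.0], [1.0, 1.0])
mean, var = fuse_gaussians([visual, acoustic])
```

Under this rule, a modality that is more certain in a given latent dimension (smaller variance) pulls the fused mean toward its own estimate, and the fused variance is never larger than that of any single modality.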
CITATION STYLE
Xie, J., Zhu, Y., Zhang, Z., Peng, J., Yi, J., Hu, Y., … Chen, Z. (2020). A Multimodal Variational Encoder-Decoder Framework for Micro-video Popularity Prediction. In The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020 (pp. 2542–2548). Association for Computing Machinery, Inc. https://doi.org/10.1145/3366423.3380004