Abstract
Recent years have witnessed the boom of online social media platforms that offer the popular "Time-Sync Comment" service, which allows viewers to share their opinions in sync with the video content. On these platforms, we observe that niche users have created numerous semantically altered terms, or "memes", to express their unique ideas and emotions, and that these memes in turn attract large groups of viewers and raise their activity and enthusiasm. Unfortunately, because memes are built on domain-specific knowledge and their meaning varies with the multimodal context of the video, newcomers may fail to grasp their semantic connotations, which can severely impair their user experience. To address this issue, in this article we propose a novel meme explanation framework, called ProMDE, that automatically captures and comprehends memes in time-sync comments and can thereby provide viewers with a meme explanation service. Specifically, we first iteratively reconstruct the original time-sync comments by comparison with the visual embedding, in order to detect semantically altered terms as meme candidates. Afterward, guided by a domain-specific corpus, we fuse visual and textual features to represent context-aware multimodal cues. Moreover, to accurately describe the homophones common in memes, i.e., terms that share a pronunciation but differ in spelling, we integrate phonetic symbols as an additional modality to enhance the framework. Finally, we use a Transformer-based decoder to generate natural language explanations for the captured memes. Extensive experiments on a large real-world dataset show that our framework significantly outperforms several state-of-the-art baseline methods, demonstrating the efficacy of modeling multimodal context and pronunciation for meme detection and explanation.
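The abstract motivates the phonetic modality with homophone memes: different spellings that share one pronunciation. A well-known Chinese example is 杯具 (bēijù, "cup set"), which is used online for 悲剧 (bēijù, "tragedy"). The sketch below is a minimal illustration of why a phonetic key links such spellings. It is not the authors' implementation, since ProMDE integrates phonetic symbols as a learned modality. The lookup table `TOY_PINYIN` and the helpers `to_phonetic`, `is_homophone_pair`, and `group_by_pronunciation` are hypothetical. A real system would use a complete grapheme-to-phoneme resource instead of a hand-written table.

```python
from collections import defaultdict

# Hypothetical toy table mapping characters to tone-numbered pinyin.
# It only covers the example words; unknown characters map to "?".
TOY_PINYIN = {
    "杯": "bei1", "悲": "bei1",
    "具": "ju4", "剧": "ju4",
    "洗": "xi3", "喜": "xi3",
}


def to_phonetic(term: str) -> tuple:
    """Return the per-character pronunciation key of a term."""
    return tuple(TOY_PINYIN.get(ch, "?") for ch in term)


def is_homophone_pair(a: str, b: str) -> bool:
    """True if a and b are spelled differently but pronounced identically."""
    pa, pb = to_phonetic(a), to_phonetic(b)
    return a != b and "?" not in pa and pa == pb


def group_by_pronunciation(terms):
    """Group terms sharing a fully known pronunciation key.

    Only groups with two or more spellings are kept, i.e. candidate
    homophone sets such as a meme and the standard word it imitates.
    """
    groups = defaultdict(list)
    for term in terms:
        key = to_phonetic(term)
        if "?" not in key:  # skip terms the toy table cannot transcribe
            groups[key].append(term)
    return {key: words for key, words in groups.items() if len(words) > 1}


if __name__ == "__main__":
    print(is_homophone_pair("杯具", "悲剧"))  # True
    print(group_by_pronunciation(["杯具", "悲剧", "洗具", "喜剧", "视频"]))
```

Grouping by an exact pronunciation key is the simplest possible design choice. A learned phonetic embedding, as the abstract describes, could also relate near-homophones that this exact-match sketch would miss.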
Citation
Xie, Z., He, W., Xu, T., Wu, S., Zhu, C., Yang, P., & Chen, E. (2023). Comprehending the Gossips: Meme Explanation in Time-Sync Video Comment via Multimodal Cues. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(8). https://doi.org/10.1145/3612920