Hateful memes are an emerging method of spreading hate on the internet, relying on both images and text to convey a hateful message. We take an interpretable approach to hateful meme detection, using machine learning and simple heuristics to identify the features most important to classifying a meme as hateful. In the process, we build a gradient-boosted decision tree and an LSTM-based model whose performance (73.8 validation and 72.7 test auROC) is comparable to the gold standard of humans and to state-of-the-art transformer models on this challenging task.
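For readers unfamiliar with the metric, auROC can be read as the probability that a randomly chosen hateful meme receives a higher model score than a randomly chosen non-hateful one. The following is a minimal sketch of that definition, not the authors' evaluation code: the function name `auroc`, the toy labels, and the toy scores are all hypothetical, and the scaling by 100 assumes the abstract reports auROC on a 0 to 100 scale.

```python
def auroc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = hateful, 0 = not hateful); not taken from the paper.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]

toy_auroc = auroc(labels, scores)
print(f"auROC = {100 * toy_auroc:.1f}")
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; a rank-based computation gives the same value more efficiently on large datasets.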
CITATION
Deshpande, T., & Mani, N. (2021). An Interpretable Approach to Hateful Meme Detection. In ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction (pp. 723–727). Association for Computing Machinery, Inc. https://doi.org/10.1145/3462244.3479949