Deepfake detection has become increasingly important in recent years owing to the widespread availability of deepfake generation technologies. Existing deepfake detection methods have two primary limitations: they are trained on a specific type of deepfake dataset, which renders them vulnerable to unseen deepfakes, and they treat deepfake detection as a "black box" with limited explainability, making it difficult for non-AI experts to understand and trust their decisions. Hence, this paper proposes a novel neurosymbolic deepfake detection framework that exploits the fact that human emotions cannot be imitated easily owing to their complex nature. We argue that deepfakes typically exhibit inter- or intra-modality inconsistencies in the emotional expressions of the person being manipulated. Thus, the proposed framework performs inter- and intra-modality reasoning on emotions extracted from the audio and visual modalities, using a psychological model and an arousal-valence model, for deepfake detection. In addition to fake detection, the proposed framework provides textual explanations for its decisions. Results obtained on the Presidential Deepfakes Dataset and the World Leaders Dataset of real and manipulated videos demonstrate the effectiveness of our approach in detecting deepfakes and highlight the potential of the neurosymbolic approach for explainability.
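Since the abstract does not detail the implementation, the following is only a minimal illustrative sketch of the core idea: assuming per-segment emotion estimates in valence-arousal space from separate audio and visual models, a symbolic rule flags a video when the two modalities disagree or when emotions within one modality change implausibly fast, and returns a textual explanation. All function names, data structures, and thresholds below are hypothetical and not taken from the paper.

```python
# Illustrative sketch (not the authors' implementation) of inter-/intra-modality
# emotion-consistency reasoning for deepfake detection. Emotion estimates are
# represented as (valence, arousal) pairs per video segment; in practice these
# would come from pretrained audio and facial emotion recognizers.

from dataclasses import dataclass
from math import dist
from typing import List, Tuple

ValenceArousal = Tuple[float, float]  # valence and arousal, each in [-1, 1]

@dataclass
class SegmentEmotions:
    audio: ValenceArousal   # emotion inferred from speech in this segment
    visual: ValenceArousal  # emotion inferred from the face in this segment

def inter_modality_inconsistency(seg: SegmentEmotions) -> float:
    """Distance between audio and visual emotions in valence-arousal space."""
    return dist(seg.audio, seg.visual)

def intra_modality_inconsistency(track: List[ValenceArousal]) -> float:
    """Largest jump between consecutive emotion estimates within one modality."""
    if len(track) < 2:
        return 0.0
    return max(dist(a, b) for a, b in zip(track, track[1:]))

def detect_deepfake(segments: List[SegmentEmotions],
                    inter_thr: float = 0.8,
                    intra_thr: float = 1.0) -> Tuple[bool, str]:
    """Symbolic rule: flag the video if any inconsistency exceeds its threshold,
    and return a textual explanation of the decision."""
    inter = max(inter_modality_inconsistency(s) for s in segments)
    intra_audio = intra_modality_inconsistency([s.audio for s in segments])
    intra_visual = intra_modality_inconsistency([s.visual for s in segments])

    reasons = []
    if inter > inter_thr:
        reasons.append(f"audio and facial emotions disagree (distance {inter:.2f})")
    if intra_audio > intra_thr:
        reasons.append(f"abrupt emotion shift in speech (jump {intra_audio:.2f})")
    if intra_visual > intra_thr:
        reasons.append(f"abrupt emotion shift in facial expression (jump {intra_visual:.2f})")

    if reasons:
        return True, "Likely deepfake: " + "; ".join(reasons) + "."
    return False, "No emotional inconsistencies detected across or within modalities."

# Example: speech sounds calm and positive while the face shows negative, high-arousal emotion.
clip = [SegmentEmotions(audio=(0.6, -0.2), visual=(-0.5, 0.8)),
        SegmentEmotions(audio=(0.5, -0.1), visual=(-0.4, 0.7))]
is_fake, explanation = detect_deepfake(clip)
print(is_fake, explanation)
```

The thresholds and the Euclidean-distance rule stand in for the paper's psychological and arousal-valence reasoning; the point of the sketch is that the decision and its textual explanation both follow from explicit, inspectable rules rather than an opaque classifier.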
Haq, I. U., Malik, K. M., & Muhammad, K. (2023). Multimodal Neurosymbolic Approach for Explainable Deepfake Detection. ACM Transactions on Multimedia Computing, Communications, and Applications. https://doi.org/10.1145/3624748