M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection


Abstract

The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images. In this paper, we aim to capture the subtle manipulation artifacts at different scales using transformer models. In particular, we introduce a Multi-modal Multi-scale TRansformer (M2TR), which operates on patches of different sizes to detect local inconsistencies in images at different spatial levels. M2TR further learns to detect forgery artifacts in the frequency domain to complement RGB information through a carefully designed cross-modality fusion block. In addition, to stimulate Deepfake detection research, we introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 Deepfake videos generated by state-of-the-art face swapping and facial reenactment methods. We conduct extensive experiments to verify the effectiveness of the proposed method, which outperforms state-of-the-art Deepfake detection methods by clear margins.
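The abstract's two key ideas, self-attention over patches of different sizes and fusion of frequency-domain features with RGB features, can be illustrated with a minimal PyTorch sketch. All module names, dimensions, and design details below are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSelfAttention(nn.Module):
    """Hypothetical sketch: self-attention over patch tokens at several
    patch sizes, so inconsistencies can surface at different spatial levels."""
    def __init__(self, dim, patch_sizes=(2, 4, 8)):
        super().__init__()
        self.patch_sizes = patch_sizes
        self.attns = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            for _ in patch_sizes
        )
        self.proj = nn.Linear(dim * len(patch_sizes), dim)

    def forward(self, x):                          # x: (B, C, H, W)
        B, C, H, W = x.shape
        outs = []
        for p, attn in zip(self.patch_sizes, self.attns):
            # pool the feature map into p-by-p patch tokens
            tokens = F.avg_pool2d(x, p).flatten(2).transpose(1, 2)  # (B, N, C)
            y, _ = attn(tokens, tokens, tokens)
            y = y.transpose(1, 2).reshape(B, C, H // p, W // p)
            # upsample each scale back to the input resolution
            outs.append(F.interpolate(y, size=(H, W)))
        fused = torch.cat(outs, dim=1).flatten(2).transpose(1, 2)   # (B, HW, 3C)
        return self.proj(fused).transpose(1, 2).reshape(B, C, H, W)

class CrossModalityFusion(nn.Module):
    """Hypothetical sketch: gate-fuse RGB features with a frequency-domain
    representation (log FFT magnitude) of the input image."""
    def __init__(self, dim):
        super().__init__()
        self.freq_proj = nn.Conv2d(3, dim, kernel_size=1)
        self.gate = nn.Conv2d(dim * 2, dim, kernel_size=1)

    def forward(self, rgb_feat, img):              # img: (B, 3, H, W)
        freq = torch.fft.fft2(img).abs().log1p()   # frequency magnitude
        freq = F.interpolate(self.freq_proj(freq), size=rgb_feat.shape[-2:])
        g = torch.sigmoid(self.gate(torch.cat([rgb_feat, freq], dim=1)))
        return g * rgb_feat + (1 - g) * freq       # learned gated fusion
```

For example, feeding a 32x32 feature map with 16 channels through both modules preserves the feature shape, so the blocks could be stacked inside a larger backbone:

```python
msa, fusion = MultiScaleSelfAttention(16), CrossModalityFusion(16)
out = fusion(msa(torch.randn(1, 16, 32, 32)), torch.randn(1, 3, 32, 32))
# out has shape (1, 16, 32, 32)
```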

Citation (APA)

Wang, J., Wu, Z., Ouyang, W., Han, X., Chen, J., Lim, S. N., & Jiang, Y. G. (2022). M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection. In ICMR 2022 - Proceedings of the 2022 International Conference on Multimedia Retrieval (pp. 615–623). Association for Computing Machinery, Inc. https://doi.org/10.1145/3512527.3531415
