CAE-Net: Generalized deepfake image detection using convolution and attention mechanisms with spatial and frequency domain features

Anindya Bhattacharjee; Kaidul Islam; Kafi Anan; Ashir Intesher; Abrar Assaeem Fuad; Utsab Saha; Hafiz Imtiaz

Journal Article

Journal of Visual Communication and Image Representation (2026) 115

DOI: 10.1016/j.jvcir.2025.104679

Abstract

The spread of deepfakes poses significant security concerns, demanding reliable detection methods. However, diverse generation techniques and class imbalance in datasets create challenges. We propose CAE-Net, a Convolution- and Attention-based weighted Ensemble network combining spatial and frequency-domain features for effective deepfake detection. The architecture integrates EfficientNet, Data-Efficient Image Transformer (DeiT), and ConvNeXt with wavelet features to learn complementary representations. We evaluated CAE-Net on the diverse IEEE Signal Processing Cup 2025 (DF-Wild Cup) dataset, which has a 5:1 fake-to-real class imbalance. To address this, we introduce a multistage disjoint-subset training strategy, sequentially training the model on non-overlapping subsets of the fake class while retaining knowledge across stages. Our approach achieved 94.46% accuracy and a 97.60% AUC, outperforming conventional class-balancing methods. Visualizations confirm the network focuses on meaningful facial regions, and our ensemble design demonstrates robustness against adversarial attacks, positioning CAE-Net as a dependable and generalized deepfake detection framework.
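The multistage disjoint-subset idea and the weighted ensemble described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the helper names `disjoint_fake_subsets` and `weighted_ensemble`, pairing each fake subset with the full real set, sizing each subset to match the real class, and the example weights. The abstract does not state how subsets are sized, how knowledge is retained between stages, or how the ensemble weights are chosen.

```python
import random
from typing import List, Sequence


def disjoint_fake_subsets(
    fake_ids: Sequence[int], n_real: int, seed: int = 0
) -> List[List[int]]:
    """Split fake-sample ids into non-overlapping subsets of roughly n_real each.

    Assumption: one subset per training stage, sized to balance the real class.
    """
    ids = list(fake_ids)
    random.Random(seed).shuffle(ids)
    n_stages = max(1, round(len(ids) / n_real))
    # Stride slicing yields n_stages disjoint subsets that together cover every id.
    return [ids[k::n_stages] for k in range(n_stages)]


def weighted_ensemble(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-model 'fake' probabilities (weights need not sum to 1)."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probs, weights)) / total


# Toy data with the 5:1 fake-to-real ratio mentioned in the abstract.
real_ids = list(range(100))
fake_ids = list(range(100, 600))

stages = disjoint_fake_subsets(fake_ids, n_real=len(real_ids))
for stage_index, fake_subset in enumerate(stages):
    stage_data = real_ids + fake_subset  # roughly balanced data for this stage
    # A hypothetical train_one_stage(model, stage_data) would go here, reusing the
    # same model object so that earlier stages' weights carry over to later ones.

# Combine three hypothetical per-model probabilities with illustrative weights.
score = weighted_ensemble([0.9, 0.6, 0.3], [1.0, 1.0, 2.0])
```

With these toy numbers the fake class splits into five stages of 100 samples each, so every stage sees a 1:1 class ratio while each fake sample is used exactly once across stages.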

Cite (APA)

Bhattacharjee, A., Islam, K., Anan, K., Intesher, A., Fuad, A. A., Saha, U., & Imtiaz, H. (2026). CAE-Net: Generalized deepfake image detection using convolution and attention mechanisms with spatial and frequency domain features. Journal of Visual Communication and Image Representation, 115. https://doi.org/10.1016/j.jvcir.2025.104679
