Abstract
To improve the accuracy of real-time deepfake video detection, this paper proposes a method based on an EfficientNet-TimeSformer model. First, we apply transfer learning, using a pre-trained convolutional network for spatial feature extraction, and introduce a lightweight network, achieving an accuracy of 95.26% and an AUC of 95.52. At the same time, the model has only 5,234,124 trainable parameters, significantly fewer than those of comparable models. We then optimize the model through pre-training and data augmentation, including dataset resizing, so that it accommodates larger image sizes and longer video input sequences and can better process complex video data. Empirical results show that, within GPU memory limits, our model performs well on high-resolution videos and longer video sequences, significantly outperforming traditional convolutional neural networks. We also note some limitations of the study, such as the absence of tests on video clips longer than 96 frames and certain constraints of the Celeb-DF dataset. Future research directions include deeper investigation of the model's interpretability and generalizability, as well as verification of its performance across a wider range of application scenarios.
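The abstract outlines a two-stage pipeline: a pre-trained convolutional backbone extracts per-frame spatial features, and a TimeSformer-style transformer aggregates them over time. The following is a minimal sketch of that pattern, not the authors' code; the EfficientNet-B0 variant, 16-frame clip length, binary head, and the use of a plain Transformer encoder in place of TimeSformer's divided space-time attention are all assumptions for illustration.

```python
# Hedged sketch: transfer-learned EfficientNet backbone (spatial) feeding a
# small temporal Transformer (standing in for TimeSformer's temporal attention).
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

class EffNetTemporalDetector(nn.Module):
    def __init__(self, num_frames=16, d_model=1280, n_heads=8, n_layers=2):
        super().__init__()
        # Transfer learning: reuse ImageNet-pretrained convolutional weights.
        backbone = efficientnet_b0(weights=EfficientNet_B0_Weights.IMAGENET1K_V1)
        self.spatial = nn.Sequential(backbone.features, backbone.avgpool)
        # Learned temporal position embedding, one vector per frame.
        self.pos = nn.Parameter(torch.zeros(1, num_frames, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 2)  # real vs. fake logits

    def forward(self, clip):                      # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.spatial(clip.flatten(0, 1))  # (B*T, 1280, 1, 1)
        feats = feats.flatten(1).view(b, t, -1)   # (B, T, 1280)
        feats = self.temporal(feats + self.pos)   # temporal self-attention
        return self.head(feats.mean(dim=1))       # average-pool over frames

# Usage: a batch of two 16-frame clips at 224x224 resolution.
logits = EffNetTemporalDetector()(torch.randn(2, 16, 3, 224, 224))
```

Freezing or fine-tuning the backbone, the exact head, and the clip length are design choices the abstract does not specify; the sketch only illustrates why the parameter budget stays small, since the temporal module adds few weights on top of the roughly 5M-parameter EfficientNet-B0 backbone.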
Citation
Chen, Z., Wang, S., Yan, D., & Li, Y. (2024). A Spatio-Temporal Deepfake Video Detection Method Based on TimeSformer-CNN. In 3rd IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics, ICDCECE 2024. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDCECE60827.2024.10549278