Latent Pattern Sensing: Deepfake Video Detection via Predictive Representation Learning

3Citations
Citations of this article
7Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Increasingly advanced deepfake approaches have made the detection of deepfake videos very challenging. We observe that the general deepfake videos often exhibit appearance-level temporal inconsistencies in some facial components between frames, resulting in discriminable spatiotemporal latent patterns among semantic-level feature maps. Inspired by this finding, we propose a predictive representative learning approach termed Latent Pattern Sensing to capture these semantic change characteristics for deepfake video detection. The approach cascades a CNN-based encoder, a ConvGRU-based aggregator and a single-layer binary classifier. The encoder and aggregator are pre-trained in a self-supervised manner to form the representative spatiotemporal context features. Finally, the classifier is trained to classify the context features, distinguishing fake videos from real ones. In this manner, the extracted features can simultaneously describe the latent patterns of videos across frames spatially and temporally in a unified way, leading to an effective deepfake video detector. Extensive experiments prove our approach's effectiveness, e.g., surpassing 10 state-of-the-arts at least 7.92%@AUC on challenging Celeb-DF(v2) benchmark.

Cite

CITATION STYLE

APA

Ge, S., Lin, F., Li, C., Zhang, D., Tan, J., Wang, W., & Zeng, D. (2021). Latent Pattern Sensing: Deepfake Video Detection via Predictive Representation Learning. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3469877.3490586

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free