Deepfake detection

DeepFake Detection: The Importance of Training Data
Preprocessing and Practical Considerations
Dr. Symeon (Akis) Papadopoulos – @sympap
MeVer Team @ Information Technologies Institute (ITI) /
Centre for Research & Technology Hellas (CERTH)
Joint work with Polychronis Charitidis, George Kordopatis-Zilos and
Yiannis Kompatsiaris
AI4Media Workshop on GANs for Media Content Generation, Oct 1, 2020
Media Verification
(MeVer)

DeepFakes
• Content, generated by AI, that seems
authentic to human eye
• Most common form: generation and
manipulation of human face
Source: https://en.wikipedia.org/wiki/Deepfake
Source: https://www.youtube.com/watch?v=iHv6Q9ychnA
Source: Media Forensics and DeepFakes: an overview

Manipulation types
Facial manipulations can
be categorised in four
main different groups:
• Entire face synthesis
• Attribute manipulation
• Identity swap
• Expression swap
Source: DeepFakes and Beyond: A Survey of Face Manipulation and Fake
Detection (Tolosana et al., 2020)
Tolosana, R., et al. (2020). Deepfakes and beyond:
A survey of face manipulation and fake
detection. arXiv preprint arXiv:2001.00179.
Verdoliva, L. (2020). Media forensics and deepfakes:
an overview. arXiv preprint arXiv:2001.06564.
Mirsky, Y., & Lee, W. (2020). The Creation and
Detection of Deepfakes: A Survey. arXiv preprint
arXiv:2004.11138.

WeVerify Project
• WeVerify aims at detecting disinformation in social media and expose
misleading and fabricated content
• Partners: Univ. Sheffield, OntoText, ATC, DW, AFP, EU DisinfoLab, CERTH
• A key outcome is a platform for collaborative content verification,
tracking, and debunking
• Currently, we are developing a deepfake detection service for the
WeVerify platform
• Participation in DeepFake Detection Challenge
https://weverify.eu/

DeepFake Detection Challenge
• Goal: detect videos with facial or voice manipulations
• 2,114 teams participated in the challenge
• Log Loss error evaluation on public and private validation sets
• Public evaluation contained videos with similar transformations as the
training set
• Private evaluation contained organic videos and videos with unknown
transformations from the Internet
• Our final standings:
• public leaderboard: 49 (top 3%) with 0.295 Log Loss error
• private leaderboard: 115 (top 5%) with 0.515 Log Loss error
Source: https://www.kaggle.com/c/deepfake-detection-challenge

DeepFake Detection Challenge - dataset
• Dataset of more than 110k videos
• Approx. 20k REAL and the rest are FAKE
• FAKE videos generated from the REAL
• Models used:
• DeepFake AutoEncoder (DFAE)
• Morphable Mask faceswap (MM/NN)
• Neural Talking Heads (NTH)
• FSGAN
• StyleGAN
Dolhansky, B., Bitton, J., Pflaum, B., Lu, J., Howes, R., Wang,
M., & Ferrer, C. C. (2020). The DeepFake Detection Challenge
Dataset. arXiv preprint arXiv:2006.07397.

Dataset preprocessing - Issues
• Face dataset quality depends on face extraction accuracy (Dlib,
mtcnn, facenet-pytorch, Blazeface)
• Generally all face extraction libraries generate a number of false
positive detections
• Manual tuning can improve the quality of the generated dataset
Deep learning
model
Face
extraction
Frame
extraction
Video
corpus

Noisy data creeping in the training set
• Extracting faces with 1 fps from Kaggle DeepFake Detection Challenge dataset
videos using pytorch implementation of MTCNN face detection
• Observation: False detections are less compared to true detections in a video

Our “noise” filtering approach
• Compute face embeddings for each detected face in video
• Similarity calculation between all face embeddings in a video → similarity graph construction
• Nodes represent faces and two faces are connected if their similarities are greater than 0.8 (solid lines)
• Drop components smaller than N/2 (e.g. component 2)
• N is the number of frames that contain face detections (true or false).

Advantages
• Simple and fast procedure
• No need for manual tuning of the face extraction settings
• Clusters of distinct faces in cases of multiple persons in the video
• This information can be utilized in various ways (e.g. predictions per face)
Faces extracted from multiple video frames
Component 1
Component 2

Experiments
• We trained multiple DeepFake detection models on the DFDC dataset
with and without (baseline) our proposed approach
• Three datasets: a) Celeb-DF, b) FaceForensics++, c) DFDC subset
• For evaluation we examined two aggregation approaches
• avg: prediction is the average of all face predictions
• face: prediction is the max prediction among different avg face predictions
• Results for the EfficientNet-B4 model in terms of Log loss error:
Pre-
processing
CelebDF FaceForensics++ DFDC
avg face avg face avg face
baseline 0,510 0,511 0,563 0,563 0,213 0,198
proposed 0,458 0,456 0,497 0,496 0,195 0,173

Our DFDC Approach - details
• Applied proposed preprocessing approach to clean the generated face dataset
• Face augmentation:
• Horizontal & vertical flip, random crop, rotation, image compression, Gaussian & motion
blurring, brightness, saturation & contrast transformation
• Trained three different models: a) EfficientNet-B3, b) EfficientNet-B4, c) I3D*
• Models trained on face level:
• I3d trained with 10 consecutive face images exploiting temporal information.
• EfficientNet models trained on single face images
• Per model:
• Added two dense layers with dropout after the backbone architecture with 256 and 1 units
• Used the sigmoid activation for the last layer
* ignoring the optical flow stream

Our DFDC approach – inference
pre-processing model inference post-processing

Lessons from other DFDC teams
• Most approaches ensemble multiple EfficientNet architectures (B3-B7) and
some of them were trained on different seeds
• ResNeXT was another architecture used by a top-performing solutions
combined with 3D architectures such as I3D, 3D ResNet34, MC3 & R2+1D
• Several approaches increased the margin of the detected facial bounding
box to further improve results.
• We used an additional margin of 20% but other works proposed a higher proportion.
• To improve generalization:
• Domain-specific augmentations: a) half face removal horizontally or vertically, b)
landmark (eyes, nose, or mouth) removal
• Mixup augmentations

Practical challenges
• Limited generalization
• This observation applies to most submissions. The winning team scored
0.20336 in public validation and only 0.42798 in the private (Log Loss)
• Overfitting
• The best submission in the public leaderboard scored 0.19207 but in the
private evaluation the error was 0.57468, leading to the 904-th position!
• Broad problem scope
• The term DeepFake may refer to every possible manipulation and generation
• Constantly increasing manipulation and generation techniques
• A detector is only trained with a subset of these manipulations

DeepFake Detection in the Wild
• Videos in the wild usually contain multiple scenes
• Only a subset of these scenes may contain DeepFakes
• Detection process might be slow for multi-shot videos (even short ones)
• Low quality videos
• Low quality faces tend to fool classifiers
• Small detected and fast-moving faces
• Usually lead to noisy predictions
• Changes in the environment
• Moving obstacles in front of the faces
• Changes in lighting

DeepFake Detection Service @ WeVerify
https://www.youtube.com/watch?v=cVljNV
V5VPw&ab_channel=TheFakening

More details at TTO 2020
Charitidis, P., Kordopatis-Zilos, G., Papadopoulos, S., & Kompatsiaris, Y.
(2020). Investigating the impact of preprocessing and prediction
aggregation on the DeepFake detection task. Proceedings of the
Conference for Truth and Trust Online (TTO) [to appear],
https://arxiv.org/abs/2006.07084
https://truthandtrustonline.com/

Thank you!
Dr. Symeon Papadopoulos
papadop@iti.gr
@sympap
Media Verification (MeVer)
https://mever.iti.gr/
@meverteam https://ai4media.eu/
https://weverify.eu/

Deepfake detection

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à Deepfake detection

Similaire à Deepfake detection (20)

Plus de Weverify

Plus de Weverify (20)

Dernier

Dernier (20)

Deepfake detection