In March 2026 Brazil's electoral court (TSE) published Resolution 23.755, setting rules for artificial intelligence in election campaigns: synthetic content must be labelled, deepfakes are banned in the final stretch of the vote, and anyone who breaks the rules faces fines from R$ 5,000 to R$ 30,000. By June the court had already logged complaints involving the use of AI. The bar moved, and compliance quietly turned into an engineering problem.
If you build any platform where media circulates, detecting synthetic content is no longer a curiosity; it is a requirement. The first step is also the most counterintuitive one: stop trusting the human eye. Detection that actually works does not happen at the level of perception. It happens in the frequency domain and in the geometry of light and shadow. Let's see how to demonstrate it in practice with Python.
Why the human eye fails (and the models know it)
Instinct tells you to look for the odd blink, the lips that fall out of sync, the plastic-looking skin. That was exactly the set of cues the first deepfakes left behind, back around 2019. The thing is, every one of those cues became a training target. The 2026 generators learned to blink correctly, to sync lips and to render skin pores. When the defence relies on perception, it loses to a system that trains precisely against perception.
The threat moved too. The "face swap" video is the most talked-about case, but today the bigger risk is content generated from scratch: a voice cloned from a few minutes of audio, a portrait of someone who never existed, a fabricated chat screenshot. In those cases there is no "original" to compare against. What is left is the evidence that the generation process itself leaves in the file.
And here is the good news for anyone technical: every generator leaves a trace. Networks that synthesise images use upsampling layers that stamp a periodic pattern into the frequency spectrum. Recompression and splicing leave different error levels in different regions. Eye reflections and shadow directions rarely add up to a single, consistent light source. The eye sees none of this, but NumPy sees it just fine. If you want to understand the terrain before coding, it is worth revisiting our content on security and cybersecurity fundamentals, because media forensics is a close cousin of incident forensics.
Trace 1: frequency analysis with FFT
Real photographic images have a relatively smooth frequency spectrum. Images generated by GANs and by many diffusion models, on the other hand, carry periodic artefacts, a fine "checkerboard" that comes from the upsampling operations (transposed convolutions). That pattern is invisible in the spatial domain, but it jumps out in the spectrum once you apply the Fast Fourier Transform.
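To see where the "checkerboard" comes from, here is a toy experiment. It is a minimal sketch and does not model any real generator. A random image is upsampled 2x in two ways. The first is zero insertion, which is the first half of a stride-2 transposed convolution; the learned kernel that follows is supposed to smooth the gaps. The second is pixel repetition, which equals zero insertion followed by a 2x2 box kernel. The code then compares the spectrum at the vertical Nyquist frequency with the DC term. The helper names are hypothetical, chosen for illustration.

```python
import numpy as np

def zero_insert_upsample(img, factor=2):
    """Place each pixel on a sparse grid and fill the gaps with zeros.

    This is the first half of a stride-2 transposed convolution; the learned
    kernel that follows is what is supposed to smooth the gaps away.
    """
    h, w = img.shape
    up = np.zeros((h * factor, w * factor), dtype=np.float64)
    up[::factor, ::factor] = img
    return up

def repeat_upsample(img, factor=2):
    """Nearest-neighbour upsampling: zero insertion followed by a box kernel."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def nyquist_replica_ratio(img):
    """|F(h/2, 0)| / |F(0, 0)|: strength of the copy of the DC term that lands
    on the vertical Nyquist frequency, relative to the DC term itself."""
    spectrum = np.fft.fft2(img)
    h = img.shape[0]
    return float(np.abs(spectrum[h // 2, 0]) / np.abs(spectrum[0, 0]))

rng = np.random.default_rng(0)
base = rng.random((32, 32))
print(f"zero insertion: {nyquist_replica_ratio(zero_insert_upsample(base)):.3f}")
print(f"pixel repeat:   {nyquist_replica_ratio(repeat_upsample(base)):.3f}")
```

Zero insertion copies the DC term intact to the Nyquist frequency, so the first ratio is 1.000. The box kernel cancels that copy exactly, so the second ratio is 0.000. A learned kernel sits somewhere between the two, which is why a periodic peak can survive in the spectrum of a generated image.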
The idea behind the code: load the image in greyscale, compute the 2D FFT, centre the spectrum and look at the magnitude on a logarithmic scale. Regular peaks away from the centre are suspicious. You can reduce this to a single number by taking the radial (azimuthal) average of the spectrum and measuring the energy in the high frequencies.
```python
import numpy as np
import cv2

def spectral_signature(path):
    # 1) greyscale image, values 0..1
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:  # cv2.imread returns None instead of raising on a bad path
        raise FileNotFoundError(path)
    img = img.astype(np.float32) / 255.0
    # 2) 2D FFT + shift to centre the zero frequency
    f = np.fft.fftshift(np.fft.fft2(img))
    magnitude = np.log1p(np.abs(f))  # log compresses the range; log1p keeps it >= 0
    # 3) radial average: mean energy per frequency band
    h, w = magnitude.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    radius = np.hypot(x - cx, y - cy).astype(np.int32)
    profile = np.bincount(radius.ravel(), magnitude.ravel()) / np.bincount(radius.ravel())
    profile = profile[: min(cy, cx)]  # keep only complete rings, drop the corners
    # 4) high-frequency energy ratio (last third vs first third of the profile)
    cut = len(profile) // 3
    high = profile[2 * cut:].mean()
    low = profile[:cut].mean()
    return high / low  # the higher it is, the more suspicious

ratio = spectral_signature("suspect.jpg")
print(f"High/low energy ratio: {ratio:.3f}")
# Synthetic images tend to score higher because of upsampling
```
That value alone condemns nothing. The right move is to calibrate: run the function over a folder of real photos and a folder of generated images, see where each distribution lands, and set a threshold for your domain. It is the kind of baseline that changes from camera to camera and from model to model.
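Here is a minimal sketch of that calibration step. It assumes you have already computed scores, for example with spectral_signature, for a labelled set of real and generated images. It chooses the threshold from the real photos alone, so that no more than a target share of them would be flagged. It then reports the detection rate that threshold gives on the generated set. `pick_threshold` and `max_fpr` are illustrative names and do not come from any library.

```python
import numpy as np

def pick_threshold(real_scores, synthetic_scores, max_fpr=0.05):
    """Return (threshold, fpr, tpr) for the rule `score > threshold => suspicious`.

    The threshold is chosen on the REAL images only, so that at most
    max_fpr of them would be flagged (false positives).
    """
    real = np.sort(np.asarray(real_scores, dtype=np.float64))
    synth = np.asarray(synthetic_scores, dtype=np.float64)
    if real.size == 0 or synth.size == 0:
        raise ValueError("need scores for both real and synthetic images")
    # how many real images are allowed to score above the threshold
    allowed = min(int(np.floor(max_fpr * real.size)), real.size - 1)
    threshold = float(real[real.size - allowed - 1])
    fpr = float(np.mean(real > threshold))   # real photos wrongly flagged
    tpr = float(np.mean(synth > threshold))  # generated images caught
    return threshold, fpr, tpr
```

Report all three numbers together. A threshold is only defensible if the false-positive rate that came with it is stated next to it, and ties or small samples can push the real rate below the target.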
Trace 2: Error Level Analysis on images
Error Level Analysis (ELA) exploits one detail of JPEG: every time you save, different regions of the image lose quality at different rates. In an authentic photo saved once, the recompression error is fairly uniform. When someone pastes in a generated face, inserts text or recomposes a screenshot, the edited region usually has a different compression history from the rest, and that shows up as a "glow" in the error map.
```python
import io
from PIL import Image, ImageChops
import numpy as np

def error_level_analysis(path, quality=90):
    original = Image.open(path).convert("RGB")
    # 1) re-save as JPEG at a known quality (in memory, no temp file to clean up)
    buffer = io.BytesIO()
    original.save(buffer, "JPEG", quality=quality)
    buffer.seek(0)
    recompressed = Image.open(buffer).convert("RGB")
    # 2) pixel-by-pixel difference between original and recompressed
    diff = ImageChops.difference(original, recompressed)
    # 3) stretch the error to 0..255 so the regions of highest error become visible
    arr = np.asarray(diff).astype(np.float32)
    scale = 255.0 / max(arr.max(), 1.0)
    emap = (arr * scale).clip(0, 255).astype(np.uint8)
    Image.fromarray(emap).save("ela_result.png")
    return arr.mean(), arr.std()  # mean and std of the raw error

mean, std = error_level_analysis("suspect_screenshot.jpg")
print(f"Mean error: {mean:.2f} | std: {std:.2f}")
# High std + an isolated region glowing in the map = sign of splicing
```
ELA is great for fabricated screenshots and crude splicing, and weak for a 100% synthetic image saved a single time (no "odd" region to contrast against). That is why it comes in as one piece of evidence, never as a verdict. Anyone who has handled incident response recognises the pattern: you stack independent clues until the conclusion stands on its own.
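One way to turn "an isolated region glowing" into a number is to average the error map in tiles and report the tile that stands furthest above the rest, as a z-score. This is a sketch and not a standard ELA step. It needs the raw difference array, so it assumes you change error_level_analysis to also return `arr`. The 16-pixel tile size is an arbitrary starting point.

```python
import numpy as np

def hottest_block(error, block=16):
    """Average an ELA error map in block x block tiles and return the hottest tile.

    Returns (tile_row, tile_col, z_score), where z_score measures how far that
    tile's mean error sits above the mean of all tiles, in standard deviations.
    """
    err = np.asarray(error, dtype=np.float64)
    if err.ndim == 3:  # RGB error map: average the channels
        err = err.mean(axis=2)
    h = err.shape[0] // block * block  # crop to whole tiles
    w = err.shape[1] // block * block
    if h == 0 or w == 0:
        raise ValueError("image smaller than one tile")
    tiles = err[:h, :w].reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    row, col = np.unravel_index(np.argmax(tiles), tiles.shape)
    sigma = tiles.std()
    z = (tiles[row, col] - tiles.mean()) / sigma if sigma > 0 else 0.0
    return int(row), int(col), float(z)
```

If one tile sits several standard deviations above the others, a person should look at that region. A uniformly noisy map has no standout tile, and that is the typical result for a fully synthetic image saved only once.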
Trace 3: facial consistency and eye reflections with MediaPipe
In video you can go beyond the pixel and look at the biological and physical coherence of the face over time. Two cues still carry a lot of weight: the rate and symmetry of blinking, and the light reflection on the cornea. In a real face, both eyes see the same environment, so the highlight points (specular reflection) land in coherent positions. Synthesis tends to treat each eye independently and breaks that.
The example below uses MediaPipe Face Mesh to extract the face landmarks (468 points, or 478 with refine_landmarks=True, which adds the iris). From them it computes the Eye Aspect Ratio (EAR), the classic metric for measuring eye openness frame by frame. Sequences with no blinking at all, or with both eyes always at an identical EAR, are suspicious.
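```python
import cv2
import mediapipe as mp
import numpy as np

mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)
# Face Mesh indices for the subject's right eye (it appears on the LEFT of a
# non-mirrored image), ordered p1..p6 as the EAR formula expects
RIGHT_EYE = [33, 160, 158, 133, 153, 144]
EAR_CLOSED = 0.18  # below this the eye counts as closed; calibrate on your footage

def eye_aspect_ratio(points):
    # (|p2-p6| + |p3-p5|) / (2 * |p1-p4|): vertical over horizontal opening
    vertical = np.linalg.norm(points[1] - points[5]) + np.linalg.norm(points[2] - points[4])
    horizontal = np.linalg.norm(points[0] - points[3])
    return vertical / (2.0 * horizontal)

cap = cv2.VideoCapture("suspect_video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if the container reports no FPS
history, blinks, closed, frames = [], 0, False, 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frames += 1
    res = mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if res.multi_face_landmarks:
        h, w = frame.shape[:2]
        lm = res.multi_face_landmarks[0].landmark
        pts = np.array([[lm[i].x * w, lm[i].y * h] for i in RIGHT_EYE])
        ear = eye_aspect_ratio(pts)
        history.append(ear)
        # count a blink once, on the open -> closed transition, not on every closed frame
        if ear < EAR_CLOSED and not closed:
            blinks += 1
        closed = ear < EAR_CLOSED
cap.release()
mesh.close()
minutes = frames / fps / 60.0
mean_ear = np.mean(history) if history else float("nan")
print(f"Blinks: {blinks} ({blinks / max(minutes, 1e-9):.1f}/min) | mean EAR: {mean_ear:.3f}")
# A healthy adult at rest blinks roughly 15-20 times per minute. Near-zero blinking is a red flag.
```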
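The loop above measures only one eye, and the "identical EAR" cue needs both. This minimal sketch assumes you record a second series in the same loop using the other eye's indices. The set [362, 385, 387, 263, 373, 380] is the one commonly used in EAR tutorials, so check it against the Face Mesh landmark map before relying on it. The sketch then compares the two series frame by frame. The tolerance is a hypothetical starting value and not a published constant.

```python
import numpy as np

# assumed p1..p6 order for the subject's left eye; verify against the landmark map
LEFT_EYE = [362, 385, 387, 263, 373, 380]

def eye_symmetry(ear_a, ear_b, identical_tol=1e-3):
    """Compare two per-frame EAR series from the same face.

    Returns (mean absolute difference, suspicious flag). In real footage,
    landmark noise and natural asymmetry usually keep the difference above
    zero, so a near-zero gap over a whole clip is treated as a flag.
    """
    a = np.asarray(ear_a, dtype=np.float64)
    b = np.asarray(ear_b, dtype=np.float64)
    if a.shape != b.shape or a.size == 0:
        raise ValueError("need two non-empty series of the same length")
    gap = float(np.mean(np.abs(a - b)))
    return gap, gap < identical_tol
```

Real eyes blink together, so the two series are expected to move in step. The flag fires only when their values are also the same, frame after frame.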
For the corneal reflection, the logic is similar: crop each eye region using the landmarks, find the brightest point (the reflection) in each one and check whether the relative positions are compatible with a single light source. A large divergence between the two eyes is a strong sign of synthesis, because the generator does not model the physics of the scene, it just fills in plausible pixels.
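Here is a sketch of that check. Its simplifying assumptions are mine, not a published method: the face is roughly frontal, there is a single distant light source (so both highlights should sit at about the same relative position inside each eye), and the frame is already a greyscale NumPy array. The `eye_box` helper and the idea of thresholding the divergence are hypothetical, and both need calibrating like everything else.

```python
import numpy as np

def eye_box(points, pad=3):
    """Bounding box (x0, y0, x1, y1) around an eye's landmark points, with a margin."""
    pts = np.asarray(points, dtype=np.float64)
    x0, y0 = np.floor(pts.min(axis=0)).astype(int) - pad
    x1, y1 = np.ceil(pts.max(axis=0)).astype(int) + pad + 1  # +1: slice end is exclusive
    return int(max(x0, 0)), int(max(y0, 0)), int(x1), int(y1)

def highlight_position(gray, box):
    """Brightest pixel inside box, normalised to 0..1 within the box."""
    x0, y0, x1, y1 = box
    crop = np.asarray(gray, dtype=np.float64)[y0:y1, x0:x1]
    if crop.size == 0:
        raise ValueError("empty eye box")
    row, col = np.unravel_index(np.argmax(crop), crop.shape)
    # normalising by the box size makes eyes of different apparent size comparable
    return np.array([col / max(crop.shape[1] - 1, 1), row / max(crop.shape[0] - 1, 1)])

def reflection_divergence(gray, box_a, box_b):
    """Distance between the two normalised highlight positions (0 = fully coherent)."""
    return float(np.linalg.norm(highlight_position(gray, box_a) - highlight_position(gray, box_b)))
```

In the MediaPipe loop, `eye_box(pts)` can take the same `pts` array used for the EAR. Call it once for each eye, then pass the greyscale frame and both boxes to `reflection_divergence`.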
What this changes in your day to day (and the limits)
Notice the common thread: no single technique solves it. FFT catches synthetic images and misses on heavily compressed photos. ELA catches splicing and slips past pure synthesis. MediaPipe is strong on face video and useless on cloned audio. Serious detection engineering is an ensemble: you run several independent detectors, normalise each output and combine them into a final score, the same way an anti-fraud system stacks weak signals until it forms a strong decision.
```python
# reuses spectral_signature and error_level_analysis defined above
def synthetic_score(image_path):
    # combine independent evidence into a single score in 0..1
    # (a ranking score, not a calibrated probability)
    fft_ratio = spectral_signature(image_path)
    _, ela_std = error_level_analysis(image_path)
    # simple normalisation by thresholds calibrated on YOUR dataset
    fft_signal = min(max(fft_ratio / 1.5, 0.0), 1.0)
    ela_signal = min(max(ela_std / 12.0, 0.0), 1.0)
    # weights set by validation, not by guesswork
    score = 0.6 * fft_signal + 0.4 * ela_signal
    return round(score, 2)

print(f"Synthesis score: {synthetic_score('suspect.jpg'):.0%}")
```
A number like "78% probability of AI generation" is exactly the kind of output that shows up in the technical analyses behind formal complaints. It only has value if you can defend how it was computed: which dataset calibrated the thresholds, what the false-positive rate is, and which version of the model was used. A hand-weighted sum like the one above is also a score, not a probability, and only earns that word once it has been calibrated against labelled data. Detection without an auditable methodology is a guess dressed up as science.
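A common way to earn the word "probability" is to fit the weights instead of choosing them, using logistic regression on a labelled validation set. This technique is my substitution; neither the resolution nor any of the tools above prescribes it. The sketch below has no dependencies beyond NumPy and uses plain gradient descent. Each row of `X` holds the normalised signals of one image (for example fft_signal and ela_signal), and `y` is 1 for generated and 0 for real. The learning rate and epoch count are arbitrary defaults.

```python
import numpy as np

def _with_bias(X):
    X = np.asarray(X, dtype=np.float64)
    return np.hstack([X, np.ones((X.shape[0], 1))])  # constant column = intercept

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights (last entry = bias) by gradient descent."""
    Xb = _with_bias(X)
    y = np.asarray(y, dtype=np.float64)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))    # current predicted probabilities
        w -= lr * Xb.T @ (p - y) / len(y)    # gradient of the mean log-loss
    return w

def predict_proba(w, X):
    """Probability of 'generated' for each row of X under weights w."""
    return 1.0 / (1.0 + np.exp(-_with_bias(X) @ w))
```

The fitted weights replace the hand-set 0.6 and 0.4. Even then, the output is a probability only to the extent that the validation set resembles the media you analyse in production.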
There is also the side of whoever builds a platform under Resolution 23.755. The rule talks about labelling synthetic content and responding quickly to complaints. In practice that becomes a pipeline: media ingestion, an analysis queue, an automatic score, human review for the edge cases and an audit trail for every decision. For anyone who handles application security, it is one more non-functional requirement landing in the backlog, alongside data protection and access logs.
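The decision step of that pipeline can be sketched as a pure function that routes each item and returns an audit record. The thresholds, decision names and record fields are hypothetical illustrations; the resolution defines none of them. What matters is that every decision carries what is needed to defend it later: the file hash, the score, the thresholds and the detector version.

```python
import hashlib
import json
from datetime import datetime, timezone

def triage(media: bytes, score: float, low: float = 0.3, high: float = 0.8,
           detector_version: str = "ensemble-v0") -> dict:
    """Route one media item and return an audit record (hypothetical schema)."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    if score >= high:
        decision = "auto_label"      # confident enough for automatic labelling
    elif score >= low:
        decision = "human_review"    # edge case: a person decides
    else:
        decision = "no_action"
    return {
        "sha256": hashlib.sha256(media).hexdigest(),  # ties the record to the exact file
        "score": score,
        "thresholds": {"low": low, "high": high},
        "detector_version": detector_version,
        "decision": decision,
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }

record = triage(b"raw bytes of the uploaded file", score=0.55)
print(json.dumps(record, indent=2))  # e.g. append to a write-once audit log
```

Keeping the routing rule pure and recording its inputs lets anyone re-run a past decision and get the same answer.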
Where to study in depth? The MediaPipe Face Landmarker documentation covers the face landmark model (468 points, 478 with the iris) and the iris refinement. Face Landmarker supersedes the older Face Mesh solution used in the code above. The official text of TSE Resolution 23.755 gives the legal definitions of synthetic content and the deadlines. It is worth reading the two side by side: one gives you the tool, the other gives you the requirement.
Takeaways and the next step
- The human eye is the wrong detector: the 2026 generators train against perception. Use frequency, compression error and the physics of the scene.
- Combine evidence: FFT for synthesis, ELA for splicing, MediaPipe for face video. None is a verdict on its own, the ensemble is.
- A score is only worth it with methodology: a calibrated threshold, a known dataset, a measured false-positive rate. Without that, it does not defend a complaint.
Have you ever had to prove that a video or screenshot was AI-generated? Which technique did you use, and what worked in real life? Tell me in the comments: I want to build a second article from your cases, focused on detecting cloned voices, the hottest frontier of the 2026 elections.
Frequently asked questions
Is detecting deepfakes in Python good enough for legal use, like a complaint to the electoral court?
It works as technical evidence, not as a final, standalone proof. The score helps to prioritise and to support a case, but it has to come with an auditable methodology: the calibration dataset, the thresholds, the false-positive rate and the version of the tools. A serious report combines the automatic result with qualified human review.
Do these techniques catch images made by a diffusion model, not just by a GAN?
Partly. The frequency signature is sharper on GANs, but many diffusion pipelines also leave detectable upsampling artefacts and noise patterns. The weak point is the fast evolution of the models, which is why calibration has to be redone often and the ensemble needs new detectors.
Do I need a GPU or a trained model to start?
For FFT, ELA and MediaPipe, no. It all runs on CPU with NumPy, OpenCV, Pillow and MediaPipe. A GPU and dedicated CNN detectors (such as models trained on FaceForensics++) come in when you want to raise accuracy and process volume, but you can stand up the auditable baseline today on your laptop.
What is the biggest false-positive risk?
Images that are heavily compressed, run through strong filters or rescaled by social networks. The platforms' recompression scrambles both the frequency signature and the ELA map. That is why the threshold has to be calibrated on the same type of media you analyse, and never on "lab" photos.
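If you cannot collect enough real platform copies, one practical option is to degrade your calibration set the same way before scoring it. Here is a sketch with Pillow. The 1080-pixel long side and JPEG quality 70 are placeholders, not the settings of any particular network, so measure the platforms you care about.

```python
import io
from PIL import Image

def simulate_platform_copy(img: Image.Image, max_side: int = 1080, quality: int = 70) -> Image.Image:
    """Downscale so the long side is at most max_side, then re-encode as JPEG."""
    img = img.convert("RGB")
    scale = min(1.0, max_side / max(img.size))
    if scale < 1.0:
        new_size = (max(1, round(img.width * scale)), max(1, round(img.height * scale)))
        img = img.resize(new_size)
    buffer = io.BytesIO()
    img.save(buffer, "JPEG", quality=quality)
    buffer.seek(0)
    out = Image.open(buffer)
    out.load()  # decode the pixels now
    return out
```

Save these copies to disk and run spectral_signature and error_level_analysis on them, not on the originals, when you set the thresholds.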
And cloned-voice audio, can you detect that with Python too?
You can, with a different toolbox: spectral analysis of the audio, detection of vocoder artefacts and anti-spoofing models. It is the hardest frontier of the 2026 elections because voice cloning became cheap and convincing. That is the topic I plan to open in the next article in the series.