March 2026

I Think We’re Asking the Wrong Question About AI Images

For the past year or so, almost every conversation I’ve had about AI images eventually lands in the same place. Detection.

How do we detect them?
How do we catch them?
How do we know if an image is real?

I hear the question everywhere.

Friends ask it.
Founders ask it.
Researchers ask it.
People building products ask it.

And lately I’ve started noticing something else.

On Twitter, every time a strange image starts circulating, the same little ritual happens.

Someone posts the image.

And within minutes, people in the replies start summoning Grok like it’s some kind of digital detective.

@Grok is this real?

Sometimes ten people ask it.

Sometimes fifty.

At this point the replies look less like a discussion and more like a town meeting where everyone is shouting the same question at the only guy in the room wearing glasses.

“Grok, can you take a look at this?”

“Grok, thoughts?”

“Grok be honest bro.”

You can almost feel the collective anxiety.

“Please tell us if reality is still working.”

And underneath all of these conversations sits the same quiet assumption: somewhere out there must be a way to look at an image and figure it out. A system smart enough to examine the pixels and decide — with enough confidence — whether the image came from a machine.

The logic seems simple enough.

If we build a detector that’s good enough, the problem goes away.

For a long time, I accepted that framing without really questioning it.

Then something about it started to bother me.

Not because detectors are useless. They clearly have their place.

But because the more I thought about it, the more the question itself started to feel… slightly off.

Detection assumes something very specific about the problem.

It assumes the right moment to understand an image is after it already exists.

You take the finished artifact — the image sitting on your screen — and you try to work backwards. You analyze the pixels. You search for patterns. You hope the image contains enough clues to reveal how it was made.

It’s almost like a forensic exercise.

But the more I watched how images actually move across the internet, the less that model made sense to me.

Images rarely stay where they were created.

They travel.

They get screenshotted.
Compressed.
Cropped.
Memed.
Embedded inside other posts.
Stripped of metadata.
Shared by people who often have no idea where the image originally came from.

By the time an image reaches you, the trail behind it is usually gone.

And detectors are being asked to reconstruct that entire history from the pixels alone.

That’s an incredibly difficult job.

And it’s probably getting harder.

The generators are improving faster than the detectors.

Every time a detection method starts working well, the next generation of image models changes the landscape again.

The artifacts disappear.
The patterns shift.
The visual fingerprints move.

And suddenly the detector is chasing something that has already changed.

So the cycle repeats.

A new detector appears.
A new generator follows.
The detector adapts.
The generator evolves.

After watching this play out for a while, it started to feel less like a long-term solution and more like an arms race.

And arms races tend to keep moving, not settle down.

That’s when a different thought started forming in my head.

A much simpler one.

What if the real problem isn’t that we can’t detect AI images?

What if the real problem is that the internet has no memory of where images begin?

Think about what actually happens when an image shows up on your screen.

Most of the time, you’re not seeing the beginning of the story.

You’re seeing the end of it.

The version that survived the trip across the internet. The one that made it through reposts, screenshots, compressions, edits, and algorithmic feeds. By the time it reaches you, the original moment — the instant that image first existed — is usually long gone.

And that’s when a different way of thinking about the problem started to make sense to me.

Instead of trying to stare at an image and guess where it came from, what if we simply recorded where it began?

Not by stamping obvious watermarks across the picture.
Not by hiding signals inside the pixels.

Just something simple.

When the image is created, you register a fingerprint of it. A small record that says: this image existed here, at this moment.

Almost like a receipt.

The image exists.
Here is its fingerprint.
Here is the time it appeared.
Here is the system that produced it.

Then later, if the same image shows up somewhere else — even if it’s been resized or lightly edited — that fingerprint can be checked against the registry.

Not a guess.

Just a match.
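To make the receipt idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the toy `average_hash` perceptual fingerprint, the `Receipt` record, the `Registry` class, and the Hamming-distance threshold are all illustrative choices, not a real system. A production registry would use a robust perceptual hash over actual image files, but the shape of the idea is the same: register a fingerprint at creation time, then match later copies even after small edits.

```python
import time
from dataclasses import dataclass


def average_hash(pixels: list[list[int]]) -> int:
    """Toy perceptual hash: one bit per pixel, set when the pixel
    is brighter than the image's mean. `pixels` is a small grayscale
    grid (e.g. 8x8) with values 0-255."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


@dataclass
class Receipt:
    """The 'receipt': fingerprint, creation time, producing system."""
    fingerprint: int
    created_at: float
    producer: str


class Registry:
    """Append-only record of image fingerprints (hypothetical API)."""

    def __init__(self) -> None:
        self._receipts: list[Receipt] = []

    def register(self, pixels: list[list[int]], producer: str) -> Receipt:
        """Record that this image existed here, at this moment."""
        receipt = Receipt(average_hash(pixels), time.time(), producer)
        self._receipts.append(receipt)
        return receipt

    def lookup(self, pixels: list[list[int]], max_distance: int = 6):
        """Return the closest registered receipt within `max_distance`
        bits, so lightly edited or resized copies still match."""
        fp = average_hash(pixels)
        best = min(
            self._receipts,
            key=lambda r: hamming(fp, r.fingerprint),
            default=None,
        )
        if best is not None and hamming(fp, best.fingerprint) <= max_distance:
            return best
        return None


# Usage: register a gradient image, then look up a lightly edited copy.
registry = Registry()
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
receipt = registry.register(original, producer="some-image-model")

edited = [row[:] for row in original]
edited[0][0] += 40  # a small edit: one pixel brightened
match = registry.lookup(edited)
```

The key design choice is fuzzy matching: the lookup asks "is any registered fingerprint within a few bits of this one?", not "is this exact hash on file?", which is what lets the receipt survive compression and cropping that would break an exact cryptographic hash.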