Why I keep talking about affordances
Because science tells us intelligence starts with what you can do, not what you know.
I recently stumbled across a paper in Nature Communications by Zamboni and colleagues on how humans detect navigational affordances while viewing real-world scenes.
It was not something I was actively looking for, which probably explains why it landed as hard as it did.
Using fMRI, MEG, and behavioral testing, the authors ask a simple question. When do we figure out where we can go?
The answer arrives far earlier than most models of perception would suggest. Before interpretation settles in. Before semantic understanding has time to form.
While this is not a paper about AI, APIs, or system design, it kept pinging a thread I have been pulling on for a long time.
Checking signal strength
The core finding is refreshingly direct.
Humans detect navigational affordances, meaning viable routes of movement, very early in visual processing. Neural signatures appear around 110 milliseconds after stimulus onset, and they show up first in the dorsal early visual cortex rather than in higher-level scene regions. This means that, within about a tenth of a second after seeing a scene, the brain has already flagged possible ways to move through it.
This tells us that affordances appear before interpretation; actionability shows up before understanding. The visual system’s first useful output is not “what am I looking at?” but “what can I do here?”
Capturing the return signal
For anyone who has had to suffer through my long-running rants about the value of hypermedia, this paper appears to be an unexpected confirmation from biology.
Years ago, Roy Fielding argued that, in his REST model, hypermedia as the engine of application state is not optional. That claim was often treated as architectural dogma or as a REST-specific concern that could be safely worked around. Static workflows, client-side assumptions, and out-of-band knowledge all became acceptable substitutes.
But that substitution only works if intelligence (and systems looking to mimic intelligence) begins with internal models.
The navigational affordance paper suggests something simpler. In biological systems, intelligence starts by detecting viable actions. The environment is encountered as a field of possibilities, not as a schema waiting to be decoded.
That is precisely what hypermedia provides.
A hypermedia response does not explain the system. It shows what can be done, here and now, in context. Available actions are visible without prediction or prior agreement. Most importantly, actionability is perceived, not inferred.
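As a concrete illustration, here is a minimal sketch of what such a response could look like. The field names (`actions`, `name`, `method`, `href`) are hypothetical, loosely modeled on formats like Siren and HAL, not a real API:

```python
import json

def cart_response(cart_id: str, cart_is_empty: bool) -> dict:
    """Build a hypermedia-style response: current state plus the
    actions that are viable right now. Field names are illustrative."""
    actions = [
        {"name": "add-item", "method": "POST",
         "href": f"/carts/{cart_id}/items"},
    ]
    if not cart_is_empty:
        # Checkout is only afforded when the cart has items;
        # a blocked action is simply not advertised.
        actions.append({"name": "checkout", "method": "POST",
                        "href": f"/carts/{cart_id}/checkout"})
    return {"cart": cart_id, "actions": actions}

# The client perceives what it can do by reading the response,
# not by consulting out-of-band documentation.
print(json.dumps(cart_response("c42", cart_is_empty=True)))
```

The point is not the format but the shift in where knowledge lives: the viable moves travel with the state, so nothing has to be predicted in advance.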
This distinction matters even more for AI agents.
Large language models are excellent at forming internal representations. They are far less reliable at keeping those representations synchronized with live systems. If intelligence depended on stable world models, this would be a serious limitation.
But if intelligence begins with detecting viable actions, the design goal changes.
Instead of training agents to reason first and act later, we should train them to detect and categorize navigational affordances first. What actions are available? Which are blocked? Which are conditional? Which disappear once taken?
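That detection step can be sketched in a few lines. The `precondition` and `once` annotations below are hypothetical, assuming a response format that labels its actions; blocked actions need no bucket, since in a hypermedia response they are simply absent:

```python
def categorize(actions: list[dict]) -> dict:
    """Sort advertised actions into the buckets an agent cares about:
    available now, conditional on state, or one-shot (gone once taken)."""
    buckets = {"available": [], "conditional": [], "one_shot": []}
    for action in actions:
        if action.get("once"):
            buckets["one_shot"].append(action["name"])
        elif action.get("precondition"):
            buckets["conditional"].append(action["name"])
        else:
            buckets["available"].append(action["name"])
    return buckets

actions = [
    {"name": "add-item"},
    {"name": "checkout", "precondition": "cart-not-empty"},
    {"name": "apply-coupon", "once": True},
]
print(categorize(actions))
# → {'available': ['add-item'], 'conditional': ['checkout'],
#    'one_shot': ['apply-coupon']}
```

An agent that runs this pass before any planning is doing, in miniature, what the paper says the visual system does: flagging viable moves first.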
Seen this way, the argument leads to a practical conclusion. Agentic responses should encode navigational affordances explicitly. Not just data. Not just instructions. Actionable possibilities, expressed in-band, discoverable at runtime.
This, alone, might not result in some version of AGI. But it does give us an opportunity to align how we build systems with how intelligence appears to work in the real world.
Logging the frequency
Intelligence does not start by understanding the world. It starts by seeing where you can move.