Stanford professor Judy Fan went on stage at MIT and broke down why humans are so good at making the invisible visible...
And why AI hasn't actually learned to "see" the way we do.
It completely changes how you think about human intelligence vs. artificial intelligence:
Nature never gave us straight lines or sharp corners. The number line, the coordinate plane, even basic geometry are all human inventions. We created tools that do not exist in nature simply because we needed a way to think more clearly.
The coordinate system Descartes invented solved a problem that had stumped mathematicians for centuries: doubling the volume of a cube. Once invented, this tool became so indispensable that virtually every math curriculum on Earth still depends on it.
Humans have been doing this for at least 30,000 to 80,000 years. The story of human progress is inseparable from the story of marking up our environment, from cave walls to Galileo's telescope to Feynman diagrams of particles we will never see with our own eyes.
Every major scientific breakthrough relied on a visual tool that made something invisible visible. Darwin needed side-by-side illustrations of finches to see variation that was otherwise too subtle to notice. Cajal needed detailed drawings of neurons under a microscope to map how the nervous system was wired.
Fan's research group studies something deceptively simple: how people decide what to put into a drawing and what to leave out. When pairs of people played a drawing game, sketchers added far more detail when the target object had close competitors than when it stood alone, and they used fewer strokes and less time when extra detail was not necessary.
People are not just copying what they see. They are making constant judgment calls about what level of detail actually serves the goal of communication, and they do this naturally without ever being taught the theory behind it.
There is a real difference between drawing something so someone can identify it and drawing something so someone can understand how it works. In one study, participants drew explanatory diagrams that emphasized moving, causal parts of a machine while depictive drawings emphasized background and overall appearance, even though both were drawing the exact same object.
Explanatory drawings were genuinely better at helping someone figure out how to operate a machine, but worse at helping someone identify which machine it actually was. You cannot optimize a single drawing for both goals at once. Communication always involves tradeoffs.
AI vision models trained on photographs generalize surprisingly well to simple, sparse sketches, suggesting that resemblance-based recognition is not just a story we tell ourselves. It is something modern neural networks can replicate with real accuracy.
But there remains a large, measurable gap between how confidently AI models recognize sketches and how confidently humans do, even when both groups answer the same questions about the same images. Humans are simply far more reliable and far more consistent in their judgments.
When researchers compared human-made sketches to AI-generated sketches under tight stroke budgets, both were similarly recognizable at higher budgets, but diverged sharply as the budget shrank. Humans and AI systems simplify drawings in fundamentally different ways once resources get scarce.
Reading a graph is not one single skill. It involves perception, knowing where to look, mapping that visual information onto the actual question being asked, and then translating that mapping into an answer. Each of these steps can independently break down, and people fail for very different underlying reasons even when they land on the same wrong answer.
When tested directly against humans on graph reading tasks, leading multimodal AI models, including GPT-4V, showed a meaningful performance gap. Even when a model's overall accuracy approached human levels, its pattern of mistakes looked nothing like how humans actually get things wrong.
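To make the "same accuracy, different mistakes" idea concrete, here is a minimal Python sketch. The answer key and both sets of responses are made-up toy data, not results from the talk, and the function names are hypothetical. It only illustrates that two test-takers can score identically while missing completely different questions.

```python
def accuracy(answers, key):
    """Fraction of items answered correctly."""
    return sum(a == k for a, k in zip(answers, key)) / len(key)

def error_items(answers, key):
    """Indices of the items that were answered incorrectly."""
    return {i for i, (a, k) in enumerate(zip(answers, key)) if a != k}

def error_overlap(answers_a, answers_b, key):
    """Jaccard overlap of the two sets of missed items (1.0 = identical errors)."""
    missed_a = error_items(answers_a, key)
    missed_b = error_items(answers_b, key)
    union = missed_a | missed_b
    return len(missed_a & missed_b) / len(union) if union else 1.0

# Toy data (assumed for illustration only): eight multiple-choice graph questions.
key   = ["B", "A", "D", "C", "A", "B", "D", "C"]
human = ["B", "A", "D", "C", "A", "B", "A", "B"]  # misses the last two items
model = ["A", "C", "D", "C", "A", "B", "D", "C"]  # misses the first two items

human_acc = accuracy(human, key)
model_acc = accuracy(model, key)
overlap = error_overlap(human, model, key)

print(f"human accuracy: {human_acc:.2f}")
print(f"model accuracy: {model_acc:.2f}")
print(f"error overlap:  {overlap:.2f}")
```

In this toy case both score 75%, yet their errors never coincide, which is why comparing accuracy alone can hide how differently two systems actually read a graph.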
People choose entirely different types of charts depending on what specific question they are trying to answer, not out of a generic preference for bar charts or scatter plots. Their chart choices closely tracked which visualization would genuinely help someone answer that specific question correctly.
Two of the most widely used graph literacy tests in education research turned out to correlate strongly with each other, suggesting they measure overlapping skills. But when researchers dug into the actual error patterns, the standard textbook categories, like "find the maximum" or "identify a cluster," explained people's mistakes far less well than a more basic, underlying four-factor model did.
The deepest goal behind all of this research is not just academic curiosity. It is to eventually help students and everyday people develop genuine literacy with the visual tools that science and modern decision-making increasingly depend on, because every generation should be able to see further than the last by standing on the visual tools the previous generation built.