00 / SEE BEFORE LANGUAGE

You do not think in sentences first. You think in structures.

Language is sequential: about 40 bits per second. Vision is parallel: about 100,000,000 bits per second. The most important interface we have is the one we forget we are using.

VISION: 10⁸ bit/s
LANGUAGE: 40 bit/s
RATIO: 2.5M×
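The ratio follows directly from the two bandwidth figures, taken at face value as order-of-magnitude estimates rather than measurements:

```latex
\[
\frac{B_{\text{vision}}}{B_{\text{language}}}
  = \frac{10^{8}\ \text{bit/s}}{40\ \text{bit/s}}
  = 2.5 \times 10^{6}
\]
```

The symbols $B_{\text{vision}}$ and $B_{\text{language}}$ are introduced here only for illustration; the text itself gives just the two numbers.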
VISION.STACK v0.1 / LANGUAGE IS SEQUENTIAL · VISION IS PARALLEL / BANDWIDTH: VISION ≈ 10⁸ bit/s / BANDWIDTH: SPEECH ≈ 40 bit/s / RATIO ≈ 2,500,000×
01 / THE EVOLUTION OF VISUAL THINKING

Every era invents a new way to see.

Twelve milestones. Each one expanded what a single human mind could externalize — and so what civilization could compute on.

~40,000 BCE
Cave paintings
岩画
First externalized vision — hands and antelope on stone. The world becomes a surface.
~3000 BCE
Pictographs / Hieroglyphs
象形
Image and word fuse. To read becomes to see.
~1200 BCE
Chinese characters
汉字
Ideograms — meaning carried as visual structure for three millennia.
~300 BCE
Geometry
几何
Euclid: proof becomes diagram. Reasoning made visible.
~1420
Renaissance perspective
透视
Brunelleschi + Alberti — depth becomes computable on a flat plane.
~1800s
Engineering blueprints
蓝图
Orthographic projection. The first universal visual language of industry.
1839
Photography
摄影
Daguerre — light writing on plates. Memory becomes mechanical.
1895
Cinema
电影
Lumière — time, edited. Sequential vision learns montage.
1973–1984
Computer GUIs
GUI
Xerox PARC → Macintosh. Mouse + icon + window — cognition gets a desktop.
1990s–
Data visualization
数据
Tufte. D3. Observable. Numbers learn to be seen.
2010s–
Spatial computing
空间
HoloLens, Vision Pro, Quest — the desktop walks off the screen.
2022–
Generative cognition
生成
Diffusion + multimodal LLMs — machines start to see in our place, and dream in our pixels.
02 / THE VISUAL THINKERS

Eight people who did their thinking on paper — not in language.

The pattern is consistent: the drawing is not a record of the thought. The drawing is the thought. Language arrives later, to translate it for the rest of us.

1452–1519
Leonardo da Vinci
Painter · anatomist · engineer
达芬奇
COGNITIVE STYLE
Saw the body and the machine as variations on one structural language. The notebook entries flow image-first; text annotates the drawing, not the other way around.
VISUAL TOOLS
Mirror-script notebooks (~7,000 surviving pages), exploded-view diagrams, anatomical cross-sections, multi-projection studies.
SIGNATURE LINE
The drawing IS the thinking. Not a record of it.
03 / THE COGNITIVE STACK

Seven layers between the retina and the city.

Just as civilization is a layered OS, cognition is one too. Each upper layer rides on the discrimination capacity of the layer below; each lower layer is invisible to the layer above.

STACK · L1 → L7
L7
AI-assisted cognition
AI
Multimodal models that compress, render, and re-show our own minds back to us.
L6
Civilizational interfaces
界面
Money, law, code, the city. The visual conventions that let millions coordinate.
L5
System modeling
系统
Diagrams of how things hang together — the layer where engineers and economists live.
L4
Symbolic abstraction
符号
Numbers, letters, equations. A discrete grammar laid on top of continuous experience.
L3
Spatial simulation
空间
Mental rotation, route planning, dead-reckoning. Where Tesla's motors lived.
L2
Pattern recognition
识别
Edges → faces → schemas. The brain is a difference detector running on prediction error.
L1
Sensory perception
感知
Retina, cochlea, skin. ~10⁸ bits/s arriving; almost all of it discarded.
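One way to picture the layering rule is as a pipeline in which each layer receives only the output of the layer directly below it. The sketch below is a toy illustration, not a model of cognition: the `Layer` class, the `wrap` helper, and the string-wrapping transform are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Layer:
    level: int                        # 1 = bottom (L1), 7 = top (L7)
    name: str
    transform: Callable[[str], str]   # hypothetical stand-in for what the layer does


def wrap(name: str) -> Callable[[str], str]:
    """Return a transform that records which layer handled the signal."""
    return lambda signal: f"{name}({signal})"


def build_stack() -> List[Layer]:
    # Layer names are taken from the stack above, listed bottom-up.
    names = [
        "Sensory perception",
        "Pattern recognition",
        "Spatial simulation",
        "Symbolic abstraction",
        "System modeling",
        "Civilizational interfaces",
        "AI-assisted cognition",
    ]
    return [Layer(i + 1, n, wrap(n)) for i, n in enumerate(names)]


def run(stack: List[Layer], signal: str) -> str:
    # Each layer sees only the result handed up from the layer below,
    # never the raw input or the internals of lower layers.
    for layer in stack:
        signal = layer.transform(signal)
    return signal


result = run(build_stack(), "light")
print(result)
```

The nesting in the printed string makes the dependency visible: the top layer wraps everything beneath it, and the raw input sits at the innermost position, reachable only through every layer in between.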
04 / THINKING TOOLS

Ten interfaces for thinking with both hands.

Every one of these is a posture for the mind. Sketching with a pen on a tablet engages a different brain than typing into Notion. Pick deliberately.

Mind maps
导图
Branching thought. One root, many paths.
Infinite whiteboards
白板
tldraw, Miro, FigJam — zoom is the new chapter.
Node-graph editors
节点
Unreal Blueprints, Houdini, ComfyUI. Logic as topology.
Concept graphs
概念
Notion graph, Obsidian, Roam. Notes that know each other.
Timeline engines
时间
Aeon, Tiki-Toki, Scrollytelling. Time made spatial.
Geometric thinking tools
几何
GeoGebra, Desmos. Reasoning that you can drag.
Architectural diagrams
架构
Lucid, Excalidraw, Mermaid. The system seen from above.
Flow-state sketchpads
速写
Procreate, Concepts, Linea. The hand keeping up with the head.
Cinematic storyboards
分镜
Storyboarder, Toon Boom, FrameForge. Editing before footage.
AI visual co-thinkers
共思
Midjourney, Flux, Sora — sketching with a model in the loop.
05 / FROM TEXT TO SPACE

Civilization is visualized information.

Watch the same content pass through five forms: prose → diagram → system → simulation → built reality. Each step gives up specificity and gains operability.

TEXT
A paragraph
“A city is a network of trade, energy, and meaning sharing one piece of geography.”
DIAGRAM
Becomes a node-link diagram
Nodes: people, firms, infra. Edges: payments, deliveries, calls.
SYSTEM
Becomes a system
Stocks, flows, feedback loops — Forrester / Meadows.
SIMULATION
Becomes a simulation
CityFlow, NetLogo, SimCity 4 — same rules, time pressed play.
REALITY
Becomes reality
Then someone pours concrete and signs a lease, and the diagram is now load-bearing.
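The middle three stages can be sketched in a few lines. This is a minimal illustration under stated assumptions: the node and edge labels come from the diagram stage above, but which node connects to which is invented here, and the stock-and-flow numbers (`arrival`, `departure_rate`) are hypothetical values chosen only to make the feedback loop visible.

```python
from typing import List, Tuple

# DIAGRAM stage: a node-link structure as (source, target, kind) triples.
# The pairings are assumptions made for this example.
EDGES: List[Tuple[str, str, str]] = [
    ("people", "firms", "payments"),
    ("firms", "people", "deliveries"),
    ("people", "infra", "calls"),
]


def out_edges(node: str) -> List[Tuple[str, str]]:
    """Return (target, kind) for every edge leaving the given node."""
    return [(dst, kind) for src, dst, kind in EDGES if src == node]


# SYSTEM stage: one stock (population) with a constant inflow and an
# outflow proportional to the stock, which forms a balancing feedback loop.
def simulate(population: float, steps: int,
             arrival: float = 100.0,
             departure_rate: float = 0.05) -> List[float]:
    """SIMULATION stage: apply the same rules repeatedly, one step per tick."""
    history = [population]
    for _ in range(steps):
        inflow = arrival
        outflow = departure_rate * population  # more people, more departures
        population = population + inflow - outflow
        history.append(population)
    return history


print(out_edges("people"))
history = simulate(population=0.0, steps=500)
print(round(history[-1], 3))
```

With these rules the stock settles toward `arrival / departure_rate`, the point where inflow and outflow balance. That is the operability gained at the system stage: the paragraph could only describe the city, while the model can be run forward and interrogated.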
06 / THE AGE OF VISUAL AI

Machines learn to see — and to render back.

Six families of systems. Together they close the loop: a thought is sketched in your head, drawn by a model, evaluated by your eye, and redrawn. The drawing-as-thinking loop is now a wire.

[01]
Multimodal models
Text, image, video, audio — one weight set.
GPT-4o · Gemini 2 · Claude 4
[02]
Diffusion models
Noise → image, learned by reversing a step-by-step noising process.
Flux · Stable Diffusion 3.5 · Midjourney v7
[03]
Video diffusion
Time-coherent diffusion across frames.
Sora · Veo 3 · Kling
[04]
Spatial computing
Pixels leave the screen; the room becomes the OS.
Vision Pro · Quest · Glass v3
[05]
Neural interfaces
Cursor by intention. Type by signal.
Neuralink · Synchron · Precision
[06]
World simulators
Real-time generated environments — Genie-class.
Genie 2 · GameNGen · Decart
SOURCES & FURTHER READING

Foundational texts and figures cited above.

Tufte, The Visual Display of Quantitative Information. Canon of dataviz.
Alan Kay, A Personal Computer for Children of All Ages. GUI as cognitive prosthesis.
Tor Nørretranders, The User Illusion. Sensory-bandwidth figures.
Donella Meadows, Thinking in Systems. Stocks, flows, feedback loops.
Buckminster Fuller, Synergetics. Geometry as system substrate.
Hayao Miyazaki, Starting Point / Turning Point. Storyboard-first cinema.
07 / THE FUTURE HUMAN

The next leap in intelligence is not faster language. It is programmable perception.

If the overwhelming share of what your brain takes in every second is visual, and the major engineering frontier in front of us is teaching machines to share that visual layer, then the next century of cognition will not be argued in sentences. It will be drawn.

Language is sequential. Vision is parallel.
语言是序列。视觉是并行。
Public sources · Tufte · Kay · Nørretranders · Meadows · Fuller · Miyazaki