All Posts
AI Trends2026-04-086 min read

The Word Karpathy Used Was 'Compile' — And That Changes Everything

Everyone is copying Karpathy's three folders. Almost no one is paying attention to the verb he used to describe what happens between them. The verb is the whole thesis.

LLM WikiKarpathycompileknowledge baseMCPRAG

When people quote Karpathy's LLM Wiki gist, they almost always quote the structure: three folders, raw sources, a wiki, a schema file. That part is easy to copy. There are already half a dozen GitHub repos that scaffold those folders for you in under a minute.

But if you read the gist carefully, the most important thing in it isn't a folder. It's a verb.

Karpathy says the knowledge is compiled once and then kept current, not re-derived on every query.

Not "summarized." Not "indexed." Not "ingested." Compiled.

I think that single word is the whole thesis, and almost everyone reproducing his setup is missing what it implies.

What "compile" actually means

In software, compiling is not summarizing source code. A compiler doesn't read your C file and write a shorter C file. It transforms source into a different artifact, in a different representation, optimized for a different consumer. The output isn't meant to be read by the same audience as the input. It's meant to be executed by something that the input couldn't talk to directly.

That is exactly the right metaphor for what Karpathy is describing, and it is exactly the wrong metaphor for what RAG does.

RAG doesn't compile anything. RAG is a runtime lookup. Every query, you go back to the original sources, fetch chunks that look statistically similar to the question, and hope that pasting them into a context window produces something coherent. The sources are never transformed. The model never gets to think about them ahead of time. Each query is the first query.

Karpathy's wiki is the opposite. The model reads the raw material once, understands it, and writes down the understanding. The wiki is not a copy of the sources. It is a new artifact, in a new representation, optimized for a new consumer — the next LLM that will need to answer questions against it. The understanding is done up front, once, and stored.

That's what makes it compounding. You're not re-deriving knowledge every time. You're adding to a pile of already-derived knowledge.

Why the verb matters more than the folders

If you only copy the folders, you can still get the verb wrong. And most of the GitHub clones I've looked at do get it wrong — they treat the wiki folder as a place where the user dumps notes, and the LLM occasionally tidies them up. That's not compilation. That's housekeeping.

Real compilation has three properties that the folder layout alone doesn't enforce:

It is lossy on purpose. A compiler throws away things the next stage doesn't need. Karpathy's wiki should throw away the parts of a source that don't matter to the questions you'll actually ask. If your wiki is the same length as your raw sources, you haven't compiled — you've copied.

It produces a different shape than the input. A compiled wiki page about a person is not "the article that mentioned them, lightly edited." It's a structured entity page that pulls from every source that ever mentioned them, resolves contradictions, and links to related entities. The output has a shape the inputs don't have. That shape is the value.

It is invalidated by new input. When you add a new raw source, the wiki pages that touch the same topic need to be re-compiled, not appended to. This is the part most clones skip entirely. They treat ingestion as "add a new note." Real compilation treats ingestion as "rebuild the affected entity pages." Without that, your wiki drifts, contradicts itself, and slowly becomes a mess that the LLM can't reason over.

If your setup doesn't do those three things, you have folders. You don't have a compiler.

What this means for where the wiki should live

Once you take the verb seriously, a question follows that the folder-on-laptop crowd hasn't really answered yet: who runs the compiler?

In Karpathy's setup, the answer is "you, manually, in a Claude Code session, when you remember to." That works for him. He is Andrej Karpathy. He has the discipline and the technical comfort to sit down, open an agent, and tell it to re-compile the relevant pages every time he adds something.

For almost everyone else, that's the step where the system dies. Not because people are lazy, but because the moment compilation depends on a human remembering to invoke it, you're back to the same failure mode every personal knowledge tool has ever had: a graveyard of half-organized notes that nobody keeps current.

The fix isn't a better UI for invoking the compiler. The fix is making the compiler something that any agent can invoke, from anywhere, as a side effect of the conversation that's already happening.

That's what an MCP-accessible knowledge layer does. When you finish a research conversation in Claude.ai, the agent can write back to your knowledge base before the conversation ends — not because you asked it to, but because it's part of what "having a useful conversation" means now. When you ask a question in a different agent tomorrow, that agent can read what yesterday's agent wrote. The compilation happens continuously, as a background property of using AI at all, instead of as a chore you schedule.

That only works if the knowledge layer is reachable by URL, has a stable write API, and lives somewhere that isn't your laptop. Files in a folder cannot do this. A local Obsidian vault cannot do this. The compiler has nowhere to run when the user is on their phone, and nothing to compile against when the agent is running in someone else's data center.

The honest version of the pitch

I'll be direct about why I'm writing this. KnowMine is the thing I've been building for the last six months, and it is, structurally, a hosted compiler for personal knowledge. pgvector underneath, MCP on top, agents writing to it during conversations instead of after them. Karpathy's gist is the clearest external articulation of the thesis I've been operating under, and the verb he chose is the part I want more people to take seriously.

But I don't actually need you to use KnowMine to take the point. The point is this: if you're setting up an LLM Wiki this week because of the gist, ask yourself which part of your setup is actually doing compilation, and which part is just storing files in a folder with a fancy name. If the answer is "the compilation happens when I remember to run it," you have built a folder. If the answer is "the compilation happens every time an agent touches the system," you have built infrastructure.

Karpathy used the right word. Most of the people copying him are not yet building the thing the word describes.

Start building your AI-native knowledge base

Free to start. Connect to Claude, ChatGPT, and more.

Get Started Free
The Word Karpathy Used Was 'Compile' — And That Changes Everything - KnowMine Blog