Karpathy's LLM Wiki, Two Weeks Later: Right About Bookkeeping, Brittle Past Personal Scale
Two weeks after the ceiling post, here's what the ecosystem actually built — qmd (Karpathy's own patch), rohitg00's hybrid v2, Graphify's graph approach — and the three places the LLM Wiki pattern reliably breaks once you outgrow personal scale.
Two weeks ago I wrote about the ceiling in Karpathy's LLM Wiki pattern. The post argued the schema layer is a programming task, agents don't live on laptops anymore, and lint gets impossible past a few hundred pages.
I want to take a sharper second pass — partly because I owe a few corrections, partly because the ecosystem has built things in those two weeks that say more than my essay could.
If you read the first post, skip the recap. If you didn't, the one-line version is: the LLM Wiki pattern is correct about the bottleneck, and underspecified about everything past personal scale. This post is the receipts.
What Karpathy got right (and I want to keep saying it)
The bottleneck in any long-running knowledge base is not retrieval. It's bookkeeping.
Cross-references rot. Summaries diverge from sources. Concept pages duplicate. The reason most personal wikis die isn't that nobody could find anything — it's that nobody was willing to maintain them. Karpathy named this clearly: the LLM is the only entity patient enough, careful enough, and cheap enough at the margin to do the bookkeeping.
The killer move he named is file-back: when a query produces something valuable, write it back into the wiki as a new page. Conversations stop being one-shot — they generate context instead of just consuming it.
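The loop is small enough to sketch. A minimal version of file-back, assuming a plain-Markdown layout (the `wiki/` directory name, the slug rule, and the `index.md` append are all illustrative, not Karpathy's exact scheme):

```python
from pathlib import Path

def file_back(wiki_dir: str, title: str, body: str) -> Path:
    """Persist a valuable answer as a new wiki page and register it in index.md."""
    wiki = Path(wiki_dir)
    wiki.mkdir(parents=True, exist_ok=True)
    slug = title.lower().replace(" ", "-")
    page = wiki / f"{slug}.md"
    page.write_text(f"# {title}\n\n{body}\n", encoding="utf-8")
    # Append a link so the next session's index scan can find the page.
    with (wiki / "index.md").open("a", encoding="utf-8") as index:
        index.write(f"- [{title}]({page.name})\n")
    return page
```

The point of the sketch is the last two lines: the write-back is only useful if the index (whatever form it takes) learns about the new page in the same step.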
This is right. I haven't seen anyone seriously contest it. Every credible response in the last two weeks has accepted this premise and argued only about implementation.
The sweet spot — and the patch Karpathy himself shipped
I want to correct something I implied last time.
The original gist describes a setup that "works surprisingly well at moderate scale" — about 100 sources, a few hundred pages of compiled wiki. I sometimes wrote that as if Karpathy had drawn a hard ceiling. He didn't. It's an empirical sweet spot, not an architectural commitment.
More importantly: in the gist's "Optional: CLI tools" section, Karpathy explicitly recommends qmd — a CLI tool that does BM25 + vector search + LLM rerank, with an MCP server interface. He recommends it for the moment your wiki outgrows what index.md plus a grep can handle.
This matters because a lot of takes — including some of my own framing — have positioned LLM Wiki as a "files vs. RAG" debate. That's a misread. Karpathy is not anti-vector. He's anti-premature-RAG-infrastructure for personal-scale work, and he says so cleanly: when the wiki gets bigger, you want a proper search layer.
The interesting question — the one his gist doesn't answer — is what that proper search layer should look like once "personal scale" turns into something else. Multiple devices. Multiple agents. A team. An app that needs your knowledge accessible from a hosted API. That's where the last two weeks of community work has converged.
Where the pattern breaks past personal scale: three pieces of evidence
I won't argue this from first principles. The community has done the work, and the converging direction is hard to miss.
1. rohitg00's LLM Wiki v2 — hybrid retrieval is necessary, not optional
Rohit's LLM Wiki v2 — published explicitly as "extending Karpathy's pattern with lessons from building agentmemory" — runs BM25 + vector search + a 582-node knowledge graph, fused with Reciprocal Rank Fusion. Confidence scoring on every node. On LongMemEval-S, the long-horizon memory benchmark, it scores 95.2%.
The reason he gives for the architecture is direct: flat index.md lookup degrades sharply past 200–500 documents. Not because of token limits — because attention dilutes when the LLM has to make relevance judgments over a thousand candidate items, and you can't tell when the dilution started costing you.
This is the single most important data point about the LLM Wiki ceiling: there's a benchmarked, working alternative that ships with hybrid retrieval as a base assumption.
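The fusion step in that stack, Reciprocal Rank Fusion, is small enough to show inline: each retriever contributes 1/(k + rank) per document, and the summed scores decide the fused order. A sketch (k = 60 is the conventional default from the original RRF paper; the ranked lists are made up):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["attention.md", "rrf.md", "bm25.md"]
vector_hits = ["rrf.md", "graphs.md", "attention.md"]
fused = rrf_fuse([bm25_hits, vector_hits])
# → ["rrf.md", "attention.md", "graphs.md", "bm25.md"]
```

What makes RRF attractive here is that it needs no score calibration between BM25 and the vector index: only ranks matter, so two retrievers with incomparable score scales fuse cleanly.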
2. gulliveruk's principle — scoping should be deterministic, reasoning probabilistic
Of all the responses in the gist comments, the line that's been quoted the most is gulliveruk's:
Scoping should be deterministic. Reasoning should be probabilistic.
What this means in practice: when an agent is figuring out which knowledge is relevant to a query, that's a set operation — you want a deterministic prefilter (vector similarity, tag match, graph edge) to narrow the candidate set. You do not want to spend tokens having the LLM scan a thousand-line index and silently drop things from attention.
The LLM is the right tool for reasoning over the prefiltered set. It is the wrong tool for being the prefilter.
This is a one-sentence diagnosis of why pure-index.md patterns get fragile fast: they make the LLM do set operations that should have been a query.
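What the split looks like in code, under one simplifying assumption (a tag-overlap prefilter standing in for vector similarity or a graph edge): the set operation runs as plain code, and only the survivors ever reach the model's context.

```python
def prefilter(entries: list[dict], query_tags: set[str], limit: int = 20) -> list[dict]:
    """Deterministic scoping: a set operation, not an LLM judgment."""
    matched = [e for e in entries if query_tags & set(e["tags"])]
    # Rank by overlap size so the cutoff is reproducible across runs.
    matched.sort(key=lambda e: len(query_tags & set(e["tags"])), reverse=True)
    return matched[:limit]

entries = [
    {"id": "rrf.md", "tags": ["retrieval", "ranking"]},
    {"id": "leiden.md", "tags": ["graphs", "clustering"]},
    {"id": "bm25.md", "tags": ["retrieval"]},
]
candidates = prefilter(entries, {"retrieval"})
# Only `candidates` goes into the prompt for probabilistic reasoning.
```

Same query, same corpus, same candidate set, every time. That reproducibility is exactly what an LLM scanning a thousand-line index cannot give you.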
3. Graphify — the same ceiling, attacked from a different angle
safishamsi/graphify — 2,000+ stars in 48 hours — takes Karpathy's pattern and replaces the entire index.md + LLM-rerank approach with graph topology. Code parsed via local tree-sitter (zero tokens), non-code chunked by parallel LLM sub-agents, then clustering done with the Leiden community detection algorithm on edge density. SHA-256 caching, file-watcher mode, Git hook integration. One command: /graphify .
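The caching piece generalizes beyond Graphify: hash each file's content and re-chunk (or re-embed, or re-parse) only what changed since the last run. A minimal sketch, with the cache kept as a plain dict rather than whatever store Graphify actually uses:

```python
import hashlib
from pathlib import Path

def changed_files(paths: list[Path], cache: dict[str, str]) -> list[Path]:
    """Return files whose content hash differs from the cache, updating the cache."""
    dirty = []
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if cache.get(str(path)) != digest:
            cache[str(path)] = digest
            dirty.append(path)
    return dirty
```

Pair this with a file watcher or a Git hook and incremental re-indexing costs zero tokens for untouched files, which is most of them on any given run.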
The reason Graphify exists, and the reason it caught fire that fast, is the same reason rohitg00's hybrid retrieval exists: the original index.md + grep workflow runs out of room. Graphify chose to skip embeddings entirely and lean on graph topology; rohitg00 chose to combine both. The point isn't which is better. The point is that two of the most-starred independent extensions of the pattern in the same week both added structural retrieval the original didn't have.
That's the closest thing to convergence the high-quality technical responses have produced.
Productize and extend, not "we beat Karpathy"
I'm building KnowMine, and I want to be precise about what it is in this conversation.
KnowMine is not "a better LLM Wiki." Karpathy's gist is, by his own framing, an idea file — designed to be copy-pasted into any coding agent so the agent can instantiate it. It is deliberately abstract. It is not a product. KnowMine is the same insight, productized and extended.
What productize means concretely:
- The schema layer ships pre-built. You don't write `AGENTS.md` from a blank file. The 19-tool MCP surface is the schema. Title extraction, tag inference, and folder placement happen on every `add_knowledge` call, without a config you wrote.
- Retrieval is hybrid from the first entry (semantic prefilter, typed links, revision history), not bolted on at the 500-document threshold.
- The interface is MCP, not the local filesystem. Any MCP-compatible runtime — Claude Code, ChatGPT, Cursor, OpenCode, scheduled agents on a VPS — reads and writes the same knowledge base. Three permission templates (read-only / read-write / full) so when your agent is running somewhere you don't fully trust, you can constrain what it touches.
What extend means concretely:
- file-back is a tool call, not a manual step. `add_knowledge` writes back the moment the LLM decides something is worth keeping. There's no "drop the file then run ingest" loop.
- Typed links are explicit, not parsed from `[[wikilinks]]`. `add_knowledge_link` lets the LLM declare why two entries connect: `based_on`, `refutes`, `extends`, `evolved_from`, `related_to`. The graph is intentional, not accidental.
- Revision history is built in. Every meaningful update is versioned via `get_knowledge_history`. The "lint" question (what's stale, what's contradicted, how a decision evolved) has a substrate to run on.
- Memory is separated from Knowledge. `add_knowledge` stores documents, articles, ideas. `save_memory`/`recall_memory` store decisions, lessons, preferences, domain insights. Most personal wikis blur this and pay for it later: the model can't tell whether a page is a fact about the world or a stance you've taken. KnowMine forces the distinction at the tool layer.
- Soul. The same surface includes `generate_soul`/`get_soul`: an AI-generated profile built from accumulated memory and knowledge, refreshable on demand. When you switch agents, the new agent reads your Soul and behaves like it knows you. Karpathy's wiki gives the agent your knowledge. KnowMine also gives it a model of you.
The honest version of KnowMine's positioning is this: it moves the bookkeeping bottleneck from the LLM context window to an engineerable retrieval and data layer. That's not "no bottleneck." That's "a bottleneck we know how to instrument, scale, and pay for."
A migration path, with a benchmark to come
If you've already invested time in a Karpathy-style wiki and you've outgrown the personal-scale sweet spot, you shouldn't have to throw it away.
I'm shipping a Karpathy-pattern → KnowMine migration skill as a follow-up to this post. It takes a local wiki directory (the standard raw/ + wiki/ + index.md shape), batch-writes every page through add_knowledge, parses any [[wikilinks]] into typed links via add_knowledge_link, and gives you back a workspace that's reachable from any MCP runtime.
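The wikilink-parsing step is the only part with real edge cases. A sketch of the extraction, assuming the two common forms `[[Target]]` and `[[Target|display label]]` (alias resolution beyond that is out of scope here):

```python
import re

WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]+)?\]\]")

def extract_wikilinks(markdown: str) -> list[str]:
    """Return link targets from [[wikilinks]], dropping display labels."""
    return [m.strip() for m in WIKILINK.findall(markdown)]

page = "Builds on [[Reciprocal Rank Fusion]] and [[BM25|classic BM25]]."
targets = extract_wikilinks(page)
# → ["Reciprocal Rank Fusion", "BM25"]
```

Each extracted target becomes one `add_knowledge_link` call; since a bare wikilink carries no relation type, the migration has to pick a default such as `related_to` and let later sessions upgrade it.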
To make the comparison concrete instead of marketing, the skill release will be paired with a benchmark on a public dataset (≈100 recent ML papers from arXiv, ingested under a Karpathy-style schema). Three configurations on the same 20 questions:
- `index.md` + LLM navigation (Karpathy native)
- BM25 + vector (qmd or equivalent)
- KnowMine MCP (semantic prefilter + typed links + revision history)
Four metrics: recall (was the relevant knowledge surfaced), citation accuracy (did the answer point to the right source), cost (tokens + API calls + wall time), and human correction effort (how much editing before the answer is usable).
Numbers go in the migration skill release post — not this one. I'd rather publish methodology now and results when they're real, than ship plausible-sounding claims with no receipts.
Honest boundaries
If you're at personal-research scale, working with one coding agent on one laptop, and Karpathy's pattern works for you — stay. It is the right answer for that scenario. The gist is excellent and the qmd recommendation handles most of the growth you'll see.
KnowMine is for the case where one of three things changes:
- Scale. Past a few hundred entries you want a deterministic prefilter. The LLM should be reasoning, not scoping.
- Runtime. Your agent isn't only Claude Code on your laptop — it's also ChatGPT on your phone, a Slack bot, a scheduled job on a VPS. Files don't reach those places. Protocols do.
- Schema ownership. You don't want to write `AGENTS.md` from scratch. You want sensible defaults and the option to tune them later.
If none of those have changed for you, the LLM Wiki pattern is doing exactly what it was designed for. That's not a sales line — that's where I'd start too.
Close
Karpathy's LLM Wiki is right about the only thing that actually matters: the bookkeeping is the work, and the LLM is the only entity that will do it patiently. Everything else is implementation choice.
Two weeks of community work has made the implementation choices clearer. Hybrid retrieval is not optional past a few hundred entries. Set operations belong in a query layer, not in LLM attention. Graph structure is a real alternative to flat indexing. And the pattern outgrows the laptop the moment your agent does.
KnowMine is one set of those implementation choices, made consistently and shipped as a product. Same compounding knowledge. Same file-back loop. Without the local lock-in.
If you've been running into the ceiling, the migration skill drops next week. Benchmark numbers come with it.