Karpathy's LLM Wiki Is Right — Here's What Happens When You Make It Cloud-Native and MCP-First
KnowMine is a working implementation of the compounding knowledge base pattern — with semantic search, remote agent access, and zero setup. A deep dive into what changes when you take Karpathy's local-first vision and make it cloud-native.
The Insight That Went Viral — and Why It Matters
Andrej Karpathy's LLM Wiki gist went viral this week for good reason. The core insight — that LLMs should compile and maintain a structured knowledge base instead of rediscovering knowledge from scratch on every query — is a fundamental shift from how most people use AI today.
We've been building exactly this. KnowMine is an AI-native personal knowledge base that's been live for months, with 11 MCP tools, pgvector semantic search, and a SOUL personalization layer. When we read Karpathy's gist, we didn't think "great idea" — we thought "we already shipped this, and we've solved three problems his approach can't."
This post isn't a critique of Karpathy's vision. It's a love letter to it — plus the engineering lessons from building a cloud-native, MCP-first implementation that any AI agent can use out of the box.
Where Karpathy's Pattern Hits Its Ceiling
The scaling wall. Karpathy notes that at ~100 sources and a few hundred pages, an index file is sufficient — no vector database needed. This is true. But knowledge bases grow. At 500+ entries with overlapping concepts across domains, keyword matching and index scanning break down. You need semantic similarity search. KnowMine uses pgvector with text-embedding-3-small from day one — every entry is vectorized on save, and related knowledge surfaces automatically.
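To make the scaling argument concrete, here is a minimal sketch of the kind of nearest-neighbor lookup pgvector performs. Toy 3-dimensional vectors and a pure-Python cosine distance stand in for real text-embedding-3-small embeddings and pgvector's `<=>` operator; none of the names below are KnowMine's actual schema.

```python
import math

def cosine_distance(a, b):
    """The same metric as pgvector's <=> operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Toy 3-d embeddings standing in for 1536-d text-embedding-3-small vectors.
entries = {
    "pgvector setup notes":       [0.9, 0.1, 0.0],
    "sourdough starter log":      [0.0, 0.2, 0.9],
    "embedding model comparison": [0.8, 0.3, 0.1],
}

# A hypothetical embedding of the query "vector database".
query = [0.85, 0.2, 0.05]

# Rank every entry by distance to the query — conceptually what
# `ORDER BY embedding <=> $query LIMIT k` does on an indexed table.
ranked = sorted(entries, key=lambda name: cosine_distance(entries[name], query))
print(ranked[0])  # the nearest entry by meaning, not by keyword overlap
```

This is why keyword matching breaks down at scale: "sourdough starter log" shares zero keywords with the other entries either way, but only vector distance cleanly separates it from the two semantically related ones.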
The local-only trap. Karpathy's setup requires a local Agent (Claude Code, Codex) + local Obsidian. This means your knowledge base is locked to one machine, one session. KnowMine exposes the entire knowledge base as a remote MCP endpoint. Any MCP-compatible agent — Claude Code, OpenCode, custom agents — can read and write to your knowledge base from anywhere.
The human-driven bottleneck. In Karpathy's workflow, you must explicitly "ingest" sources and "query" the wiki. The LLM acts only when prompted. KnowMine's MCP tools allow AI agents to proactively save knowledge during conversations — a pattern we call "conversational knowledge capture." You're discussing a topic with Claude, and it stores the insight for you without breaking the flow.
The Architecture That Makes This Work
Three layers, same philosophy — different implementation
| Layer | Karpathy's Approach | KnowMine |
|---|---|---|
| Raw Sources | Local files in raw/ directory | Any content via MCP add_knowledge tool — text, URLs, voice transcriptions |
| Knowledge Base | LLM-written Markdown files | pgvector-indexed entries with auto-tags, auto-folders, semantic embeddings |
| Schema / Rules | CLAUDE.md / AGENTS.md | SOUL profile (AI-generated user model) + folder presets |
The MCP difference
Here's what it looks like when an AI agent stores knowledge in KnowMine during a conversation:
Tool: `add_knowledge`

```json
{
  "content": "Karpathy's LLM Wiki pattern validates...",
  "type": "insight",
  "tags": ["competitive-analysis", "knowledge-management"]
}
```
No file system. No local Agent. No setup. The knowledge is instantly vectorized, tagged, and discoverable.
11 MCP tools — not just CRUD
- `add_knowledge` / `update_knowledge` / `delete_knowledge` — full CRUD
- `search_my_knowledge` — semantic vector search across all entries
- `get_related_knowledge` — discover hidden connections between ideas
- `save_memory` / `recall_memory` — persistent AI memory across sessions
- `get_soul` — retrieve your AI-generated user profile
- `get_insight` — AI-powered analysis of your knowledge patterns
- `list_folders` — browse knowledge organization
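Under the hood, each of these tools is invoked through MCP's standard JSON-RPC 2.0 `tools/call` method, which is why any compliant client can use them without custom integration. A sketch of what an agent sends over the wire (the request id is arbitrary, and the query argument is just an example):

```python
import json

# An MCP tool invocation is plain JSON-RPC 2.0: method "tools/call",
# with the tool name and its arguments carried in params.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_my_knowledge",
        "arguments": {"query": "knowledge management"},
    },
}

wire = json.dumps(request)
print(wire)
```

The same envelope works for every tool in the list above; only `params.name` and `params.arguments` change.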
Karpathy's Best Ideas That We're Stealing
Credit where it's due. Several concepts from the gist are genuinely brilliant and we're incorporating them:
The Lint operation. Periodic AI health checks on your knowledge base — finding contradictions, orphan entries, stale information, missing connections. KnowMine's get_insight tool has the seed of this, but a full "knowledge lint" feature is now on our roadmap.
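As a rough illustration of what such a lint pass could check, here is a toy version that flags untagged entries and orphans. The entry shape and rules are invented for this sketch; they are not KnowMine's roadmap spec.

```python
# Toy knowledge-lint pass: flag entries with no tags, and "orphan"
# entries that nothing links to and that link to nothing. The entry
# structure here is hypothetical.
entries = {
    "mcp-overview":   {"tags": ["protocols"], "links": ["pgvector-notes"]},
    "pgvector-notes": {"tags": [], "links": []},
    "old-draft":      {"tags": ["misc"], "links": []},
}

def lint(entries):
    warnings = []
    linked_to = {target for e in entries.values() for target in e["links"]}
    for name, entry in entries.items():
        if not entry["tags"]:
            warnings.append(f"{name}: no tags")
        if name not in linked_to and not entry["links"]:
            warnings.append(f"{name}: orphan (no inbound or outbound links)")
    return warnings

print(lint(entries))
```

A real lint would also need the semantic layer — embedding distance to spot near-duplicate entries, and an LLM pass to spot contradictions — but the structural checks are this cheap.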
"Good answers become new pages." This feedback loop — where your exploration compounds into permanent knowledge — is the exact pattern KnowMine's save_memory and add_knowledge tools enable during conversation. But Karpathy's framing of it as "knowledge compound interest" is perfect.
The Schema layer as user-controlled rules. Letting users define how their knowledge should be organized is powerful. This aligns with our planned folder presets feature.
Who Should Use What
Use Karpathy's approach if:
- You're a developer comfortable with Claude Code or Codex
- You want full local control over your data
- You enjoy the Obsidian graph view and plugin ecosystem
- Your knowledge base is topic-specific and moderate in size (~100 sources)
Use KnowMine if:
- You want your AI agents to read/write your knowledge base remotely
- You need semantic search across a growing knowledge base
- You want zero-setup — no local Agent, no file management
- You work across multiple AI tools and need a shared knowledge layer
- You want AI to proactively capture knowledge during conversations
Use both: There's nothing stopping you from using KnowMine as your "compiled knowledge layer" while keeping raw sources in Obsidian locally. MCP is an interoperability standard — that's the point.
Try It in 5 Minutes
- Get your MCP key at knowmine.ai
- Add the MCP endpoint to your Agent's configuration: `knowmine.ai/api/mcp?key=YOUR_KEY`
- Tell your Agent: "Save this insight to my knowledge base" — and it just works.
- Search semantically: "What do I know about knowledge management?" — and related entries surface by meaning, not keywords.
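For clients that take a JSON config file, the endpoint step typically looks something like the fragment below. The exact top-level key and field names vary by client; this follows the common `mcpServers` shape, and the server name `knowmine` is just a label you choose.

```json
{
  "mcpServers": {
    "knowmine": {
      "url": "https://knowmine.ai/api/mcp?key=YOUR_KEY"
    }
  }
}
```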
The Memex Promise, Finally Delivered
Karpathy quoted Vannevar Bush's 1945 Memex vision — a personal knowledge store where the connections between documents are as valuable as the documents themselves. Bush couldn't solve who would maintain it. Karpathy says LLMs solve that. We agree — and we think the next step is making that maintained knowledge base accessible to every AI agent you use, from anywhere, through a standard protocol.
That's what MCP-first means. Your knowledge shouldn't be locked in one tool or one machine. It should be a service that any AI can call.
Start building your AI-native knowledge base
Free to start. Connect to Claude, ChatGPT, and more.
Get Started Free