llm-wiki

a claude code / codex / opencode workflow that turns any llm agent into a research engine. builds append-only markdown knowledge bases through parallel multi-agent investigation, source ingestion, cross-referenced article compilation, and truth-seeking audits. zero runtime dependencies.

further reading: the project site at llm-wiki.net covers the philosophy and design rationale. source on github: nvk/llm-wiki.
porting pattern: if you want the architectural write-up for taking a Claude-first plugin and making Codex first-class alongside it, read claude-to-codex-plugins. This guide covers the tool itself; that guide covers the migration pattern.

1. why llm-wiki exists

asking a frontier model directly works for one-shot questions. it stops working when you need the same answer next week with the same evidence behind it, when the question takes more than one context window of source material to answer, or when you want the answer audited against the actual sources rather than the model's training memory.

the standard alternatives both have failure modes:

  • plain rag retrieves chunks per query. great at scale, but bad retrieval is worse than no retrieval — google research found gemma's incorrect-answer rate jumping from 10.2% (no context) to 66.1% (insufficient context). the model loses the ability to abstain.
  • raw long-context dumps everything into the prompt. chroma research tested 18 frontier models in 2025: 20–50% accuracy drop between 10k and 100k tokens on simple retrieval. effective reliable capacity is roughly 60–70% of advertised max.

llm-wiki is the karpathy-style middle ground: an llm-compiled, append-only knowledge base. raw sources go in once and are never edited. the llm synthesizes them into compiled articles with cross-references. you query the compiled articles, which fit comfortably in context. updates re-compile only what changed.

karpathy's insight: knowledge bases historically failed because of maintenance burden, not reading difficulty. llms remove the burden — they "don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass."

2. install

llm-wiki ships as a claude code plugin, a native openai codex plugin, an opencode instruction file, and a portable AGENTS.md for any other agent that can read it. mit license, zero runtime dependencies — it runs entirely on the host agent's built-in tools (read, write, edit, glob, grep, bash, webfetch, websearch).

option a: as a claude code plugin

```bash
claude plugin install wiki@llm-wiki
```

that installs the native plugin, the /wiki router, and the full command set including /wiki:audit and /wiki:librarian. the skill auto-activates when it detects a wiki directory (~/wiki/, project-local .wiki/, or a configured hub path).

option b: as a native codex plugin

```bash
codex plugin marketplace add nvk/llm-wiki
# then open /plugins, enable "LLM Wiki", and use @wiki
```

codex gets a generated first-class mirror under plugins/llm-wiki/. same audit/research/query workflow, codex-native packaging. for a local checkout, use ./scripts/bootstrap-codex-plugin.sh --scope user --verify.

option c: as an opencode instruction file

```json
{
  "instructions": [
    "path/to/llm-wiki/plugins/llm-wiki-opencode/skills/wiki-manager/SKILL.md"
  ]
}
```

opencode loads the generated instruction file directly. web research requires OPENCODE_ENABLE_EXA=1.

option d: as portable agents.md (any other agent)

```bash
git clone https://github.com/nvk/llm-wiki.git
# copy the bundled AGENTS.md into your project root,
# or symlink it into your agent's global config:
ln -s "$(pwd)/llm-wiki/AGENTS.md" ~/.config/opencode/AGENTS.md
```

the agents.md encodes the full wiki protocol — architecture, file formats, every operation as prose recipes. you trade the native command surface for a tell-the-agent-by-name workflow ("run the wiki ingest protocol on this url").

first wiki

```
/wiki init
```

creates a hub at ~/wiki/ with a wikis.json registry, a _index.md, and an empty topics/ directory. each topic wiki you spin up afterwards lives under topics/<name>/ with its own raw, wiki, inbox, output subdirectories.
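
the hub layout can be sketched in a few lines. this is an illustrative reconstruction, not the plugin's actual implementation, and the exact wikis.json field names are an assumption:

```python
import json
from pathlib import Path

def init_hub(root: str) -> Path:
    """Create the hub layout: registry, global index, empty topics dir."""
    hub = Path(root).expanduser()
    (hub / "topics").mkdir(parents=True, exist_ok=True)
    registry = hub / "wikis.json"
    if not registry.exists():
        # field names here are illustrative, not the plugin's real schema
        registry.write_text(json.dumps({"wikis": {}}, indent=2))
    index = hub / "_index.md"
    if not index.exists():
        index.write_text("# wiki hub\n\n0 topic wikis.\n")
    return hub

def add_topic(hub: Path, name: str) -> Path:
    """Give a new topic its own raw/, wiki/, inbox/, output/ subdirectories."""
    topic = hub / "topics" / name
    for sub in ("raw", "wiki", "inbox", "output"):
        (topic / sub).mkdir(parents=True, exist_ok=True)
    reg = json.loads((hub / "wikis.json").read_text())
    reg["wikis"][name] = {"path": f"topics/{name}"}
    (hub / "wikis.json").write_text(json.dumps(reg, indent=2))
    return topic
```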

icloud / dropbox: the hub path is configurable via ~/.config/llm-wiki/config.json. point it at any synced folder if you want the wiki replicated across machines.
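
a hedged sketch of pointing the hub at a synced folder. the `hub` key name is an assumption here, so check the config schema your installed version actually uses:

```python
import json
from pathlib import Path

def set_hub_path(config_file: str, hub: str) -> dict:
    """Write the hub location into the llm-wiki config file, creating the
    file and its parent directory if needed. The "hub" key is illustrative."""
    path = Path(config_file).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    config = json.loads(path.read_text()) if path.exists() else {}
    config["hub"] = hub
    path.write_text(json.dumps(config, indent=2))
    return config
```

with an icloud or dropbox folder as the argument, every machine that syncs the folder sees the same hub.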

3. data model

llm-wiki uses karpathy's three-layer separation. once you internalize this split, every command makes sense.

| layer | path | contents | mutability |
| --- | --- | --- | --- |
| raw sources | raw/ | ingested material: articles, papers, repos, notes, transcripts, data dumps | immutable — never edited after ingestion |
| compiled wiki | wiki/ | llm-synthesized articles: concepts, topics, references, theses | mutable — re-written by the llm during compile |
| schema | config.md, references/ | scope, conventions, agent prompts, compilation rules | configured by you, read by the llm |

hub + topic isolation

nvk/llm-wiki extends the flat pattern with a hub registry. each topic gets its own full wiki — independent research tracks that don't pollute each other.

```text
~/wiki/                          # hub — lightweight registry
├── wikis.json                   # registry of all topic wikis
├── _index.md                    # global stats and navigation
├── log.md                       # global activity log
└── topics/
    ├── quantum-computing/       # full wiki: raw/, wiki/, inbox/, output/
    ├── hardware-wallet-security/
    └── meta-llm-wiki/           # the wiki about llm-wiki itself
```
each topic wiki ships with an .obsidian/ config so it works as an obsidian vault out of the box. the dual-link format ([[wikilinks]] + [text](path.md)) keeps both obsidian and the agent happy.
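
one possible reading of that dual-link format, sketched as a rewrite pass. the adjacent-rendering choice and the name-to-path resolver are assumptions, not the plugin's documented behavior:

```python
import re

def dualize(text: str, resolve: dict) -> str:
    """Rewrite [[Target]] into [[Target]] ([Target](path.md)) so obsidian
    follows the wikilink and a plain-markdown agent follows the relative path.
    `resolve` maps article names to paths; unknown names are left untouched."""
    def repl(m):
        name = m.group(1)
        path = resolve.get(name)
        return f"[[{name}]] ([{name}]({path}))" if path else m.group(0)
    return re.sub(r"\[\[([^\]|]+)\]\]", repl, text)
```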

append-only sources, mutable wiki

the immutability of raw/ is load-bearing. it means every claim in a compiled article can be traced back to the source it came from, and re-compilation never loses material. if a source turns out to be wrong, you retract it (/wiki:retract) — the agent identifies every article that referenced it, cleans up references, and flags affected claims for re-verification.
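
the first step of that retraction pass, finding every compiled article that cites the retracted source, is essentially a recursive grep. a minimal sketch:

```python
from pathlib import Path

def affected_articles(wiki_dir: str, source_name: str) -> list:
    """Return every compiled article that mentions the retracted raw source,
    so its claims can be flagged for re-verification."""
    hits = []
    for article in sorted(Path(wiki_dir).rglob("*.md")):
        if source_name in article.read_text():
            hits.append(str(article))
    return hits
```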

4. research pipeline

the core loop is ingest → compile → query. for a brand-new topic you typically start with /wiki:research, which runs ingest and compile end-to-end across multiple parallel agents.

parallel multi-agent research

each research mode spawns a different number of agents with different angles:

| mode | agents | strategy |
| --- | --- | --- |
| standard | 5 | academic, technical, applied, news/trends, contrarian |
| deep | 8+ | adds historical, adjacent, and data/stats angles |
| retardmax | 10+ | adds 2 rabbit-hole agents, skips planning, ingests aggressively |

the orchestrator-worker pattern is deliberate: each agent has independent search scope with no inter-agent coordination. the multi-agent failure-mode literature (mast taxonomy, 14 modes across 3 categories) flags inter-agent misalignment as the most common failure type. independence avoids it.
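
the independence property can be sketched directly: workers share nothing but their inputs, so there is no channel for inter-agent misalignment. the agent stub below is a stand-in for a real web-searching worker, not the plugin's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

ANGLES = ["academic", "technical", "applied", "news/trends", "contrarian"]

def research(topic: str, angle: str) -> dict:
    """Stand-in for one research agent. Each call receives only its topic and
    angle: no shared queue, no shared memory, no inter-agent messages."""
    # a real agent would run web searches here and write findings into inbox/
    return {"angle": angle, "query": f"{topic} {angle}"}

def orchestrate(topic: str, angles=ANGLES) -> list:
    # fan out to independent workers, then join; the orchestrator only merges
    with ThreadPoolExecutor(max_workers=len(angles)) as pool:
        return list(pool.map(lambda a: research(topic, a), angles))
```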

commands

| command | what it does |
| --- | --- |
| /wiki:research | full research run — parallel agents discover sources, ingest them, compile articles |
| /wiki:ingest | ingest a specific url, file, or freeform text into raw/ |
| /wiki:compile | (re-)compile raw/ into wiki/ articles with cross-references |
| /wiki:query | ask questions against compiled articles only — answers cite source articles |
| /wiki:audit | umbrella trust audit: recheck the wiki, trace outputs and provenance, and do fresh research if the local corpus is not enough |
| /wiki:librarian | focused wiki maintenance: score articles for staleness and quality, then report what needs attention |
| /wiki:lint | structural health: broken links, missing indexes, stale articles, duplicates |
| /wiki:assess | compare a local repo or topic against the wiki — gap analysis |
| /wiki:output | generate reports, slide decks, study guides, timelines, glossaries, and other deliverables from the wiki |
| /wiki:plan | turn the wiki into an implementation plan grounded in the current knowledge base |
| /wiki:retract | remove a source; recompile affected articles; flag downstream claims |
| /wiki init | create the hub or a new topic wiki |

compilation details

compile takes the messy raw layer and produces typed articles in wiki/concepts/, wiki/topics/, and wiki/references/. entity dedup uses a 0.7 cosine threshold; conflict resolution uses an llm-debate pattern when two sources contradict; cross-references are added inline as both wikilinks and markdown links so the result reads cleanly in obsidian and in claude code.
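
the dedup threshold check itself is simple. this sketch assumes you already have embeddings for the two candidate entities; which embedding model compile actually uses is not specified here:

```python
import math

def cosine(a, b) -> float:
    """Plain cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def same_entity(emb_a, emb_b, threshold: float = 0.7) -> bool:
    """Two candidate entities merge into one article when their embedding
    similarity clears the 0.7 threshold."""
    return cosine(emb_a, emb_b) >= threshold
```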

incremental compile: compile is manifest-based. only changed sources trigger re-synthesis of their downstream articles, not the whole wiki. ivm theory says incremental view maintenance can be ~10⁷× faster than full recompilation; the practical speedup is whatever fraction of the corpus changed.
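
manifest-based change detection reduces to a content-hash comparison; the manifest format below is illustrative, not the plugin's actual file:

```python
import hashlib
import json
from pathlib import Path

def changed_sources(raw_dir: str, manifest_file: str) -> list:
    """Compare raw/ content hashes against the manifest from the last compile
    and update it; only the returned sources need re-synthesis downstream."""
    manifest_path = Path(manifest_file)
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new, changed = {}, []
    for src in sorted(Path(raw_dir).rglob("*.md")):
        digest = hashlib.sha256(src.read_bytes()).hexdigest()
        new[str(src)] = digest
        if old.get(str(src)) != digest:
            changed.append(str(src))
    manifest_path.write_text(json.dumps(new, indent=2))
    return changed
```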

5. thesis mode

/wiki:research --mode thesis reframes the research as a claim with explicit for/against evidence. the agent runs two passes:

  1. round 1: standard parallel search, ingesting both supporting and opposing sources.
  2. round 2: the orchestrator scores the evidence so far and targets the weaker side. this is the anti-confirmation-bias step — if round 1 turned up 12 sources for and 3 against, round 2 hunts for the missing against case before drawing a conclusion.
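
the round-2 targeting decision reduces to picking the under-evidenced side; a minimal sketch:

```python
def next_target(evidence: dict):
    """Round-2 targeting: return the side with the least evidence so the
    orchestrator can hunt for it, or None when the base is already balanced."""
    weaker = min(evidence, key=evidence.get)
    stronger = max(evidence, key=evidence.get)
    return None if evidence[weaker] == evidence[stronger] else weaker

next_target({"for": 12, "against": 3})  # round 2 hunts the missing "against" case
```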

the compiled thesis article structures the result as: claim → evidence for (with citations) → evidence against (with citations) → verdict (which side wins, by what margin, and what would change it). use it when you want a research output you can actually defend, not just a confident-sounding summary.

6. outputs

/wiki:output <type> generates artifacts from the compiled wiki and files them under output/. seven types ship by default:

| type | what you get |
| --- | --- |
| summary | condensed overview, 1–2 pages, key takeaways |
| report | analytical report with sections, evidence, conclusions, 3–5 pages |
| study-guide | concepts + definitions + q&a, designed for learning |
| slides | marp-compatible markdown deck, `---`-separated slides |
| timeline | chronological view of developments / events / milestones |
| glossary | alphabetized definitions of all key terms |
| comparison | structured side-by-side, table-driven, for 2+ concepts |

two flags worth knowing:

  • --with <wiki> loads a supplementary wiki for craft knowledge. the primary wiki provides the subject; --with provides the technique. example: --wiki quantum-computing --with article-writing uses quantum-computing for content and article-writing for structure, hooks, and writing patterns.
  • --retardmax ships it now. read all articles, skip planning, generate immediately. better imperfect-now than perfect-never. iterate with a non-retardmax pass once you have something on disk.

7. claude code, codex, opencode

the underlying tool surface (read, write, edit, glob, grep, bash, webfetch, websearch) is identical across the three runtimes, so the wiki protocol itself doesn't change. only the packaging differs.

claude code

install the plugin with claude plugin install wiki@llm-wiki. claude gets the full native command surface, including the fuzzy /wiki router, /wiki:audit for truth-seeking inspection, and /wiki:librarian for wiki-only maintenance.

openai codex

codex now has a native plugin path too: codex plugin marketplace add nvk/llm-wiki, then enable "LLM Wiki" in /plugins and invoke it with @wiki. the mirror is generated from the claude source of truth, so codex gets the same audit/research/query behavior without a workflow fork.

AGENTS.md is still the portable fallback when you want zero install ceremony or need another agent entirely. the architectural pattern behind the codex mirror is described in claude-to-codex-plugins.

opencode

the practical path is the generated instruction file under plugins/llm-wiki-opencode/skills/wiki-manager/SKILL.md. add it to opencode.json and opencode gets the same shared wiki workflow. web research requires OPENCODE_ENABLE_EXA=1.

other viable paths, in increasing order of polish:

  1. agents.md (zero effort): opencode reads agents.md natively with a documented lookup order. drop the bundled agents.md in the project root or at ~/.config/opencode/AGENTS.md. works today.
  2. opencode-agent-skills compat layer: joshuadavidthomas/opencode-agent-skills scans .claude/skills/ and translates the claude-code plugin format on the fly. you get skill activation triggers and skill content injection without modifying llm-wiki source.
  3. repo-native instruction mirror: the dedicated generated tree under plugins/llm-wiki-opencode/. this is the maintained path in the repo today.

web tools: opencode's webfetch and websearch are powered by exa ai and require OPENCODE_ENABLE_EXA=1. without it, research operations that depend on web search won't work — ingest from local files or pre-fetched urls instead.

8. when to use it (and when not)

use llm-wiki when

  • you'll ask the same kind of question more than once and want each answer audited against the same source set
  • you need to decide whether an existing report or playbook is still trustworthy and want the agent to chase the truth with new research if needed
  • the source material exceeds a single context window and re-uploading every time is wasteful
  • you want sources to outlive any individual conversation — append-only raw, citable from articles
  • your team needs to share research without sharing chat transcripts
  • you need to defend a claim with explicit for/against evidence (use thesis mode)

just ask the model directly when

  • one-shot, throwaway question with no follow-ups planned
  • the answer is well-covered by the model's training data and verifiability isn't a priority
  • the topic is volatile enough that a wiki snapshot would be stale before you re-read it

use rag instead when

  • corpus is over ~100k tokens and grows continuously
  • sub-second query latency is a hard requirement
  • you have embedding-api budget and want per-query cost < per-context-load cost

use both (hybrid) when

you have stable curated knowledge (architecture decisions, conventions, core concepts — fits in < 30k tokens) and a large dynamic corpus (full source archive, historical data, voluminous reference). the wiki acts as the always-in-context routing layer; rag handles the periphery on demand. this is the recommended target architecture for any non-trivial production use.
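
the routing decision itself is almost a one-liner. the 30k budget comes from the paragraph above; the return labels are illustrative:

```python
WIKI_BUDGET = 30_000  # curated layer small enough to ride along in every context

def route(corpus_tokens: int) -> str:
    """Hybrid routing sketch: the compiled wiki is the always-in-context layer;
    anything beyond the budget is served by retrieval on demand."""
    return "wiki" if corpus_tokens <= WIKI_BUDGET else "wiki + rag periphery"
```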