llm-wiki
a claude code / codex / opencode workflow that turns any llm agent into a research engine. builds append-only markdown knowledge bases through parallel multi-agent investigation, source ingestion, cross-referenced article compilation, and truth-seeking audits. zero runtime dependencies.
1. why llm-wiki exists
asking a frontier model directly works for one-shot questions. it stops working when you need the same answer next week with the same evidence behind it, when the question takes more than one context window of source material to answer, or when you want the answer audited against the actual sources rather than the model's training memory.
the standard alternatives both have failure modes:
- plain rag retrieves chunks per query. great at scale, but bad retrieval is worse than no retrieval — google research found gemma's incorrect-answer rate jumping from 10.2% (no context) to 66.1% (insufficient context). the model loses the ability to abstain.
- raw long-context dumps everything into the prompt. chroma research tested 18 frontier models in 2025: 20–50% accuracy drop between 10k and 100k tokens on simple retrieval. effective reliable capacity is roughly 60–70% of advertised max.
llm-wiki is the karpathy-style middle ground: an llm-compiled, append-only knowledge base. raw sources go in once and are never edited. the llm synthesizes them into compiled articles with cross-references. you query the compiled articles, which fit comfortably in context. updates re-compile only what changed.
2. install
llm-wiki ships as a claude code plugin, a native openai codex plugin, an opencode instruction file, and a portable AGENTS.md for any other agent that can read it. mit license, zero runtime dependencies — it runs entirely on the host agent's built-in tools (read, write, edit, glob, grep, bash, webfetch, websearch).
option a: as a claude code plugin
```
claude plugin install wiki@llm-wiki
```
that installs the native plugin, the /wiki router, and the full command set including /wiki:audit and /wiki:librarian. the skill auto-activates when it detects a wiki directory (~/wiki/, project-local .wiki/, or a configured hub path).
option b: as a native codex plugin
```
codex plugin marketplace add nvk/llm-wiki
# then open /plugins, enable "LLM Wiki", and use @wiki
```
codex gets a generated first-class mirror under plugins/llm-wiki/. same audit/research/query workflow, codex-native packaging. for a local checkout, use ./scripts/bootstrap-codex-plugin.sh --scope user --verify.
option c: as an opencode instruction file
```json
{
  "instructions": [
    "path/to/llm-wiki/plugins/llm-wiki-opencode/skills/wiki-manager/SKILL.md"
  ]
}
```
opencode loads the generated instruction file directly. web research requires OPENCODE_ENABLE_EXA=1.
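enabling exa is a one-line environment variable. a minimal sketch, assuming you launch opencode from the same shell (where you export it — profile, direnv, whatever — is your choice):

```
# opt in to exa-backed webfetch/websearch before launching opencode
export OPENCODE_ENABLE_EXA=1
opencode
```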
option d: as portable agents.md (any other agent)
```
git clone https://github.com/nvk/llm-wiki.git
# copy the bundled AGENTS.md into your project root,
# or symlink it into your agent's global config:
ln -s "$(pwd)/llm-wiki/AGENTS.md" ~/.config/opencode/AGENTS.md
```
the agents.md encodes the full wiki protocol — architecture, file formats, every operation as prose recipes. you trade the native command surface for a tell-the-agent-by-name workflow ("run the wiki ingest protocol on this url").
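in practice that means plain-language requests in chat instead of slash commands. the phrasing and url below are illustrative, not fixed syntax:

```
"run the wiki ingest protocol on https://example.com/some-paper"
"run the wiki compile protocol, then answer: what changed since the last compile?"
```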
first wiki
```
/wiki init
```
creates a hub at ~/wiki/ with a wikis.json registry, a _index.md, and an empty topics/ directory. each topic wiki you spin up afterwards lives under topics/<name>/ with its own raw, wiki, inbox, output subdirectories.
the hub location is configurable in ~/.config/llm-wiki/config.json. point it at any synced folder if you want the wiki replicated across machines.
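a minimal sketch of pointing the hub at a synced folder — the key name below is an assumption, so check the repo for the real config schema:

```
# hypothetical config contents; "hub" is an assumed key name
mkdir -p ~/.config/llm-wiki
cat > ~/.config/llm-wiki/config.json <<'EOF'
{ "hub": "~/Dropbox/wiki" }
EOF
```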
3. data model
llm-wiki uses karpathy's three-layer separation. once you internalize this split, every command makes sense.
| Layer | Path | Contents | Mutability |
|---|---|---|---|
| raw sources | raw/ | ingested material: articles, papers, repos, notes, transcripts, data dumps | immutable — never edited after ingestion |
| compiled wiki | wiki/ | llm-synthesized articles: concepts, topics, references, theses | mutable — re-written by the llm during compile |
| schema | config.md, references/ | scope, conventions, agent prompts, compilation rules | configured by you, read by the llm |
hub + topic isolation
nvk/llm-wiki extends karpathy's flat single-wiki layout with a hub registry. each topic gets its own full wiki — independent research tracks that don't pollute each other.
```
~/wiki/                          # hub — lightweight registry
├── wikis.json                   # registry of all topic wikis
├── _index.md                    # global stats and navigation
├── log.md                       # global activity log
└── topics/
    ├── quantum-computing/       # full wiki: raw/, wiki/, inbox/, output/
    ├── hardware-wallet-security/
    └── meta-llm-wiki/           # the wiki about llm-wiki itself
```
each topic wiki ships with an .obsidian/ config so it works as an obsidian vault out of the box. the dual-link format ([[wikilinks]] + [text](path.md)) keeps both obsidian and the agent happy.
append-only sources, mutable wiki
the immutability of raw/ is load-bearing. it means every claim in a compiled article can be traced back to the source it came from, and re-compilation never loses material. if a source turns out to be wrong, you retract it (/wiki:retract) — the agent identifies every article that referenced it, cleans up references, and flags affected claims for re-verification.
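a sketch of what that traceability buys you — the source filename and argument syntax below are illustrative, not the repo's actual conventions:

```
# find every compiled article that cites a suspect source
grep -rl "2025-03-vendor-whitepaper" wiki/
# or let the agent do the full cleanup end-to-end:
# /wiki:retract raw/2025-03-vendor-whitepaper.md
```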
4. research pipeline
the core loop is ingest → compile → query. for a brand-new topic you typically start with /wiki:research, which runs ingest and compile end-to-end across multiple parallel agents.
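a typical first session looks like this — argument syntax is illustrative; each command is documented in the table below:

```
/wiki:research "hardware wallet supply-chain attacks"   # discover + ingest + compile
/wiki:ingest https://example.com/teardown-writeup       # add one more source later
/wiki:compile                                           # fold it into the articles
/wiki:query "which attacks require physical access?"    # answer from compiled wiki only
```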
parallel multi-agent research
each research mode spawns a different number of agents with different angles:
| Mode | Agents | Strategy |
|---|---|---|
| standard | 5 | academic, technical, applied, news/trends, contrarian |
| deep | 8 | + historical, adjacent, data/stats |
| retardmax | 10 | + 2 rabbit-hole agents, skip planning, ingest aggressively |
the orchestrator-worker pattern is deliberate: each agent has independent search scope with no inter-agent coordination. the multi-agent failure-mode literature (mast taxonomy, 14 modes across 3 categories) flags inter-agent misalignment as the most common failure type. independence avoids it.
commands
| Command | What it does |
|---|---|
/wiki:research | full research run — parallel agents discover sources, ingest them, compile articles |
/wiki:ingest | ingest a specific url, file, or freeform text into raw/ |
/wiki:compile | (re-)compile raw/ into wiki/ articles with cross-references |
/wiki:query | ask questions against compiled articles only — answers cite source articles |
/wiki:audit | umbrella trust audit: recheck the wiki, trace outputs and provenance, and do fresh research if the local corpus is not enough |
/wiki:librarian | focused wiki maintenance: score articles for staleness and quality, then report what needs attention |
/wiki:lint | structural health: broken links, missing indexes, stale articles, duplicates |
/wiki:assess | compare a local repo or topic against the wiki — gap analysis |
/wiki:output | generate reports, slide decks, study guides, timelines, glossaries, and other deliverables from the wiki |
/wiki:plan | turn the wiki into an implementation plan grounded in the current knowledge base |
/wiki:retract | remove a source; recompile affected articles; flag downstream claims |
/wiki init | create the hub or a new topic wiki |
compilation details
compile takes the messy raw layer and produces typed articles in wiki/concepts/, wiki/topics/, and wiki/references/. entity dedup uses a 0.7 cosine threshold; conflict resolution uses an llm-debate pattern when two sources contradict; cross-references are added inline as both wikilinks and markdown links so the result reads cleanly in obsidian and in claude code.
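after a compile, the typed layout looks roughly like this (the article names are invented for illustration):

```
wiki/
├── concepts/        # e.g. surface-codes.md
├── topics/          # e.g. error-correction-landscape.md
└── references/      # source-level reference articles
```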
5. thesis mode
/wiki:research --mode thesis reframes the research as a claim with explicit for/against evidence. the agent runs two passes:
- round 1: standard parallel search, ingesting both supporting and opposing sources.
- round 2: the orchestrator scores the evidence so far and targets the weaker side. this is the anti-confirmation-bias step — if round 1 turned up 12 sources for and 3 against, round 2 hunts for the missing against case before drawing a conclusion.
the compiled thesis article structures the result as: claim → evidence for (with citations) → evidence against (with citations) → verdict (which side wins, by what margin, and what would change it). use it when you want a research output you can actually defend, not just a confident-sounding summary.
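invocation is a one-liner; the claim phrasing below is illustrative:

```
/wiki:research --mode thesis "hardware wallets meaningfully reduce key-theft risk vs hot wallets"
```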
6. outputs
/wiki:output <type> generates artifacts from the compiled wiki and files them under output/. seven types ship by default:
| Type | What you get |
|---|---|
| summary | condensed overview, 1–2 pages, key takeaways |
| report | analytical report with sections, evidence, conclusions, 3–5 pages |
| study-guide | concepts + definitions + q&a, designed for learning |
| slides | marp-compatible markdown deck, `---`-separated slides |
| timeline | chronological view of developments / events / milestones |
| glossary | alphabetized definitions of all key terms |
| comparison | structured side-by-side, table-driven, for 2+ concepts |
two flags worth knowing:
- `--with <wiki>` loads a supplementary wiki for craft knowledge. the primary wiki provides the subject; `--with` provides the technique. example: `--wiki quantum-computing --with article-writing` uses quantum-computing for content and article-writing for structure, hooks, and writing patterns.
- `--retardmax` ships it now. read all articles, skip planning, generate immediately. better imperfect-now than perfect-never. iterate with a non-retardmax pass once you have something on disk.
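putting both together, reusing the wiki names from the example above:

```
/wiki:output slides --wiki quantum-computing --with article-writing
/wiki:output report --retardmax
```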
7. claude code, codex, opencode
the underlying tool surface (read, write, edit, glob, grep, bash, webfetch, websearch) is identical across the three runtimes, so the wiki protocol itself doesn't change. only the packaging differs.
claude code
install the plugin with claude plugin install wiki@llm-wiki. claude gets the full native command surface, including the fuzzy /wiki router, /wiki:audit for truth-seeking inspection, and /wiki:librarian for wiki-only maintenance.
openai codex
codex now has a native plugin path too: codex plugin marketplace add nvk/llm-wiki, then enable "LLM Wiki" in /plugins and invoke it with @wiki. the mirror is generated from the claude source of truth, so codex gets the same audit/research/query behavior without a workflow fork.
AGENTS.md is still the portable fallback when you want zero install ceremony or need another agent entirely. the architectural pattern behind the codex mirror is described in claude-to-codex-plugins.
opencode
the practical path is the generated instruction file under plugins/llm-wiki-opencode/skills/wiki-manager/SKILL.md. add it to opencode.json and opencode gets the same shared wiki workflow. web research requires OPENCODE_ENABLE_EXA=1.
other viable paths, in increasing order of polish:
- agents.md (zero effort): opencode reads AGENTS.md natively with a documented lookup order. drop the bundled AGENTS.md in the project root or at `~/.config/opencode/AGENTS.md`. works today.
- opencode-agent-skills compat layer: joshuadavidthomas/opencode-agent-skills scans `.claude/skills/` and translates the claude-code plugin format on the fly. you get skill activation triggers and skill content injection without modifying llm-wiki source.
- repo-native instruction mirror: the dedicated generated tree under `plugins/llm-wiki-opencode/`. this is the maintained path in the repo today.
webfetch and websearch are powered by exa ai and require OPENCODE_ENABLE_EXA=1. without it, research operations that depend on web search won't work — ingest from local files or pre-fetched urls instead.
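the offline fallback looks like this — the file path and quoting are illustrative, since ingest accepts urls, files, or freeform text:

```
# no web search available — feed local material instead
/wiki:ingest ./downloads/context-rot-technical-report.pdf
/wiki:ingest "pasted notes from the vendor call, 2025-06-12"
```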
8. when to use it (and when not)
use llm-wiki when
- you'll ask the same kind of question more than once and want each answer audited against the same source set
- you need to decide whether an existing report or playbook is still trustworthy and want the agent to chase the truth with new research if needed
- the source material exceeds a single context window and re-uploading every time is wasteful
- you want sources to outlive any individual conversation — append-only raw, citable from articles
- your team needs to share research without sharing chat transcripts
- you need to defend a claim with explicit for/against evidence (use thesis mode)
just ask the model directly when
- one-shot, throwaway question with no follow-ups planned
- the answer is well-covered by the model's training data and verifiability isn't a priority
- the topic is volatile enough that a wiki snapshot would be stale before you re-read it
use rag instead when
- corpus is over ~100k tokens and grows continuously
- sub-second query latency is a hard requirement
- you have embedding-api budget and want per-query cost < per-context-load cost
use both (hybrid) when
you have stable curated knowledge (architecture decisions, conventions, core concepts — fits in < 30k tokens) and a large dynamic corpus (full source archive, historical data, voluminous reference material). the wiki acts as the always-in-context routing layer; rag handles the periphery on demand. this is the recommended target architecture for any non-trivial production use.