Context and Multi-Agent

Managing context windows, sessions, and parallel agent workflows.

1. The Compaction Pipeline

Claude Code uses a 5-layer compaction pipeline before every model call. Each layer is cheaper than the next, so earlier layers run first and semantic compression is the last resort.

Layer                 Mechanism                                Cost
1. Budget reduction   Trim tool output to fit budget           Free
2. Snip               Drop old, low-value turns                Free
3. Microcompact       Remove whitespace and formatting noise   Minimal
4. Context collapse   Merge adjacent tool results              Low
5. Auto-compact       Semantic summarization by the model      High

Use /compact with custom instructions to preserve critical context when the window fills up. For example: /compact keep the database schema and test plan.

Context hygiene: Use /clear between unrelated tasks. Context from Task A pollutes Task B — the model wastes tokens attending to irrelevant history and may hallucinate connections that don't exist.

Sessions are append-only JSONL files. Nothing is ever deleted from the file on disk; compaction only affects what gets sent to the model in the next request.
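The append-only property can be sketched with a throwaway file standing in for a real session; the JSON shape below is illustrative, not Claude Code's actual on-disk schema:

```shell
# Sketch: a session as an append-only JSONL file (schema is illustrative)
session=$(mktemp)

# Each turn is appended as one JSON object per line; earlier lines are
# never rewritten or deleted, only skipped when building the next request.
printf '%s\n' '{"type":"user","text":"add a login form"}'    >> "$session"
printf '%s\n' '{"type":"assistant","text":"done, see diff"}' >> "$session"

wc -l < "$session"    # one line per turn
```

Because the file only grows, compaction is purely a read-time decision: the writer never loses history, and any earlier state can be recovered (which is what makes forking possible).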

2. Session Management

Resume sessions with claude -c (latest session) or claude -r (interactive picker). Session-scoped permissions are NOT restored on resume — this is intentional, preventing approval grandfathering where a permission granted for one task carries over to a different context.

Forking sessions

Fork a session to branch from a specific point. The forked session gets a copy of the conversation up to that point but diverges independently from there.

The Ralph Loop

A pattern for sustained multi-task sessions without context degradation:

  1. Pick task from the backlog
  2. Implement the change
  3. Validate (tests, lint, manual check)
  4. Commit if passing
  5. Reset context (/clear)
  6. Repeat
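The loop above can be sketched in shell. `run_agent` and `validate` are placeholders for your own setup — in practice `run_agent` might invoke `claude -p "<task>"` (non-interactive prompt mode) and `validate` your test suite:

```shell
# Sketch of the Ralph Loop; run_agent and validate are placeholders.
backlog=$(mktemp)
printf '%s\n' 'fix flaky auth test' 'bump lodash' > "$backlog"

run_agent() { echo "agent: $1"; }   # placeholder for: claude -p "$1"
validate()  { true; }               # placeholder for: npm test && npm run lint

while task=$(head -n 1 "$backlog") && [ -n "$task" ]; do
  run_agent "$task"                   # each run starts from a fresh context
  validate && echo "commit: $task"    # placeholder for: git commit -am "$task"
  tail -n +2 "$backlog" > "$backlog.tmp" && mv "$backlog.tmp" "$backlog"  # pop
done
```

Each iteration starts the agent cold, so the backlog file and git history, not the context window, are the only state carried between tasks.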

Memory persists through git history and progress logs (like tasks/todo.md), not the context window. The commit message and file diffs carry the knowledge forward; the agent re-reads them when it picks up the next task.

3. Subagent Delegation

Subagents run in isolated context windows and return only summaries to the parent. This prevents "subagent conversation inflation" — where a parent agent's context fills up with the full transcript of every delegated task instead of just the result.

When to use subagents

  • Research and documentation lookup
  • Code exploration across unfamiliar areas
  • Parallel analysis of independent subsystems

Coordination tiers

Tier                  Agents   Communication                     Use case
Subagents             2-3      In-process, parent-child          Research, exploration
Agent teams           3-5      Local, peer messaging via files   Feature implementation
Cloud orchestrators   N        Async, assign-and-walk-away       Bulk migrations, audits

Rule of thumb: Three focused agents consistently outperform one generalist working three times as long. The constraint of a narrow context window forces each agent to stay on-task.

4. Git Worktree Isolation

One task, one branch, one worktree, one agent. Create a worktree and its branch with:

git worktree add -b fix-auth-bug ../fix-auth-bug

Each worktree gets its own filesystem checkout, so agents editing files in parallel never produce merge conflicts at the file level.
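A sketch of spinning up one worktree per task in a throwaway repository; the repo path and branch names are illustrative:

```shell
# Sketch: one isolated checkout per task branch (names are illustrative)
parent=$(mktemp -d)
git init -q -b main "$parent/repo" && cd "$parent/repo"
git -c user.email=agent@example.com -c user.name=agent \
    commit -q --allow-empty -m "init"

for task in fix-auth-bug add-rate-limit; do
  git worktree add -q -b "$task" "../$task"   # sibling dir, new branch
done

git worktree list   # main checkout plus one entry per task worktree
```

When a task branch merges, `git worktree remove ../fix-auth-bug` cleans up the checkout without touching the branch.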

Practical limits

5-7 concurrent agents is the practical ceiling. Beyond that, API rate limits and human review overhead cancel out the throughput gains.

Worktrees only isolate the filesystem

Ports, databases, and the Docker daemon remain shared across all worktrees. If two agents both try to bind port 3000 or migrate the same database, they collide. Use per-worktree port offsets (e.g., PORT=3000 + worktree_index) and separate test databases.
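One way to implement the port-offset suggestion is to derive a stable offset from the worktree path itself; the hashing scheme below is an assumption for illustration, not a built-in feature:

```shell
# Sketch: stable per-worktree port offset so parallel agents don't
# collide on port 3000 (hashing scheme is an assumption)
worktree_port() {
  # cksum gives a deterministic numeric hash of the path; keep offsets 0-99
  offset=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $((3000 + offset % 100))
}

PORT=$(worktree_port "$PWD")   # export before starting the dev server
echo "$PORT"
```

The same trick extends to test databases, e.g. naming them `app_test_$PORT` so each worktree migrates its own copy.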

Task independence test

Before parallelizing, verify all three conditions:

  1. File exclusivity — no two agents edit the same file
  2. Interface stability — no agent changes an API another agent depends on
  3. Bounded scope — each task has a clear definition of done

If any condition fails, serialize the tasks instead.
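The file-exclusivity condition can be checked mechanically by diffing each branch against their merge base; the repository and branch names below are illustrative:

```shell
# Sketch: detect file-level overlap between two task branches
# (repo layout and branch names are illustrative)
work=$(mktemp -d) && cd "$work" && git init -q -b main
git config user.email agent@example.com && git config user.name agent
echo base > app.py && git add . && git commit -qm init

git checkout -qb fix-auth-bug  && echo auth  > auth.py  && git add . && git commit -qm auth
git checkout -q main
git checkout -qb add-rate-limit && echo limit > limit.py && git add . && git commit -qm limit

# Compare the file sets each branch touches relative to the merge base
base=$(git merge-base fix-auth-bug add-rate-limit)
git diff --name-only "$base" fix-auth-bug   | sort > files_a
git diff --name-only "$base" add-rate-limit | sort > files_b
overlap=$(comm -12 files_a files_b)
[ -z "$overlap" ] && echo "safe to parallelize" || echo "serialize: $overlap"
```

This catches condition 1 automatically; conditions 2 and 3 (interface stability, bounded scope) still need human judgment.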

5. Practical Patterns

Where parallel agents shine

  • Research and proof-of-concept spikes
  • Understanding existing systems (reading, not writing)
  • Low-stakes maintenance (dependency bumps, lint fixes)
  • Specified, directed work with clear acceptance criteria

Detailed specs reduce review effort dramatically. An agent working from a two-paragraph description of what to build and how to test it produces code that takes minutes to review. An agent working from "make the auth better" produces code that takes longer to review than to rewrite.

Documentation flywheel

Use agents to document how your codebase works — architecture decisions, data flow, module boundaries. Then use those docs as context for future prompts. Each pass improves the documentation, which improves the next agent's output, which produces better documentation.

Merge strategies

Strategy            Speed    Safety   When to use
Lead-agent merge    Fast     Lower    Internal tools, prototypes
Human review gate   Slower   Higher   Production code, shared APIs

The real bottleneck: Verification is the bottleneck, not code generation. An agent can produce a 500-line feature in minutes, but proving it works correctly under edge cases still requires human judgment and well-designed test suites.
Main agent (200K window) → /clear (reset) → fresh context (new task) → subagent (separate window) → summary back (only results)