practical guides for using ai coding agents securely and effectively. setup with bondage, optional envchain-xtra, nono sandboxing, and workflow patterns that hold up in real use.
be specific. define format, length, tone, audience, constraints.
give context. background, examples, and goals lift output quality.
iterate. refine, adjust, build on previous responses.
think in systems. chain prompts. break complex tasks into steps.
architecture
the hard part is not only installing tools. it is deciding where trust should live, how screenshots should work, and what a real escape hatch looks like.
- agent-stack: bondage 0.2.7, envchain-xtra 1.3.1, nono 0.61.1, version guards, and non-doxing checks. [arch]
- analytics-privacy: block non-essential analytics, telemetry, experiments, and error reporting. [arch]
- vendor-independence: keep instructions, plugins, and security layers portable across clients. [arch]
- claude-to-codex-plugins: turn a Claude-first plugin into a Codex first-class target without forking the workflow. [arch]
- visual-inspection: accessibility tree first, localhost screenshot service when pixels matter. [arch]
- sandbox-profiles: shape tiers and keep active profile policy read-only from normal agents. [arch]
agents
run ai coding agents with the preferred stack: bondage for launch policy, optional envchain-xtra for secret release, nono for kernel sandboxing.
- claude-code: anthropic's cli launched through bondage with kernel sandbox. [cloud]
- opencode: multi-provider agent with bondage + envchain secret injection. [cloud]
- codex: OpenAI CLI with bondage, nono 0.61 profile packs, and draft profile repair. [cloud]
- pi: minimal ts agent with pinned node, package-tree verification, and local ds4 side profile. [cloud]
local inference
set up local model runtimes and machines that can back coding agents without turning benchmark data into setup instructions.
- llama.cpp: local llm inference with metal gpu and model-path isolation. [local]
- ollama: local model manager with daemon sandboxing and optional remote keys. [local]
- qwen3-tts mlx: local Apple Silicon voice generation with MLX-Audio, reference clips, Markdown chunking, inspection, and repair. [local]
- ds4 mbp m5-128: run deepseek v4 flash locally with ds4-agent experiments, claude, codex, and pi profiles. [local]
- ds4-agent setup: build native ds4-agent, pin a DeepSeek V4 Flash model path, run disposable repo tests, and keep experiments separate from default agents. [local]
- ds4 dgx spark: run DeepSeek V4 Flash on NVIDIA Spark with ds4 CUDA, q2-imatrix, MTP, localhost serving, agent profile tests, and benchmark caveats. [local]
- dgx-spark: set up a local nvidia box for ollama, qwen3-coder, and side profiles. [local]
evaluation
compare local model progress and cloud side profiles with dated runs, repeatable parameters, and hardware sizing tools.
- benchmarks: dated local ai runs plus Spark ds4, ds4-agent, Codex frontier, and Gemini comparison lanes with hardware, runtime, context, pass rate, and scripts. [eval]
- ds4-agent vs codex: public benchmark reality check for native ds4-agent, DeepSeek V4 Flash, and Codex frontier. [eval]
- llm hardware calculator: model memory, kv cache, hardware fit, and single-user decode estimates. [calc]
workflow
get more from coding agents with better prompts, context management, and automation.
- claude-md: instruction budget, progressive disclosure, agents.md cross-tool standard. [flow]
- mcp-servers: connect agents to external tools via model context protocol. [flow]
- prompting: explore-plan-implement-commit, stepwise prompting, verification. [flow]
- hooks: 26 lifecycle events, 5 handler types, skills, ci/cd. [flow]
- context: compaction, session management, subagents, worktree isolation. [flow]
- agentnoise: White Noise phone control for local agents, media/wiki ingest, fake-phone tests, and stable vs Dark Matter alpha installs. [remote]
- llm-wiki: append-only knowledge bases compiled by llm agents (claude code / codex / opencode). [flow]
featured
llm hardware calculator
pick any text-generation model on huggingface and see what hardware can run it, how much memory it needs, and a single-user decode tok/s estimate. moe-aware: active params drive speed, full params drive memory. covers apple silicon, dgx spark (1×–8×), rtx, amd strix halo.
- fetches model specs directly from huggingface (no backend)
- memory breakdown: weights + kv cache + overhead
- multi-spark factors capture the 3× pp=3 slowdown
- shareable urls (e.g. /calculator/?model=Qwen/Qwen3-30B-A3B&ctx=131072)
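a rough sketch of the arithmetic these bullets describe, assuming the usual back-of-envelope rules (memory = weights + kv cache + overhead; decode speed is bound by memory bandwidth over the active weights). this is not the calculator's actual code. every function name, the byte widths, the 10% overhead fraction, and the example model shape are illustrative assumptions.

```python
from urllib.parse import urlsplit, parse_qs

GIB = 1024 ** 3


def weights_bytes(total_params: float, bytes_per_param: float) -> float:
    """Full parameter count drives memory, even for MoE models."""
    return total_params * bytes_per_param


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   ctx: int, bytes_per_elem: int = 2) -> int:
    """One K and one V tensor per layer, each ctx x kv_heads x head_dim."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem


def total_memory_bytes(weights: float, kv: float,
                       overhead_fraction: float = 0.10) -> float:
    """Assumed flat overhead fraction for runtime buffers (hypothetical)."""
    return (weights + kv) * (1.0 + overhead_fraction)


def decode_tokens_per_second(active_params: float, bytes_per_param: float,
                             bandwidth_bytes_per_s: float) -> float:
    """Active parameters drive speed: each token reads the active weights once."""
    return bandwidth_bytes_per_s / (active_params * bytes_per_param)


def parse_share_url(url: str) -> tuple[str, int]:
    """Pull the model id and context length out of a shareable url."""
    query = parse_qs(urlsplit(url).query)
    return query["model"][0], int(query["ctx"][0])


if __name__ == "__main__":
    model, ctx = parse_share_url(
        "/calculator/?model=Qwen/Qwen3-30B-A3B&ctx=131072")

    # Hypothetical MoE shape, NOT read from the real model config:
    # 30e9 total params, 3e9 active, 48 layers, 4 kv heads, head_dim 128,
    # 4-bit weights (0.5 bytes/param), fp16 kv cache, 300 GB/s bandwidth.
    w = weights_bytes(30e9, 0.5)
    kv = kv_cache_bytes(layers=48, kv_heads=4, head_dim=128, ctx=ctx)
    total = total_memory_bytes(w, kv)
    tps = decode_tokens_per_second(3e9, 0.5, 300e9)

    print(f"{model} @ ctx={ctx}")
    print(f"weights  {w / GIB:.1f} GiB")
    print(f"kv cache {kv / GIB:.1f} GiB")
    print(f"total    {total / GIB:.1f} GiB")
    print(f"decode   ~{tps:.0f} tok/s upper bound")
```

the decode figure is an upper bound: it ignores kv cache reads, compute limits, and any multi-node slowdown factor, so measured numbers should land below it.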
local ai benchmark registry
follow measured local model progress across m5 max, dgx spark, ds4, ds4-agent, ollama, mlx, llama.cpp, sglang, and vllm, with Codex frontier and Gemini tracked separately as cloud comparison lanes. the current spark default and the Spark ds4 side profile are tracked with smoke, code, question, and wiki suites plus public reproduction scripts.
- daily-agent and practical-suite pass rates, not just token speed
- model/runtime settings preserved with every row
- cloud profiles labeled separately from local hardware runs
- clear defaults and rejected profiles
- downloadable scripts with privacy redaction defaults
llm-wiki
turn any ai agent into a research engine. builds append-only markdown knowledge bases through parallel multi-agent investigation, source ingestion, and cross-referenced article compilation, with zero runtime dependencies.
- parallel multi-agent research from academic, technical, and contrarian angles
- thesis-driven investigation with for/against evidence scoring
- reports, study guides, slide decks, implementation plans
- claude code, openai codex, any llm agent