ds4-agent setup

A cautious native-agent setup path for local DeepSeek V4 Flash experiments, kept separate from your default coding-agent profiles.

Last updated May 21, 2026

Local Inference Alpha Agent

1. Where it fits

ds4-agent is the native agent shipped with antirez/ds4. Unlike ds4-server, where a Claude, Codex, or Pi client talks to an OpenAI- or Anthropic-compatible HTTP endpoint, ds4-agent runs the agent, tool loop, model runtime, and session state in a single local process.

Treat it as a benchmark target first. Keep your existing claude-ds4, codex-ds4, pi-ds4, and Spark profiles intact until native ds4-agent beats them on real repo-editing tests.

Use it for	Do not use it for (yet)
Disposable local coding-agent experiments	Your only production coding-agent path
DeepSeek V4 Flash native-agent testing	Unreviewed edits in important repos
Pass-per-minute benchmark runs	Headless CI automation until pipe mode exists
Private local loops with trace files kept local	Public trace sharing

2. Build ds4-agent

Build from source in a normal user directory. This keeps the install easy to delete and avoids changing global agent behavior.

mkdir -p ~/src-repo
git clone https://github.com/antirez/ds4.git ~/src-repo/ds4
cd ~/src-repo/ds4
make ds4-agent ds4-server
./ds4-agent --help

If you already cloned ds4, update intentionally and rebuild:

cd ~/src-repo/ds4
git pull --ff-only
make clean
make ds4-agent ds4-server
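A quick post-build check can confirm that both binaries exist and record which upstream commit you built, which matters once you start comparing benchmark runs across pulls. This is a sketch: check_ds4_build is a hypothetical helper, not part of ds4, and it assumes the default checkout path used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ds4): confirm the build produced both
# binaries and print the commit they were built from.
check_ds4_build() {
  local repo="${1:-$HOME/src-repo/ds4}"
  local bin
  for bin in ds4-agent ds4-server; do
    if [ ! -x "$repo/$bin" ]; then
      echo "missing or not executable: $repo/$bin" >&2
      return 1
    fi
  done
  # Short commit hash, worth recording next to any benchmark result.
  git -C "$repo" rev-parse --short HEAD
}
```

Run check_ds4_build after make and keep the printed hash with your benchmark notes.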

3. Model path

Use the same DeepSeek V4 Flash q2-imatrix model you use for ds4-server. On 96GB and 128GB Apple Silicon machines, q2-imatrix is the practical starting point. Do not start with q4 on a 128GB Mac unless you are deliberately testing memory failure behavior.

cd ~/src-repo/ds4
./download_model.sh q2-imatrix

# Pick the downloaded GGUF without hard-coding private paths.
export DS4_MODEL="$(find "$PWD" -name '*.gguf' | head -1)"
test -n "$DS4_MODEL" && ls -lh "$DS4_MODEL"

Be explicit when benchmarking: once you know the exact GGUF path, pin DS4_MODEL in your shell or wrapper. Do not let a benchmark accidentally switch models because find returned a different file.
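One way to avoid silent model switches is to refuse to guess. The hypothetical pick_one_gguf helper below succeeds only when exactly one .gguf file exists under the given directory, and lists the candidates otherwise. It is a sketch that assumes a single-file model and file names without embedded newlines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the only .gguf under a directory, or fail loudly.
pick_one_gguf() {
  local dir="$1" matches count
  matches="$(find "$dir" -type f -name '*.gguf')"
  # grep -c . counts non-empty lines; it prints 0 (and exits 1) on empty input.
  count="$(printf '%s' "$matches" | grep -c . || true)"
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one .gguf under $dir, found $count" >&2
    [ -n "$matches" ] && printf '%s\n' "$matches" >&2
    return 1
  fi
  printf '%s\n' "$matches"
}
```

Use it as DS4_MODEL="$(pick_one_gguf "$PWD")" && export DS4_MODEL. Writing export DS4_MODEL="$(...)" in one step would hide the helper's failure, because export's own exit status wins.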

4. Smoke tests

First verify the binary without loading a model, then run a tiny prompt. Keep context moderate for smoke tests; long context belongs in the benchmark step.

cd ~/src-repo/ds4
./ds4-agent --help

./ds4-agent \
  -m "$DS4_MODEL" \
  --ctx 32768 \
  --nothink \
  -p "reply with exactly OK"

A passing smoke test only proves that the binary and model load. It does not prove the agent can edit a repo reliably.
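In the same spirit as checking the binary before loading a model, you can check the model file itself before paying for a full load. GGUF files begin with the 4-byte magic "GGUF", so a mismatch usually means a wrong file or a saved error page. The is_gguf name is hypothetical, and the check does not detect a truncated download.

```bash
#!/usr/bin/env bash
# Cheap pre-load check: GGUF files start with the 4-byte magic "GGUF".
# Catches a wrong file or an HTML error page, not a truncated download.
is_gguf() {
  local path="$1"
  [ -f "$path" ] || return 1
  [ "$(head -c 4 "$path")" = "GGUF" ]
}
```

For example: is_gguf "$DS4_MODEL" || echo "DS4_MODEL does not look like a GGUF file" >&2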

5. Local wrapper

Create a named command instead of replacing claude, codex, or pi. This keeps native ds4-agent experiments opt-in.

mkdir -p ~/.local/bin
cat > ~/.local/bin/ds4-agent-local <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

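repo="${DS4_REPO:-$HOME/src-repo/ds4}"
# Do not cd into $repo: stay in the caller's directory so the agent works on
# the repo you launched it from (assuming ds4-agent uses its current working
# directory as the workspace), not on the ds4 checkout itself.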

model="${DS4_MODEL:-}"
if [ -z "$model" ]; then
  model="$(find "$repo" -name '*.gguf' | head -1)"
fi

if [ -z "$model" ] || [ ! -f "$model" ]; then
  echo "DS4_MODEL is not set and no GGUF model was found under $repo" >&2
  exit 1
fi

exec "$repo/ds4-agent" \
  -m "$model" \
  --ctx "${DS4_CTX:-32768}" \
  --nothink \
  "$@"
EOF
chmod +x ~/.local/bin/ds4-agent-local

Make sure ~/.local/bin is on your PATH, then test:

ds4-agent-local -p "reply with exactly OK"
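If the shell cannot find ds4-agent-local, check whether ~/.local/bin is really a PATH entry. A minimal sketch with a hypothetical path_has_dir helper; wrapping PATH in colons lets the first, middle, and last entries match the same way.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a directory is an exact PATH entry.
# ":$PATH:" makes every entry colon-delimited, so partial names do not match.
path_has_dir() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: path_has_dir "$HOME/.local/bin" || echo 'add export PATH="$HOME/.local/bin:$PATH" to your shell profile'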

6. Disposable repo test

Run first edits in a throwaway repo. Do not point a new native-agent profile at an important working tree until it can pass this level of test without stalls or stale edits.

rm -rf /tmp/ds4-agent-fixture
mkdir -p /tmp/ds4-agent-fixture
cd /tmp/ds4-agent-fixture
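git init -q
cat > README.md <<'EOF'
# Fixture

Write a short setup note in docs/setup.md.
EOF
# Commit a baseline so git status shows only the agent's changes.
git add README.md
git -c user.name=fixture -c user.email=fixture@example.invalid commit -qm "fixture baseline"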

ds4-agent-local \
  --trace /tmp/ds4-agent-fixture.trace \
  -p "Create docs/setup.md with three concise setup steps, then stop."

Inspect the result manually:

git status --short
find . -maxdepth 3 -type f -not -path './.git/*' -print
sed -n '1,120p' docs/setup.md
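Once the manual inspection looks right, the same checks can be repeated as a pass/fail script for later runs. This is a heuristic sketch: check_fixture is a hypothetical name, and "three steps" is approximated by counting numbered or bulleted list lines. It does not catch edits to README.md, so keep git status --short in the loop.

```bash
#!/usr/bin/env bash
# Hypothetical fixture checker: PASS only if docs/setup.md exists, no stray
# files were created, and the note has exactly three list items.
check_fixture() {
  local dir="${1:-/tmp/ds4-agent-fixture}"
  local setup="$dir/docs/setup.md"
  local files steps

  if [ ! -s "$setup" ]; then
    echo "FAIL: $setup is missing or empty" >&2
    return 1
  fi

  # C-locale sort keeps the comparison independent of the user's locale.
  files="$(cd "$dir" && find . -type f -not -path './.git/*' | LC_ALL=C sort)"
  if [ "$files" != "$(printf '%s\n' ./README.md ./docs/setup.md)" ]; then
    echo "FAIL: unexpected files:" >&2
    printf '%s\n' "$files" >&2
    return 1
  fi

  # Count lines like "1. step", "2) step", "- step", or "* step".
  steps="$(grep -cE '^[[:space:]]*([0-9]+[.)]|[-*])[[:space:]]' "$setup" || true)"
  if [ "$steps" -ne 3 ]; then
    echo "FAIL: expected 3 list items, found $steps" >&2
    return 1
  fi
  echo "PASS"
}
```

The trace file is written next to the fixture directory rather than inside it, so it does not show up as a stray file.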

7. Benchmark path

Benchmark native ds4-agent against ds4-server, Codex frontier, and Spark profiles with the same fixture tasks. Score pass/fail first, then wall time. A useful result row includes model, quantization, context, thinking mode, prompt kind, pass/fail, median wall time, total wall time, and whether the final repo state is correct.
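To keep rows comparable across ds4-agent, ds4-server, Codex frontier, and Spark runs, a small helper can emit one tab-separated row in the field order listed above. A sketch, assuming you time each run yourself and pass whole-second wall times; bench_header, bench_row, and median are hypothetical names, not ds4 tooling.

```bash
#!/usr/bin/env bash
# Median of numbers on stdin (one per line); averages the middle pair if even.
median() {
  sort -n | awk '{ a[NR] = $1 } END {
    if (NR == 0) exit 1
    if (NR % 2) print a[(NR + 1) / 2]
    else print (a[NR / 2] + a[NR / 2 + 1]) / 2
  }'
}

bench_header() {
  printf 'model\tquant\tctx\tthinking\tprompt_kind\tresult\tmedian_s\ttotal_s\trepo_ok\n'
}

# Args: model quant ctx thinking prompt_kind result repo_ok time_s...
bench_row() {
  local model="$1" quant="$2" ctx="$3" thinking="$4" kind="$5" result="$6" repo_ok="$7"
  shift 7
  local times=("$@") total=0 t med
  for t in "${times[@]}"; do total=$(( total + t )); done
  med="$(printf '%s\n' "${times[@]}" | median)"
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    "$model" "$quant" "$ctx" "$thinking" "$kind" "$result" "$med" "$total" "$repo_ok"
}
```

For example, bench_row deepseek-v4-flash q2-imatrix 32768 nothink edit pass yes 40 52 46 >> results.tsv appends a row with a median of 46 and a total of 138 seconds.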

Current limitation: native ds4-agent is TTY-first, so a clean headless benchmark may need an expect/script wrapper until upstream has a pipe mode. See ds4-agent vs Codex frontier and the benchmark registry before publishing claims.

8. Privacy and gotchas

Trace files can contain prompts, file contents, tool outputs, and generated text. Keep them out of public repos.
Use placeholders in public docs. Do not publish local usernames, hostnames, LAN IPs, known-host fingerprints, or absolute private paths.
Keep native ds4-agent as a named side command until it passes real tool-use tests.
For long-context work, increase context gradually. If you see KV-cache or decode failures, lower the context setting (--ctx, or DS4_CTX with the wrapper) before changing models.
Do not treat raw token speed as agent productivity. Promote ds4-agent only after correct edits, valid tool behavior, and no stalls.
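Before publishing notes or benchmark write-ups, a first-pass filter can catch the most common leaks. This is a sketch with a hypothetical redact_private helper: it replaces your home directory with ~ and RFC 1918 private IPv4 addresses with <lan-ip>. It does not handle hostnames, other users' paths, or SSH fingerprints, so a manual review is still required, and trace files should stay private regardless.

```bash
#!/usr/bin/env bash
# Hypothetical redaction filter (stdin to stdout).
# $HOME is used as a regex, so this assumes it contains no regex metacharacters.
# Private ranges covered: 10.x.x.x, 172.16-31.x.x, 192.168.x.x.
redact_private() {
  local home="${HOME:?HOME must be set}"
  sed -E \
    -e "s|$home|~|g" \
    -e 's/(^|[^0-9.])(10\.[0-9]{1,3}|192\.168|172\.(1[6-9]|2[0-9]|3[01]))\.[0-9]{1,3}\.[0-9]{1,3}/\1<lan-ip>/g'
}
```

For example: redact_private < notes.md > notes.public.md, then read the output before sharing it.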