ds4-agent setup
A cautious native-agent setup path for local DeepSeek V4 Flash experiments, kept separate from your default coding-agent profiles.
1. Where it fits
ds4-agent is the native agent shipped by antirez/ds4. It is different from ds4-server: the agent, tool loop, model runtime, and session state live in one local process instead of a Claude/Codex/Pi client talking to an OpenAI- or Anthropic-compatible HTTP endpoint.
Start with it as a benchmark target. Keep your existing claude-ds4, codex-ds4, pi-ds4, and Spark profiles intact until native ds4-agent wins real repo-editing tests.
| Use it for | Do not use it for yet |
|---|---|
| Disposable local coding-agent experiments | Your only production coding-agent path |
| DeepSeek V4 Flash native-agent testing | Unreviewed edits in important repos |
| Pass-per-minute benchmark runs | Headless CI automation until pipe mode exists |
| Private local loops with trace files kept local | Public trace sharing |
2. Build ds4-agent
Build from source in a normal user directory. This keeps the install easy to delete and avoids changing global agent behavior.
mkdir -p ~/src-repo
git clone https://github.com/antirez/ds4.git ~/src-repo/ds4
cd ~/src-repo/ds4
make ds4-agent ds4-server
./ds4-agent --help
If you already cloned ds4, update intentionally and rebuild:
cd ~/src-repo/ds4
git pull --ff-only
make clean
make ds4-agent ds4-server
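After a build or rebuild, it can help to confirm that both binaries actually exist before moving on. The sketch below assumes `make` leaves `ds4-agent` and `ds4-server` in the repo root, as the commands above imply; `check_built` is a hypothetical helper name, not part of ds4.

```shell
# check_built DIR: report whether both build outputs exist and are executable.
# Hypothetical helper; assumes the binaries land directly in DIR.
check_built() {
    dir="$1"
    rc=0
    for bin in ds4-agent ds4-server; do
        if [ -x "$dir/$bin" ]; then
            echo "ok: $dir/$bin"
        else
            echo "missing or not executable: $dir/$bin" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Run it as `check_built ~/src-repo/ds4` right after `make`; a non-zero exit status means at least one binary is missing.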
3. Model path
Use the same DeepSeek V4 Flash q2-imatrix model you use for ds4-server. On 96GB and 128GB Apple Silicon machines, q2-imatrix is the practical starting point. Do not start with q4 on a 128GB Mac unless you are deliberately testing memory failure behavior.
cd ~/src-repo/ds4
./download_model.sh q2-imatrix
# Pick the downloaded GGUF without hard-coding private paths.
# sort makes the choice repeatable if more than one GGUF is present.
export DS4_MODEL="$(find "$PWD" -type f -name '*.gguf' | sort | head -n 1)"
test -n "$DS4_MODEL" && ls -lh "$DS4_MODEL"
Pin DS4_MODEL in your shell or wrapper. Do not let a benchmark accidentally switch models because find returned a different file.
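One way to guard against a silently switched model is to refuse to choose when the search is ambiguous. The following is a minimal sketch; `pick_gguf` is a hypothetical helper name, and it assumes model paths contain no newlines.

```shell
# pick_gguf DIR: print the single *.gguf file under DIR.
# Fails when there are zero or several candidates, so the caller
# has to pin DS4_MODEL explicitly instead of guessing.
pick_gguf() {
    model_dir="$1"
    matches="$(find "$model_dir" -type f -name '*.gguf' | sort)"
    if [ -z "$matches" ]; then
        echo "no GGUF found under $model_dir" >&2
        return 1
    fi
    # wc pads its output on some systems, so strip the spaces.
    count="$(printf '%s\n' "$matches" | wc -l | tr -d ' ')"
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one GGUF under $model_dir, found $count:" >&2
        printf '%s\n' "$matches" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}
```

A possible use is `DS4_MODEL="$(pick_gguf ~/src-repo/ds4)" && export DS4_MODEL`, which leaves the variable unset when the choice is not unique.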
4. Smoke tests
First verify the binary without loading a model, then run a tiny prompt. Keep context moderate for smoke tests; long context belongs in the benchmark step.
cd ~/src-repo/ds4
./ds4-agent --help
./ds4-agent \
  -m "$DS4_MODEL" \
  --ctx 32768 \
  --nothink \
  -p "reply with exactly OK"
A passing smoke test only proves that the binary and model load. It does not prove the agent can edit a repo reliably.
5. Local wrapper
Create a named command instead of replacing claude, codex, or pi. This keeps native ds4-agent experiments opt-in.
mkdir -p ~/.local/bin
cat > ~/.local/bin/ds4-agent-local <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
repo="${DS4_REPO:-$HOME/src-repo/ds4}"
cd "$repo"
model="${DS4_MODEL:-}"
if [ -z "$model" ]; then
  # Sort so the same file is chosen on every run when several GGUFs exist.
  model="$(find "$repo" -type f -name '*.gguf' | sort | head -n 1)"
fi
if [ -z "$model" ] || [ ! -f "$model" ]; then
  echo "DS4_MODEL is not set and no GGUF model was found under $repo" >&2
  exit 1
fi
exec "$repo/ds4-agent" \
  -m "$model" \
  --ctx "${DS4_CTX:-32768}" \
  --nothink \
  "$@"
EOF
chmod +x ~/.local/bin/ds4-agent-local
Make sure ~/.local/bin is on your PATH, then test:
ds4-agent-local -p "reply with exactly OK"
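If the command is not found, the usual cause is that `~/.local/bin` is missing from `PATH`. A small check along these lines can confirm it; `on_path` is a hypothetical helper name.

```shell
# on_path DIR: succeed if DIR is one of the colon-separated entries in $PATH.
# Wrapping both sides in colons avoids matching a mere prefix of another entry.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `on_path "$HOME/.local/bin" || export PATH="$HOME/.local/bin:$PATH"` adds the directory for the current shell only; making it permanent belongs in your shell profile.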
6. Disposable repo test
Run first edits in a throwaway repo. Do not point a new native-agent profile at an important working tree until it can pass this level of test without stalls or stale edits.
rm -rf /tmp/ds4-agent-fixture
mkdir -p /tmp/ds4-agent-fixture
cd /tmp/ds4-agent-fixture
git init
cat > README.md <<'EOF'
# Fixture
Write a short setup note in docs/setup.md.
EOF
ds4-agent-local \
  --trace /tmp/ds4-agent-fixture.trace \
  -p "Create docs/setup.md with three concise setup steps, then stop."
Inspect the result manually:
git status --short
find . -maxdepth 3 -type f -not -path './.git/*' -print
sed -n '1,120p' docs/setup.md
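The manual inspection above can be turned into a repeatable pass/fail check once the fixture is stable. This is a sketch under the assumptions of this fixture only: the expected final state is exactly `README.md` plus a non-empty `docs/setup.md`, and `check_fixture` is a hypothetical helper name. It does not judge the quality of the generated text.

```shell
# check_fixture REPO: pass only if docs/setup.md exists, is non-empty,
# and git reports no other new or modified files in REPO.
check_fixture() {
    repo="$1"
    if [ ! -s "$repo/docs/setup.md" ]; then
        echo "FAIL: docs/setup.md is missing or empty" >&2
        return 1
    fi
    # --porcelain prints "XY path"; cut drops the two status letters and the space.
    extra="$(git -C "$repo" status --porcelain --untracked-files=all |
        cut -c4- | grep -v -x -F -e 'README.md' -e 'docs/setup.md' || true)"
    if [ -n "$extra" ]; then
        echo "FAIL: unexpected changes:" >&2
        printf '%s\n' "$extra" >&2
        return 1
    fi
    echo "PASS"
}
```

Calling `check_fixture /tmp/ds4-agent-fixture` after a run gives a result you can record as pass or fail; still read the file yourself before trusting a pass.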
7. Benchmark path
Benchmark native ds4-agent against ds4-server, Codex frontier, and Spark profiles with the same fixture tasks. Score pass/fail first, then wall time. A useful result row includes model, quantization, context, thinking mode, prompt kind, pass/fail, median wall time, total wall time, and whether the final repo state is correct.
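As an illustration of how such result rows might be collected, the sketch below appends one tab-separated line per run and then derives pass count, median wall time, and total wall time per profile. The file layout, the leading `profile` column, and the names `record_run` and `summarize` are assumptions made for this example, not an existing ds4 format.

```shell
# Columns (tab-separated), one line per run:
# profile model quant ctx thinking prompt_kind result wall_s state_ok
record_run() {
    out="$1"; shift
    # The remaining nine arguments are the column values, in the order above.
    [ "$#" -eq 9 ] || { echo "record_run: expected 9 fields, got $#" >&2; return 1; }
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$@" >> "$out"
}

# summarize FILE PROFILE: print "profile passes runs median_wall_s total_wall_s".
summarize() {
    file="$1"; profile="$2"
    # Emit "wall_s<TAB>result" for the chosen profile, sorted by wall time.
    awk -F '\t' -v p="$profile" '$1 == p { print $8 "\t" $7 }' "$file" | sort -n |
    awk -F '\t' -v p="$profile" '
        { w[NR] = $1; total += $1; if ($2 == "pass") passes++ }
        END {
            if (NR == 0) exit 1
            med = (NR % 2) ? w[(NR + 1) / 2] : (w[NR / 2] + w[NR / 2 + 1]) / 2
            print p, passes + 0, NR, med, total
        }'
}
```

Passes are listed before the timing columns on purpose: a fast profile with fewer passes should not look like a win at a glance.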
Current limitation: native ds4-agent is TTY-first, so a clean headless benchmark may need an expect/script wrapper until upstream has a pipe mode. See ds4-agent vs Codex frontier and the benchmark registry before publishing claims.
8. Privacy and gotchas
- Trace files can contain prompts, file contents, tool outputs, and generated text. Keep them out of public repos.
- Use placeholders in public docs. Do not publish local usernames, hostnames, LAN IPs, known-host fingerprints, or absolute private paths.
- Keep native ds4-agent as a named side command until it passes real tool-use tests.
- For long-context work, increase context gradually. If you see KV-cache or decode failures, lower client context before changing models.
- Do not compare raw token speed to agent productivity. Promote only after correct edits, valid tool behavior, and no stalls.
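For the placeholder rule above, a rough pre-publish scan can catch the most common leaks. This sketch only looks for home-directory paths and private IPv4 ranges; the patterns are illustrative assumptions, `scan_private` is a hypothetical helper name, and a clean result does not prove a file is safe to share.

```shell
# scan_private FILE...: print lines that look like private details.
# Exit status 0 means something was found, 1 means no match.
scan_private() {
    grep -n -E \
        -e '/(Users|home)/[A-Za-z0-9._-]+' \
        -e '(^|[^0-9])(10\.[0-9]{1,3}|192\.168|172\.(1[6-9]|2[0-9]|3[01]))\.[0-9]{1,3}\.[0-9]{1,3}([^0-9]|$)' \
        -- "$@"
}
```

A typical use is `if scan_private docs/*.md; then echo "replace these with placeholders" >&2; fi` before committing public notes. Hostnames, usernames outside paths, and known-host fingerprints still need a manual read.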