ds4-agent setup

A cautious native-agent setup path for local DeepSeek V4 Flash experiments, kept separate from your default coding-agent profiles.

1. Where it fits

ds4-agent is the native agent shipped by antirez/ds4. It is different from ds4-server: the agent, tool loop, model runtime, and session state live in one local process instead of a Claude/Codex/Pi client talking to an OpenAI- or Anthropic-compatible HTTP endpoint.

Start with it as a benchmark target. Keep your existing claude-ds4, codex-ds4, pi-ds4, and Spark profiles intact until native ds4-agent wins real repo-editing tests.

Use it for                                      | Do not use it for yet
Disposable local coding-agent experiments       | Your only production coding-agent path
DeepSeek V4 Flash native-agent testing          | Unreviewed edits in important repos
Pass-per-minute benchmark runs                  | Headless CI automation until pipe mode exists
Private local loops with trace files kept local | Public trace sharing

2. Build ds4-agent

Build from source in a normal user directory. This keeps the install easy to delete and avoids changing global agent behavior.

mkdir -p ~/src-repo
git clone https://github.com/antirez/ds4.git ~/src-repo/ds4
cd ~/src-repo/ds4
make ds4-agent ds4-server
./ds4-agent --help

If you already cloned ds4, update intentionally and rebuild:

cd ~/src-repo/ds4
git pull --ff-only
make clean
make ds4-agent ds4-server

3. Model path

Use the same DeepSeek V4 Flash q2-imatrix model you use for ds4-server. On 96GB and 128GB Apple Silicon machines, q2-imatrix is the practical starting point. Do not start with q4 on a 128GB Mac unless you are deliberately testing memory failure behavior.

cd ~/src-repo/ds4
./download_model.sh q2-imatrix

# Pick the downloaded GGUF without hard-coding private paths.
# Sorting keeps the choice stable when more than one GGUF is present.
export DS4_MODEL="$(find "$PWD" -type f -name '*.gguf' | sort | head -n 1)"
test -n "$DS4_MODEL" && ls -lh "$DS4_MODEL"

Be explicit when benchmarking: once you know the exact GGUF path, pin DS4_MODEL in your shell or wrapper. Do not let a benchmark accidentally switch models because find returned a different file.
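
One way to guard against the accidental model switch described above is to refuse to continue unless exactly one GGUF is found. This is a minimal sketch, not part of ds4: pick_single_gguf is a hypothetical helper name, and it assumes model filenames contain no newlines.

```shell
#!/usr/bin/env bash
# Sketch: resolve exactly one GGUF under a directory, or fail loudly.
# pick_single_gguf is a hypothetical helper, not something ds4 ships.
pick_single_gguf() {
  local dir="$1" matches count
  # Sort so the listing is stable across runs.
  matches="$(find "$dir" -type f -name '*.gguf' | sort)"
  if [ -z "$matches" ]; then
    echo "no GGUF found under $dir" >&2
    return 1
  fi
  # wc -l pads its output on some systems, so strip spaces.
  count="$(printf '%s\n' "$matches" | wc -l | tr -d ' ')"
  if [ "$count" -ne 1 ]; then
    echo "expected 1 GGUF under $dir, found $count:" >&2
    printf '%s\n' "$matches" >&2
    return 1
  fi
  printf '%s\n' "$matches"
}

# Usage (commented out so sourcing this file has no side effects):
#   DS4_MODEL="$(pick_single_gguf "$HOME/src-repo/ds4")" && export DS4_MODEL
```

If the helper reports more than one file, set DS4_MODEL by hand to the file you intend to benchmark.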

4. Smoke tests

First verify the binary without loading a model, then run a tiny prompt. Keep context moderate for smoke tests; long context belongs in the benchmark step.

cd ~/src-repo/ds4
./ds4-agent --help

./ds4-agent \
  -m "$DS4_MODEL" \
  --ctx 32768 \
  --nothink \
  -p "reply with exactly OK"

A passing smoke test only proves that the binary and model load. It does not prove the agent can edit a repo reliably.

5. Local wrapper

Create a named command instead of replacing claude, codex, or pi. This keeps native ds4-agent experiments opt-in.

mkdir -p ~/.local/bin
cat > ~/.local/bin/ds4-agent-local <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

repo="${DS4_REPO:-$HOME/src-repo/ds4}"
cd "$repo"

model="${DS4_MODEL:-}"
if [ -z "$model" ]; then
  model="$(find "$repo" -type f -name '*.gguf' | sort | head -n 1)"
fi

if [ -z "$model" ] || [ ! -f "$model" ]; then
  echo "DS4_MODEL is not set and no GGUF model was found under $repo" >&2
  exit 1
fi

exec "$repo/ds4-agent" \
  -m "$model" \
  --ctx "${DS4_CTX:-32768}" \
  --nothink \
  "$@"
EOF
chmod +x ~/.local/bin/ds4-agent-local

Make sure ~/.local/bin is on your PATH, then test:

ds4-agent-local -p "reply with exactly OK"
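
If the command is not found, the PATH check mentioned above can be done with a small function. This is a sketch only; path_has_dir is a hypothetical helper name, and it compares entries literally, so it does not resolve symlinks or trailing slashes.

```shell
#!/usr/bin/env bash
# Sketch: report whether a directory appears as an exact entry in PATH.
# path_has_dir is a hypothetical helper name.
path_has_dir() {
  # Wrap PATH in colons so the first and last entries match like the others.
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (commented out so sourcing this file has no side effects):
#   path_has_dir "$HOME/.local/bin" || export PATH="$HOME/.local/bin:$PATH"
```

The export in the usage line only affects the current shell. Add it to your shell profile if you want it to persist.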

6. Disposable repo test

Run first edits in a throwaway repo. Do not point a new native-agent profile at an important working tree until it can pass this level of test without stalls or stale edits.

rm -rf /tmp/ds4-agent-fixture
mkdir -p /tmp/ds4-agent-fixture
cd /tmp/ds4-agent-fixture
git init
cat > README.md <<'EOF'
# Fixture

Write a short setup note in docs/setup.md.
EOF

ds4-agent-local \
  --trace /tmp/ds4-agent-fixture.trace \
  -p "Create docs/setup.md with three concise setup steps, then stop."

Inspect the result manually:

git status --short
find . -maxdepth 3 -type f -not -path './.git/*' -print
sed -n '1,120p' docs/setup.md
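
The manual inspection above can be backed by a rough automated check once you repeat the test often. The sketch below assumes the fixture layout created in this section (README.md plus the requested docs/setup.md). check_fixture is a hypothetical helper name. It does not judge the quality of the text and does not detect edits to README.md, so keep reading git status as well.

```shell
#!/usr/bin/env bash
# Sketch: pass only if docs/setup.md exists, is non-empty, and no other
# files appeared in the fixture. check_fixture is a hypothetical helper.
check_fixture() {
  local dir="$1" target="$1/docs/setup.md" extra
  if [ ! -s "$target" ]; then
    echo "FAIL: docs/setup.md is missing or empty" >&2
    return 1
  fi
  # List files other than the two expected ones, ignoring .git internals.
  extra="$(cd "$dir" && find . -type f -not -path './.git/*' \
    -not -path './README.md' -not -path './docs/setup.md' | sort)"
  if [ -n "$extra" ]; then
    echo "FAIL: unexpected files:" >&2
    printf '%s\n' "$extra" >&2
    return 1
  fi
  echo "PASS"
}

# Usage (commented out so sourcing this file has no side effects):
#   check_fixture /tmp/ds4-agent-fixture
```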

7. Benchmark path

Benchmark native ds4-agent against ds4-server, Codex frontier, and Spark profiles with the same fixture tasks. Score pass/fail first, then wall time. A useful result row includes model, quantization, context, thinking mode, prompt kind, pass/fail, median wall time, total wall time, and whether the final repo state is correct.
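
The result row described above can be produced with plain shell tools. The following is a sketch under stated assumptions: median and result_row are hypothetical helper names, the CSV column order is one possible choice, and wall times are passed in as seconds that you measured yourself.

```shell
#!/usr/bin/env bash
# Sketch: build one CSV benchmark row from per-run wall times (seconds).
# median and result_row are hypothetical helpers, not part of ds4.

# Print the median of the numeric arguments.
median() {
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Args: model quant ctx thinking prompt_kind result repo_state_ok time...
# Columns: model,quant,ctx,thinking,prompt_kind,result,median_s,total_s,repo_state_ok
result_row() {
  local model="$1" quant="$2" ctx="$3" thinking="$4" kind="$5"
  local result="$6" state="$7" total
  shift 7
  total="$(printf '%s\n' "$@" | awk '{ s += $1 } END { print s }')"
  printf '%s,%s,%s,%s,%s,%s,%s,%s,%s\n' \
    "$model" "$quant" "$ctx" "$thinking" "$kind" "$result" \
    "$(median "$@")" "$total" "$state"
}

# Usage (placeholder values, not measurements):
#   result_row <model> q2-imatrix 32768 nothink <prompt-kind> pass yes <t1> <t2> <t3> >> results.csv
```

Keeping pass/fail and the final repo state as separate columns makes it harder to promote a run that finished quickly but left the repo wrong.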

Current limitation: native ds4-agent is TTY-first, so a clean headless benchmark may need an expect or script wrapper that provides a pseudo-terminal until upstream has a pipe mode. See "ds4-agent vs Codex frontier" and the benchmark registry before publishing claims.

8. Privacy and gotchas

  • Trace files can contain prompts, file contents, tool outputs, and generated text. Keep them out of public repos.
  • Use placeholders in public docs. Do not publish local usernames, hostnames, LAN IPs, known-host fingerprints, or absolute private paths.
  • Keep native ds4-agent as a named side command until it passes real tool-use tests.
  • For long-context work, increase context gradually. If you see KV-cache or decode failures, lower the context size (--ctx, or DS4_CTX in the wrapper) before changing models.
  • Do not compare raw token speed to agent productivity. Promote only after correct edits, valid tool behavior, and no stalls.
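
Before moving a trace file anywhere outside your machine, a quick scan for the identifiers listed above can catch obvious leaks. This is a rough sketch, not a guarantee: scan_trace is a hypothetical helper name, it only looks for your home directory, username, hostname, and private IPv4 ranges, and a clean result does not mean the trace is safe to publish.

```shell
#!/usr/bin/env bash
# Sketch: flag lines in a trace file that mention local identifiers.
# scan_trace is a hypothetical helper. Returns 0 only if nothing matched.
scan_trace() {
  local file="$1" hits=0 needle
  # Literal matches for home directory, username, and hostname.
  for needle in "${HOME:-}" "${USER:-}" "$(hostname 2>/dev/null || true)"; do
    [ -n "$needle" ] || continue
    if grep -nF -- "$needle" "$file"; then hits=1; fi
  done
  # Private IPv4 ranges: 10.x.x.x, 192.168.x.x, 172.16.x.x to 172.31.x.x.
  if grep -nE '(^|[^0-9])(10\.[0-9]+|192\.168|172\.(1[6-9]|2[0-9]|3[01]))\.[0-9]+\.[0-9]+' "$file"; then
    hits=1
  fi
  [ "$hits" -eq 0 ]
}

# Usage (commented out so sourcing this file has no side effects):
#   scan_trace /tmp/ds4-agent-fixture.trace || echo "review before sharing" >&2
```

Matching lines are printed with their line numbers so you can inspect them. File contents and prompts in the trace still need a manual read.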