# Local AI Benchmark Scripts

This bundle reproduces the date-stamped local AI benchmark rows published on learntoprompt.org.

The public bundle intentionally omits raw run logs, SSH known-hosts files, LAN addresses, MAC addresses, hostnames, usernames, and private paths. By default the benchmark script redacts non-local endpoint hosts in `runs.jsonl` and `manifest.json`; pass `--show-endpoints` only for private notes that will not be published.

## Files

- `bench.py`: benchmark runner for ds4 OpenAI-compatible chat, Spark Ollama, and llama.cpp OpenAI-compatible chat.
- `run-ds4.sh`: wrapper for the ds4 profile.
- `run-spark.sh`: wrapper for the Spark Ollama profile.
- `run-llama.sh`: wrapper for the Spark llama.cpp profile.
- `run-both.sh`: wrapper for ds4 and Spark Ollama.
- `run-practical-all.sh`: wrapper for all built-in suites across ds4, Spark Ollama, and Spark llama.cpp.
- `pull-ollama.py`: pull an Ollama model through the configured tunnel.
- `spark-tunnel.example.sh`: placeholder SSH tunnel for a remote Ollama server.
- `spark-llama-tunnel.example.sh`: placeholder SSH tunnel for a remote llama.cpp server.
- `spark-llama-cpp-server.sh`: host-side helper for building and launching `llama-server`.
- `redaction-check.sh`: scan a folder for common private identifiers before publishing.

## Setup

Download the files into a `bench-local-ai` folder and make the scripts executable.

```sh
mkdir -p bench-local-ai
cd bench-local-ai
base='https://learntoprompt.org/downloads/bench-local-ai'
for file in \
  bench.py pull-ollama.py \
  run-ds4.sh run-spark.sh run-llama.sh run-both.sh run-practical-all.sh \
  spark-tunnel.example.sh spark-llama-tunnel.example.sh \
  spark-llama-cpp-server.sh redaction-check.sh; do
  curl -fsSLO "$base/$file"
done
chmod +x *.py *.sh
```
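
A quick sanity check is to ask the runner for its usage text; this assumes `bench.py` exposes a conventional `--help` flag, which is typical for an argparse-style CLI but not confirmed here.

```sh
# Print usage text; exits non-zero if the script is not executable
# or if --help is not actually supported by this runner.
./bench.py --help
```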

## Endpoint Defaults

The runner assumes local or tunneled endpoints:

```sh
DS4_URL='http://127.0.0.1:8000/v1/chat/completions'
SPARK_OLLAMA_URL='http://127.0.0.1:11435'
LLAMA_URL='http://127.0.0.1:18080/v1/chat/completions'
```

If the model server runs on another machine, prefer an SSH tunnel and keep the benchmark endpoint on localhost.
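
Before a full run it can help to confirm that each endpoint answers on localhost. The checks below lean on standard behaviour: Ollama's `/api/tags` listing, llama.cpp's `/health` probe, and an OpenAI-compatible `/v1/models` route, which ds4-server is assumed (not confirmed) to provide.

```sh
# ds4: assumes the usual OpenAI-compatible /v1/models listing
curl -fsS http://127.0.0.1:8000/v1/models

# Spark Ollama through the tunnel: /api/tags lists models on the remote host
curl -fsS http://127.0.0.1:11435/api/tags

# Spark llama.cpp through the tunnel: llama-server exposes a /health probe
curl -fsS http://127.0.0.1:18080/health
```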

## Spark Ollama Tunnel

Set your own SSH details in the environment; do not commit the real values.

```sh
export SPARK_SSH_USER='your-user'
export SPARK_SSH_HOST='your-spark-host.example'
./spark-tunnel.example.sh
```
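
The placeholder amounts to a single local port forward; a minimal sketch, assuming the remote Ollama listens on its default port 11434:

```sh
# Forward local 11435 to the remote Ollama default port 11434.
# -N opens no remote shell; the local bind stays on 127.0.0.1.
ssh -N -L 127.0.0.1:11435:127.0.0.1:11434 "${SPARK_SSH_USER}@${SPARK_SSH_HOST}"
```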

Then run:

```sh
SPARK_MODEL='qwen3-coder:30b' ./run-spark.sh 3 --exclude-kind long-context --timeout 90
```
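
If the model tag is not yet present on the Spark host, pull it over the same tunnel first. `pull-ollama.py` handles this; the direct equivalent below uses Ollama's standard `/api/pull` route.

```sh
# Pull the model on the remote host via the tunneled Ollama API.
# Progress is streamed back as JSON lines.
curl -fsS http://127.0.0.1:11435/api/pull -d '{"model": "qwen3-coder:30b"}'
```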

## Spark llama.cpp Tunnel

Start `spark-llama-cpp-server.sh` on the remote host, then open the tunnel locally.
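
What the host-side helper does is not reproduced here; a minimal launch sketch, assuming a prebuilt `llama-server` binary and a local GGUF file (path, port, context size, and GPU offload are illustrative), looks like this:

```sh
# Run on the Spark host, not on the benchmark machine.
# -c sets the context window; -ngl 99 offloads all layers to the GPU.
./llama-server \
  -m /path/to/qwen3-coder-30b.gguf \
  --host 127.0.0.1 --port 8080 \
  -c 16384 -ngl 99
```

Back on the benchmark machine, open the tunnel: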

```sh
export SPARK_SSH_USER='your-user'
export SPARK_SSH_HOST='your-spark-host.example'
./spark-llama-tunnel.example.sh
```

Then run:

```sh
LLAMA_MODEL='qwen3-coder-30b.gguf' ./run-llama.sh 3 --exclude-kind long-context --timeout 180
```

llama.cpp is the current performance default for the published Spark benchmark rows. Spark Ollama remains useful for model management and quick pulls, but the measured llama.cpp path is materially faster for the same `qwen3-coder:30b` weights.

## ds4

Start `ds4-server` locally, then run:

```sh
DS4_MODEL='deepseek-v4-flash' ./run-ds4.sh 3 --timeout 240
```

## Built-in Suites

The harness has four suites:

- `smoke`: endpoint health, short latency, JSON obedience, and long-prefill stress.
- `code`: surgical edit, seeded bug review, and repo-location reasoning.
- `question`: local benchmark questions, abstention, and citation checks.
- `wiki`: source-grounded wiki query, frontmatter generation, and unsupported-claim audit.

Run one suite:

```sh
./run-llama.sh 1 --suite code --timeout 180
```

Run every built-in suite against all configured targets:

```sh
LLAMA_MODEL='qwen3-coder-30b.gguf' ./run-practical-all.sh 1 --timeout 240
```

## Local Profile Names

The public benchmark page tracks these profile targets by role, not by private host details:

| Tool | Shortcut / profile | Backend |
| --- | --- | --- |
| Claude Code | `claude-spark-llama` / `csllama` | local Anthropic gateway to Spark llama.cpp |
| Codex | `codex-spark-llama` / `xsllama` | local Responses shim to Spark llama.cpp |
| Pi | `pi-spark-llama` | OpenAI-compatible model entry |
| OpenCode | provider `spark-llama.cpp`, model `qwen3-coder-30b.gguf` | OpenAI-compatible endpoint |

For public reproduction, keep the benchmark endpoint on `127.0.0.1` through an SSH tunnel. Do not publish private direct-LAN endpoints.

## Result Files

Each run writes a timestamped folder under `results/` with:

- `runs.jsonl`: per-task raw records; non-local endpoint hosts are redacted by default.
- `summary.md`: grouped median wall time, pass counts, output size, and token-rate metrics when available.
- `manifest.json`: command, suite, task, model, and redacted endpoint metadata.

Before publishing generated files:

```sh
./redaction-check.sh results
```
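
`redaction-check.sh` ships with the bundle; a reduced equivalent of the kind of scan it performs (the patterns below are illustrative, not the script's actual list) is:

```sh
# Flag common private identifiers: RFC 1918 addresses, MAC addresses,
# and .local hostnames. A clean results tree prints nothing.
grep -rnE \
  -e '192\.168\.[0-9]+\.[0-9]+' \
  -e '10\.[0-9]+\.[0-9]+\.[0-9]+' \
  -e '172\.(1[6-9]|2[0-9]|3[01])\.[0-9]+\.[0-9]+' \
  -e '([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}' \
  -e '[A-Za-z0-9-]+\.local' \
  results/
```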

Do not publish SSH configs, known-hosts files, shell history, raw server logs, or full result files created with `--show-endpoints`.
