DGX Spark

A local NVIDIA box for coding models: SSH, updates, Docker GPU validation, Ollama, Qwen3-Coder, and side profiles.

1. SSH bootstrap

Use placeholders in docs and scripts. Do not publish your real LAN address, host name, user-created account name, or host-key fingerprint.

export SPARK_HOST=<spark-ip-or-hostname>
export SPARK_USER=<spark-user>

First password login, bypassing your normal SSH config:

ssh -F none \
  -o BatchMode=no \
  -o PubkeyAuthentication=no \
  -o PasswordAuthentication=yes \
  -o KbdInteractiveAuthentication=yes \
  -o PreferredAuthentications=password,keyboard-interactive \
  -o NumberOfPasswordPrompts=3 \
  "$SPARK_USER@$SPARK_HOST"

Install an existing Mac SSH key. Do not generate a new key just because this is a new device:

ssh-copy-id -F none -f \
  -o BatchMode=no \
  -o PubkeyAuthentication=no \
  -o PasswordAuthentication=yes \
  -o KbdInteractiveAuthentication=yes \
  -o PreferredAuthentications=password,keyboard-interactive \
  "$SPARK_USER@$SPARK_HOST"

Test key-only login:

ssh -F none \
  -o PubkeyAuthentication=yes \
  -o PasswordAuthentication=no \
  "$SPARK_USER@$SPARK_HOST"

2. Dashboard and updates

DGX Dashboard is local to the Spark. Tunnel it:

ssh -F none -L 11000:localhost:11000 "$SPARK_USER@$SPARK_HOST"

Open:

http://localhost:11000

Use Dashboard updates first. If that leaves packages in a partially updated state, the repair path from an SSH shell is:

sudo apt --fix-broken install
sudo apt update
sudo apt dist-upgrade
sudo fwupdmgr refresh
sudo fwupdmgr upgrade
sudo reboot

After reboot, validate:

cat /etc/dgx-release
uname -r
nvidia-smi
ls /dev/nvidia*
docker --version
nvidia-ctk --version

Order matters: if nvidia-smi fails on a fresh Spark, update and reboot before debugging model serving.

3. GPU and Docker validation

Once the NVIDIA driver works, test Docker GPU access. If you have NGC access, NVIDIA-optimized containers are useful later; the Ollama path below does not require an NGC key.
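
A minimal GPU smoke test, assuming the NVIDIA Container Toolkit is already configured as the Docker runtime (stock on DGX OS): the toolkit injects the driver tools into a plain Ubuntu image, so nvidia-smi should print the GPU. Prefix with sudo until the group change in the next step takes effect:

docker run --rm --gpus all ubuntu nvidia-smi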

Optional non-sudo Docker access:

sudo usermod -aG docker "$USER"

Reconnect with a fresh SSH session, then:

docker ps

Fresh shell required: if Docker still says permission denied after usermod, the current SSH shell predates the group change.
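
If you would rather not reconnect, newgrp starts a subshell with the new group active; it only affects that one shell:

newgrp docker
docker ps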

4. Ollama without NGC

Create a simple layout:

mkdir -p ~/work ~/models ~/data ~/containers ~/logs
mkdir -p ~/models/ollama ~/models/huggingface

Create an environment file:

cat > ~/.local-ai-env <<'EOF'
export PATH=/usr/local/cuda/bin:$HOME/bin:$PATH
export HF_HOME=$HOME/models/huggingface
export HUGGINGFACE_HUB_CACHE=$HOME/models/huggingface/hub
export OLLAMA_MODELS=$HOME/models/ollama
EOF
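
The file does nothing until sourced. Loading it from ~/.bashrc keeps the paths consistent across sessions; it mainly matters for native tools on the Spark, since the Ollama container below gets its model path from the volume mount instead:

echo 'source ~/.local-ai-env' >> ~/.bashrc
source ~/.local-ai-env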

Create the Ollama Docker launcher:

mkdir -p ~/containers/ollama
cat > ~/containers/ollama/run.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
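# Recreate the container on each run. The API binds to loopback only, and
# the model store lives on the host, so pulls survive container rebuilds.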
mkdir -p "$HOME/models/ollama"
docker rm -f ollama >/dev/null 2>&1 || true
docker run -d \
  --gpus=all \
  --name ollama \
  --restart unless-stopped \
  -p 127.0.0.1:11434:11434 \
  -v "$HOME/models/ollama:/root/.ollama" \
  ollama/ollama:latest
EOF
chmod +x ~/containers/ollama/run.sh

Start it:

~/containers/ollama/run.sh
docker ps
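
Before pulling a model, confirm the API answers on the loopback bind:

curl http://127.0.0.1:11434/api/version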

Smoke test (ollama run pulls the model on first use):

docker exec -it ollama ollama run llama3.2 "reply with ok"

5. Qwen3-Coder keepalive

Pull and run a coding model:

docker exec -it ollama ollama run qwen3-coder:30b

Keep it loaded:

docker exec ollama ollama run qwen3-coder:30b \
  --keepalive -1 \
  "reply with ok"

Check residency:

docker exec ollama ollama ps

Preload after boot with a small script:

cat > ~/containers/ollama/preload.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
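# Wait up to ~2 minutes for the Ollama container to answer, then load
# the model with an infinite keepalive so it stays resident.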
for i in $(seq 1 60); do
  if docker exec ollama ollama ps >/dev/null 2>&1; then
    break
  fi
  sleep 2
done
docker exec ollama ollama run qwen3-coder:30b --keepalive -1 "reply with ok" >/dev/null
EOF
chmod +x ~/containers/ollama/preload.sh

Install a systemd service. Replace the <spark-user> placeholders with the real account on your Spark; Group=docker assumes the non-sudo Docker setup from step 3:

sudo tee /etc/systemd/system/ollama-qwen-preload.service >/dev/null <<'EOF'
[Unit]
Description=Preload qwen3-coder in Ollama after boot
Requires=docker.service
After=docker.service network-online.target
Wants=network-online.target

[Service]
Type=oneshot
User=<spark-user>
Group=docker
ExecStart=/home/<spark-user>/containers/ollama/preload.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now ollama-qwen-preload.service
systemctl status ollama-qwen-preload.service --no-pager
docker exec ollama ollama ps
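
If the preload fails, the unit's own log is the first place to look:

journalctl -u ollama-qwen-preload.service --no-pager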

6. Mac tunnels

Keep Ollama private on the Spark and tunnel it back to the Mac:

ssh -F none \
  -L 11000:localhost:11000 \
  -L 11434:localhost:11434 \
  "$SPARK_USER@$SPARK_HOST"

From the Mac:

curl http://localhost:11434/api/chat \
  -d '{
    "model": "qwen3-coder:30b",
    "messages": [{"role": "user", "content": "reply with ok"}]
  }'
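
By default /api/chat streams one JSON object per line; add "stream": false to the request body to get a single response object instead.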

7. Claude Code side profile

Expose Spark-backed Claude as a named side command, not a replacement for default Claude Code:

claude-spark
cspark
spark-ssh
spark-dashboard
spark-ollama
spark-tunnels

The working shape is:

Claude Code -> local Anthropic gateway -> SSH tunnel -> Spark Ollama -> qwen3-coder:30b
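
A minimal sketch of the side command as a shell function. The gateway port, the placeholder token, and the model mapping are all assumptions; substitute whatever your Anthropic-compatible gateway (for example, a LiteLLM proxy fronting the tunneled Ollama) actually exposes:

claude-spark() {
  # Claude Code reads these from the environment; the values here are
  # illustrative, not a working configuration.
  ANTHROPIC_BASE_URL="http://localhost:4000" \
  ANTHROPIC_AUTH_TOKEN="placeholder" \
  ANTHROPIC_MODEL="qwen3-coder:30b" \
    claude "$@"
}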

Smoke test:

claude-spark -p "Reply with exactly: spark-ok"

Tool use is the real test. A model can answer chat through the gateway and still struggle with strict Bash/Read/Grep tool schemas. Validate actual coding tasks before trusting it as a replacement.

8. Privacy and gotchas

  • Do not publish your real Spark IP, hostname, username, SSH fingerprint, serial number, or local absolute paths.
  • Bind Ollama to 127.0.0.1 and use SSH tunnels first.
  • Use ssh -F none during bootstrap if your local SSH config suppresses password prompts.
  • Update the Spark before debugging model serving if the driver is broken.
  • Docker group membership requires a fresh login.
  • NGC is optional for the Ollama path, but useful later for NVIDIA-optimized serving containers.

The full path, end to end:

Mac shell -> SSH tunnel -> Spark localhost -> Ollama Docker -> Qwen3-Coder