DGX Spark
A local NVIDIA box for coding models: SSH, updates, Docker GPU validation, Ollama, Qwen3-Coder, and side profiles.
1. SSH bootstrap
Use placeholders in docs and scripts. Do not publish your real LAN address, host name, user-created account name, or host-key fingerprint.
export SPARK_HOST=<spark-ip-or-hostname>
export SPARK_USER=<spark-user>
First password login, bypassing your normal SSH config:
ssh -F none \
-o BatchMode=no \
-o PubkeyAuthentication=no \
-o PasswordAuthentication=yes \
-o KbdInteractiveAuthentication=yes \
-o PreferredAuthentications=password,keyboard-interactive \
-o NumberOfPasswordPrompts=3 \
"$SPARK_USER@$SPARK_HOST"
Install an existing Mac SSH key. Do not generate a new key just because this is a new device:
ssh-copy-id -F none -f \
-o BatchMode=no \
-o PubkeyAuthentication=no \
-o PasswordAuthentication=yes \
-o KbdInteractiveAuthentication=yes \
-o PreferredAuthentications=password,keyboard-interactive \
"$SPARK_USER@$SPARK_HOST"
Test key-only login:
ssh -F none \
-o PubkeyAuthentication=yes \
-o PasswordAuthentication=no \
"$SPARK_USER@$SPARK_HOST"
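Once key-only login works, a host alias keeps later commands short. A minimal sketch that appends a stanza to ~/.ssh/config; the alias name spark is an assumption, and the placeholders stay placeholders:

```shell
# Append a host alias so "ssh spark" works without repeating options.
# The alias name "spark" is made up; keep your real values out of docs.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host spark
    HostName <spark-ip-or-hostname>
    User <spark-user>
    IdentitiesOnly yes
EOF
chmod 600 ~/.ssh/config
```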
2. Dashboard and updates
DGX Dashboard is local to the Spark. Tunnel it:
ssh -F none -L 11000:localhost:11000 "$SPARK_USER@$SPARK_HOST"
Open:
http://localhost:11000
Use Dashboard updates first. If package state is partially updated, the repair path is:
sudo apt --fix-broken install
sudo apt update
sudo apt dist-upgrade
sudo fwupdmgr refresh
sudo fwupdmgr upgrade
sudo reboot
After reboot, validate:
cat /etc/dgx-release
uname -r
nvidia-smi
ls /dev/nvidia*
docker --version
nvidia-ctk --version
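The checks above can be wrapped in a small loop that reports what is missing, handy after every update cycle. A convenience sketch, not part of the DGX tooling:

```shell
# Report which post-reboot tools are on PATH; version and device checks
# still need the individual commands above (nvidia-smi, ls /dev/nvidia*).
check_tools() {
  for cmd in nvidia-smi docker nvidia-ctk; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok:      $cmd"
    else
      echo "missing: $cmd"
    fi
  done
}
check_tools
```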
If nvidia-smi fails on a fresh Spark, update and reboot before debugging model serving.
3. GPU and Docker validation
Once the NVIDIA driver works, test Docker GPU access. If you have NGC access, NVIDIA optimized containers are useful later. The Ollama path below does not require an NGC key.
Optional non-sudo Docker access:
sudo usermod -aG docker "$USER"
Reconnect with a fresh SSH session, then:
docker ps
Group membership is evaluated at login: after usermod, the current SSH shell predates the group change, so docker ps only works from a new session.
4. Ollama without NGC
Create a simple layout:
mkdir -p ~/work ~/models ~/data ~/containers ~/logs
mkdir -p ~/models/ollama ~/models/huggingface
Create an environment file:
cat > ~/.local-ai-env <<'EOF'
export PATH=/usr/local/cuda/bin:$HOME/bin:$PATH
export HF_HOME=$HOME/models/huggingface
export HUGGINGFACE_HUB_CACHE=$HOME/models/huggingface/hub
export OLLAMA_MODELS=$HOME/models/ollama
EOF
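The file is not loaded automatically; source it from your login shell so every session picks it up. A sketch assuming bash and ~/.bashrc:

```shell
# Add a one-time source line to ~/.bashrc, skipping the append if the
# line is already present.
grep -qF 'source ~/.local-ai-env' ~/.bashrc 2>/dev/null \
  || echo 'source ~/.local-ai-env' >> ~/.bashrc
```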
Create the Ollama Docker launcher:
mkdir -p ~/containers/ollama
cat > ~/containers/ollama/run.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
mkdir -p "$HOME/models/ollama"
docker rm -f ollama >/dev/null 2>&1 || true
docker run -d \
--gpus=all \
--name ollama \
--restart unless-stopped \
-p 127.0.0.1:11434:11434 \
-v "$HOME/models/ollama:/root/.ollama" \
ollama/ollama:latest
EOF
chmod +x ~/containers/ollama/run.sh
Start it:
~/containers/ollama/run.sh
docker ps
Smoke test:
docker exec -it ollama ollama run llama3.2 "reply with ok"
5. Qwen3-Coder keepalive
Pull and run a coding model:
docker exec -it ollama ollama run qwen3-coder:30b
Keep it loaded:
docker exec ollama ollama run qwen3-coder:30b \
--keepalive -1 \
"reply with ok"
Check residency:
docker exec ollama ollama ps
Preload after boot with a small script:
cat > ~/containers/ollama/preload.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
for i in $(seq 1 60); do
if docker exec ollama ollama ps >/dev/null 2>&1; then
break
fi
sleep 2
done
docker exec ollama ollama run qwen3-coder:30b --keepalive -1 "reply with ok" >/dev/null
EOF
chmod +x ~/containers/ollama/preload.sh
Install a service. Replace the user and group fields with values from your Spark:
sudo tee /etc/systemd/system/ollama-qwen-preload.service >/dev/null <<'EOF'
[Unit]
Description=Preload qwen3-coder in Ollama after boot
Requires=docker.service
After=docker.service network-online.target
Wants=network-online.target
[Service]
Type=oneshot
User=<spark-user>
Group=docker
ExecStart=/home/<spark-user>/containers/ollama/preload.sh
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now ollama-qwen-preload.service
systemctl status ollama-qwen-preload.service --no-pager
docker exec ollama ollama ps
6. Mac tunnels
Keep Ollama private on the Spark and tunnel it back to the Mac:
ssh -F none \
-L 11000:localhost:11000 \
-L 11434:localhost:11434 \
"$SPARK_USER@$SPARK_HOST"
From the Mac:
curl http://localhost:11434/api/chat \
-d '{
"model": "qwen3-coder:30b",
"messages": [{"role": "user", "content": "reply with ok"}]
}'
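By default /api/chat streams one JSON object per line; adding "stream": false to the request body returns a single object with the reply under message.content. A small extraction helper, assuming python3 on the Mac (the helper name is made up):

```shell
# Pull the assistant text out of a non-streaming /api/chat response.
extract_reply() {
  python3 -c 'import json,sys; print(json.load(sys.stdin)["message"]["content"])'
}

# Usage: add "stream": false to the curl body above, then:
#   curl -s http://localhost:11434/api/chat -d '...' | extract_reply
```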
7. Claude Code side profile
Expose Spark-backed Claude as a named side command, not a replacement for default Claude Code:
claude-spark
cspark
spark-ssh
spark-dashboard
spark-ollama
spark-tunnels
The working shape is:
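A sketch of the tunnel helpers as shell functions, built only from the ssh commands in sections 2 and 6. How claude-spark and cspark point Claude Code at the tunnel is setup-specific and not reproduced here:

```shell
# Sketch only: these mirror the ssh invocations used earlier.
# SPARK_USER and SPARK_HOST come from the exports in section 1.
spark-ssh()       { ssh -F none "$SPARK_USER@$SPARK_HOST"; }
spark-dashboard() { ssh -F none -L 11000:localhost:11000 "$SPARK_USER@$SPARK_HOST"; }
spark-ollama()    { ssh -F none -L 11434:localhost:11434 "$SPARK_USER@$SPARK_HOST"; }
spark-tunnels()   { ssh -F none -L 11000:localhost:11000 -L 11434:localhost:11434 "$SPARK_USER@$SPARK_HOST"; }
```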
Smoke test:
claude-spark -p "Reply with exactly: spark-ok"
8. Privacy and gotchas
- Do not publish your real Spark IP, hostname, username, SSH fingerprint, serial number, or local absolute paths.
- Bind Ollama to 127.0.0.1 and use SSH tunnels first.
- Use ssh -F none during bootstrap if your local SSH config suppresses password prompts.
- Update the Spark before debugging model serving if the driver is broken.
- Docker group membership requires a fresh login.
- NGC is optional for the Ollama path, but useful later for NVIDIA-optimized serving containers.