
# ai-stack — deployment

The full multi-service stack: Caddy (TLS + reverse proxy) in front of Open
WebUI (chat + image generation panel), Ollama (LLMs), and ComfyUI (image
generation), with an optional Anubis PoW anti-bot sidecar. One GPU host,
one bridge network, one TLS entry point.

This is the only supported deployment shape. It is a sanitized snapshot of
the production `srvno.de` deployment.

## Files

| File                                    | Purpose                                                  |
| --------------------------------------- | -------------------------------------------------------- |
| `docker-compose.yml`                    | Service definitions, volumes, GPU reservations           |
| `Caddyfile`                             | TLS + reverse proxy config (one site block per hostname) |
| `init-models.sh`                        | LLMs to preseed into Ollama on first boot                |
| `mirror-ollama-model.sh`                | Helper — mirror an Ollama model into a tarball you can host on S3 |
| `comfyui-init-models.sh`                | Checkpoints/VAEs/LoRAs to preseed into ComfyUI on first boot |
| `openwebui-tools/smart_image_gen.py`    | Tool that auto-routes image generation, img2img, and text-targeted inpainting to the right SDXL checkpoint |
| `openwebui-models/image_studio.md`      | Dedicated chat-model preset — manual setup walkthrough               |
| `openwebui-models/image_studio.json`    | The same preset as an importable Open WebUI model JSON               |
| `.env.example`                          | Secrets and image-tag pins. Copy to `.env`               |

## 1. Host prerequisites

- Linux (or WSL2) with an NVIDIA GPU and a recent driver.
  - cu126 wheels (default Dockerfile): driver >= 545
  - cu130 wheels (swap in Dockerfile): driver >= 580
- Docker Engine + Compose v2.
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
  installed and the Docker runtime configured (`nvidia-ctk runtime configure
  --runtime=docker && systemctl restart docker`).
- DNS for the chat / ComfyUI hostnames already pointing at this host
  (Caddy needs working DNS to provision Let's Encrypt certs on first boot).

Confirm GPU passthrough works before bringing the stack up:

```sh
docker run --rm --gpus all nvidia/cuda:12.6.3-base-ubuntu24.04 nvidia-smi
```
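
The driver minimums listed above can also be checked programmatically. The following is a minimal Python sketch, not part of this repo: `MIN_DRIVER_MAJOR` and `driver_ok` are illustrative names, and the version string is assumed to come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
# Minimum driver major versions, copied from the prerequisites list above.
MIN_DRIVER_MAJOR = {"cu126": 545, "cu130": 580}


def driver_ok(wheel: str, driver_version: str) -> bool:
    """True if the driver's major version meets the minimum for `wheel`."""
    # Driver versions look like "550.54.14"; only the major part is compared.
    major = int(driver_version.strip().split(".")[0])
    return major >= MIN_DRIVER_MAJOR[wheel]
```

Only the major version is compared, which is enough for the two thresholds listed here.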

## 2. Configure

```sh
cp .env.example .env
# generate the two keys with: openssl rand -hex 32
```

Then edit:

- **`.env`** — fill in:
    - `WEBUI_URL` (full URL with scheme) and `LLM_URL` (bare hostname). Both
      point at the same Open WebUI host; Open WebUI wants the URL form for
      auth redirects, Anubis wants the bare hostname for its cookie domain.
    - `WEBUI_SECRET_KEY` and (if using Anubis) `ANUBIS_OWUI_KEY` —
      `openssl rand -hex 32` for each.
    - Optionally pin `COMFYUI_IMAGE_TAG` to a specific `v*` release.
- **`Caddyfile`** — replace the `chat.example.com` and `comfyui.example.com`
  hostnames with yours; replace `REPLACE_WITH_BCRYPT_HASH` with a real
  bcrypt hash:

  ```sh
  docker run --rm caddy:latest caddy hash-password --plaintext 'your-password'
  ```
- **`init-models.sh`** — keep the LLMs you want preseeded, drop the rest.
  Check sizes at <https://ollama.com/library> first; the host needs disk
  for everything listed. Two pull paths are available:
    - `pull "<model:tag>"` — standard registry pull from
      `registry.ollama.ai`.
    - `s3_pull "<model:tag>" "<archive.tgz>"` — fetches from your own
      mirror set via `S3_OLLAMA_BASE` in `.env`. Falls back to
      `ollama pull` if the env var isn't set, so this is safe to enable
      incrementally. Create the tarballs once with
      `mirror-ollama-model.sh` (see [Mirroring models to S3](#mirroring-models-to-s3)).
- **`comfyui-init-models.sh`** — checkpoints/VAEs/LoRAs to preseed into
  ComfyUI. Ships empty (no active fetches); uncomment the SDXL, Flux, or
  upscaler examples or add your own. Whatever filename you pick should
  match the `ckpt_name` field in `workflows/*.json` (default expects
  `CyberRealisticXLPlay_V8.0_FP16.safetensors`). Set `HF_TOKEN` in
  `.env` if any are gated repos.
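
Before bringing the stack up, it can help to confirm that `.env` actually defines the values listed above. A minimal Python sketch follows. `REQUIRED_KEYS`, `parse_env`, and `missing_env_keys` are illustrative names that do not exist in this repo, and the parser only understands plain `KEY=value` lines.

```python
# Hypothetical helper, not shipped with this repo: report required .env
# keys that are missing or left empty.
REQUIRED_KEYS = ("WEBUI_URL", "LLM_URL", "WEBUI_SECRET_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; ignore blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_env_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent or empty, in declaration order."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

Calling `missing_env_keys(open(".env").read())` should return an empty list once everything is filled in. When using Anubis, pass a `required` tuple that also includes `ANUBIS_OWUI_KEY`.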

## 3. Bring it up

```sh
docker compose up -d
docker compose logs -f
```

First boot: Caddy provisions Let's Encrypt certs, the `model-init`
container pulls the LLMs in `init-models.sh` (slow — `mistral-nemo:12b`
alone is ~7 GB), and ComfyUI initialises empty volumes.

Health-check:

```sh
docker compose exec comfyui curl -sf http://127.0.0.1:8188/system_stats | head -c 200
docker compose exec open-webui curl -sf http://127.0.0.1:8080/health
```

## 4. ComfyUI checkpoints

ComfyUI ships no models. Three ways to get one in:

1. **Preseed via the sidecar (default).** `comfyui-model-init` runs once
   on `compose up`, downloads everything `comfyui-init-models.sh` lists,
   and exits. The script ships empty — uncomment one of the examples or
   add your own `fetch` calls (SDXL, Flux, LoRAs, upscalers, etc.). At
   least one checkpoint should be named
   `CyberRealisticXLPlay_V8.0_FP16.safetensors` to match the workflow
   default, or update `ckpt_name` in `workflows/*.json` to whatever you
   pull. Re-run with `docker compose up -d comfyui-model-init` after
   script edits; already-present files are skipped.
2. **ComfyUI-Manager UI.** Open `https://comfyui.example.com` (after
   basic-auth login), click **Manager**, then **Model Manager**, install
   from the catalogue.
3. **Direct copy into the volume.** Useful if you already have the file
   locally:

   ```sh
   docker run --rm -v ai-stack_comfyui-models:/models -v "$PWD":/src alpine \
       cp /src/your-model.safetensors /models/checkpoints/
   ```
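
Whichever route you use, the checkpoint filename has to match the `ckpt_name` in the workflow. The Python sketch below shows one way to cross-check that. It assumes the workflow files are in ComfyUI's API format (a dict of node id to `{"class_type": ..., "inputs": {...}}`); the function names are illustrative and not part of this repo.

```python
# Sketch: list checkpoint filenames a ComfyUI API-format workflow refers to
# and flag any that are not among the files actually present.


def workflow_checkpoints(workflow: dict) -> list[str]:
    """Collect every `ckpt_name` input found in the workflow, sorted."""
    names = set()
    for node in workflow.values():
        ckpt = node.get("inputs", {}).get("ckpt_name")
        if isinstance(ckpt, str):
            names.add(ckpt)
    return sorted(names)


def missing_checkpoints(workflow: dict, present: set[str]) -> list[str]:
    """Return referenced checkpoints that are absent from `present`."""
    return [name for name in workflow_checkpoints(workflow) if name not in present]
```

To use it, load a workflow with `json.load` and pass the set of filenames found in the checkpoints directory as `present`. An empty result means every referenced checkpoint is available.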

## 5. First-user signup in Open WebUI

Open `https://chat.example.com`. The first account created becomes the
admin. Subsequent signups land in `pending` and need admin approval (set
by `DEFAULT_USER_ROLE: pending` in compose).

## 6. Wire Open WebUI to ComfyUI

Open WebUI ships the ComfyUI integration but won't know which workflow to
submit until you paste one in. Do this once for txt2img, once for img2img.

In Open WebUI: **Admin Panel -> Settings -> Images**.

1. **Image Generation Engine** -> `ComfyUI` (preselected via env var).
2. **ComfyUI Base URL** -> `http://comfyui:8188` (preselected).
3. **ComfyUI Workflow** -> paste the entire contents of
   [`../../workflows/txt2img.json`](../../workflows/txt2img.json).
4. **ComfyUI Workflow Nodes** -> paste the contents of
   [`../../workflows/txt2img.nodes.json`](../../workflows/txt2img.nodes.json).
5. **Default Model** -> the filename of the checkpoint you installed in
   section 4, "ComfyUI checkpoints" (e.g. `CyberRealisticXLPlay_V8.0_FP16.safetensors`).
6. Save.

For image editing (img2img), scroll to the **Image Editing** section in
the same panel and repeat with
[`../../workflows/img2img.json`](../../workflows/img2img.json) and
[`../../workflows/img2img.nodes.json`](../../workflows/img2img.nodes.json).

## 7. Test it

In any chat, click the image-generation button and prompt for an image.
Open WebUI submits the workflow to ComfyUI; the result drops back into
the chat when KSampler finishes. To test img2img, attach an image and
use the edit action.

## 8. (Optional) Install the smart-routing Tool

The image-button path always uses the admin's **Default Model**. To get
per-prompt checkpoint routing — e.g. "draw me a cyberpunk city" picks
CyberRealistic, "anthro fox warrior" picks one of the furry checkpoints —
install the `smart_image_gen.py` Tool. It exposes two methods the LLM
calls:

- **`generate_image`** for new images from scratch (txt2img).
- **`edit_image`** for modifying an image the user attached to the
  chat. Two modes:
    - With `mask_text` — text-targeted inpainting via GroundingDINO+SAM
      (e.g. "the dog's collar"). Only the named region is repainted.
    - Without `mask_text` — full img2img which reimagines the whole
      image at the requested denoise.

Both auto-route to the right SDXL checkpoint per request.

> **First inpaint takes a few minutes**: SAM-HQ (~2.5 GB) and
> GroundingDINO (~700 MB) auto-download into the `comfyui-models`
> volume on the very first call to `edit_image` with `mask_text`.
> Later inpaints reuse the cached models and skip the download.

1. **Workspace -> Tools -> +** (top-right).
2. Paste the contents of
   [`openwebui-tools/smart_image_gen.py`](openwebui-tools/smart_image_gen.py).
3. Save. Optionally adjust the Valves (ComfyUI URL, default steps, CFG,
   timeout) via the gear icon.
4. **Workspace -> Models** (or pick an existing chat model) -> edit ->
   under **Tools**, enable `smart_image_gen` -> save.
5. Make sure the model has **native function calling** enabled
   (Workspace -> Models -> the model -> Advanced Params -> Function
   Calling: Native). Mistral, Qwen, and Llama 3.1+ all support this.

In a chat with that model, ask for an image — "make me a photoreal
portrait of a cyberpunk samurai" — the LLM should call
`generate_image(prompt=..., style="photo")`. The status bar shows
"Routing to photo (CyberRealisticXLPlay…)" while it generates.

If the LLM responds in text instead of calling the tool, install the
**Image Studio** chat-model preset (next section) — a dedicated model
with a system prompt that removes the ambiguity.

## 9. (Recommended) Install the Image Studio model preset

General-purpose chat models often "describe" an image in text instead
of firing the `generate_image` tool, especially on conversational
phrasing ("can you draw me…", "I'd love a picture of…"). The
**Image Studio** preset wraps `mistral-nemo:12b` in a system prompt
that mandates tool use — every message is treated as an image request.

Setup — two paths:

- **Import the JSON** (fast): Workspace -> Models -> Import ->
  [`openwebui-models/image_studio.json`](openwebui-models/image_studio.json).
- **Manual** (full control): walkthrough in
  [`openwebui-models/image_studio.md`](openwebui-models/image_studio.md).

Users then pick **Image Studio** from the chat-model dropdown when
they want to generate or edit images.

**One required follow-up** after either install path: set a separate
**Task Model** in Admin Panel -> Settings -> Interface -> Task Model. Image
Studio uses `tool_choice: required` to force tool calls, which means
the same model can't produce the text responses Open WebUI needs for
chat-title generation, tag suggestions, and autocomplete. Pick any
non-Image-Studio model you have pulled (`mistral-nemo:12b`,
`llama3.1:8b`, etc.) — see the
[**Set a separate Task Model** section in image_studio.md](openwebui-models/image_studio.md#set-a-separate-task-model-required-after-install).

The preset ships with `vision: true` so users can attach images for
editing even though `mistral-nemo:12b` isn't a vision model — see the
[**Vision capability** section in image_studio.md](openwebui-models/image_studio.md#vision-capability)
for the trade-offs and the upgrade path to a real vision LLM
(`qwen2.5vl:7b`, `llama3.2-vision:11b`, etc.) if the LLM needs to
actually see the image to write smarter edit instructions.

To extend the `smart_image_gen` Tool (new checkpoint, new style):

- Add the filename to `comfyui-init-models.sh` so it gets pulled.
- Add a key to the `CHECKPOINTS` dict in `smart_image_gen.py`.
- Optionally add style-specific negatives to `NEGATIVES`.
- Optionally add keyword routing rules to `ROUTING_RULES` for the
  auto-detect path.
- Re-paste the Tool source in Workspace -> Tools.
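
To make the extension steps above more concrete, here is a rough Python sketch of how a style table and keyword routing could fit together. The shapes of `CHECKPOINTS`, `NEGATIVES`, and `ROUTING_RULES` are assumptions; the real definitions live in `smart_image_gen.py` and may differ. `route_style` is a hypothetical helper, and every filename except the CyberRealistic one is a placeholder.

```python
# Hypothetical shapes only; check smart_image_gen.py for the real ones.
CHECKPOINTS = {
    "photo": "CyberRealisticXLPlay_V8.0_FP16.safetensors",
    "anime": "your-anime-checkpoint.safetensors",  # placeholder filename
}
NEGATIVES = {"anime": "photo, realistic"}  # optional per-style negatives
ROUTING_RULES = [  # (keywords, style); the first matching rule wins
    (("anime", "manga"), "anime"),
    (("photo", "portrait"), "photo"),
]


def route_style(prompt: str, default: str = "photo") -> str:
    """Pick a style by keyword match, falling back to `default`."""
    lowered = prompt.lower()
    for keywords, style in ROUTING_RULES:
        if any(word in lowered for word in keywords):
            return style
    return default
```

Under these assumptions, adding a style means one new `CHECKPOINTS` entry plus, optionally, a `NEGATIVES` entry and a routing rule. Rule order matters because the first match wins.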

## Mirroring models to S3

For models you want to pin against upstream changes (or pull faster
from your own infra), mirror them to S3 once and have the
deployment fetch from there.

### Create the mirror tarball

Run [`mirror-ollama-model.sh`](mirror-ollama-model.sh) on any machine
that has the model pulled locally. It reads `~/.ollama/models/`,
collects the blobs the model's manifest references, and tars everything together:

```sh
./mirror-ollama-model.sh huihui_ai/qwen3.5-abliterated:9b qwen3.5-abliterated-9b.tgz
```
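
For orientation, the Python sketch below shows which files such a tarball would need to contain. It is not a transcription of `mirror-ollama-model.sh`. It assumes Ollama's usual on-disk layout under `~/.ollama/models/` (manifests under `manifests/registry.ollama.ai/<namespace>/<name>/<tag>`, blobs under `blobs/sha256-<hex>`), and the function names are illustrative.

```python
from pathlib import PurePosixPath


def manifest_relpath(model: str) -> PurePosixPath:
    """Map 'namespace/name:tag' to its manifest path under the models dir."""
    name, _, tag = model.partition(":")
    if "/" not in name:
        # Models without a namespace are stored under "library".
        name = f"library/{name}"
    return PurePosixPath("manifests/registry.ollama.ai") / name / (tag or "latest")


def blob_relpaths(manifest: dict) -> list[PurePosixPath]:
    """List blob files (config + layers) that a parsed manifest references."""
    entries = [manifest["config"], *manifest["layers"]]
    # Digests read "sha256:<hex>"; blob files are named "sha256-<hex>".
    return [PurePosixPath("blobs") / e["digest"].replace(":", "-") for e in entries]
```

The manifest file plus every path returned by `blob_relpaths` is the set of files the daemon needs to see in order to list the model.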

### Upload to S3

Whatever fits — `aws s3 cp`, `mc`, `rclone`, etc. The bucket needs
to expose the file over HTTPS (public-read ACL on the object, a
CloudFront distribution, R2 with public URLs, etc.):

```sh
aws s3 cp qwen3.5-abliterated-9b.tgz s3://your-bucket/ollama-models/ --acl public-read
```

### Wire the deployment to fetch from there

In `.env`:

```
S3_OLLAMA_BASE=https://your-bucket.s3.amazonaws.com/ollama-models
```

In `init-models.sh`, switch the affected models from `pull` to
`s3_pull`:

```sh
s3_pull "huihui_ai/qwen3.5-abliterated:9b" "qwen3.5-abliterated-9b.tgz"
```

`docker compose up -d model-init` re-runs the init container; the
script downloads the tarball, extracts into the `ollama-data` volume,
and the running Ollama daemon picks it up on its next manifest scan.

If `S3_OLLAMA_BASE` isn't set, `s3_pull` transparently falls back to
`ollama pull` — safe to commit `s3_pull` lines without S3 ready yet.
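
The fallback decision can be summarised in a few lines of Python. This is a sketch of the behaviour described above, not the shell code in `init-models.sh`. `resolve_source` is a hypothetical name, and joining the base URL and archive name with a single `/` is an assumption based on the upload example.

```python
def resolve_source(model: str, archive: str, env: dict) -> tuple[str, str]:
    """Return ("s3", url) when a mirror is configured, else ("registry", model)."""
    base = env.get("S3_OLLAMA_BASE", "").rstrip("/")
    if base:
        return ("s3", f"{base}/{archive}")
    # No mirror configured: fall back to a normal registry pull.
    return ("registry", model)
```

Passing `os.environ` as `env` mirrors how the script would see the value from `.env`.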

## Enabling Anubis (later)

The `anubis-owui` service is defined in compose but no Caddy site block
points at it yet. To activate:

1. Generate a key: `openssl rand -hex 32` and set `ANUBIS_OWUI_KEY` in `.env`.
2. In `Caddyfile`, change `reverse_proxy open-webui:8080` to
   `reverse_proxy anubis-owui:8923` for the chat hostname.
3. `docker compose up -d`.

## How the workflow node mappings work

Open WebUI doesn't introspect the workflow graph. The `*.nodes.json`
files tell it which node IDs and input fields to overwrite when the user
provides a prompt, image, seed, etc. Each entry:

```json
{ "type": "<placeholder>", "node_ids": ["<id>"], "key": "<input field>" }
```

Recognised `type` strings (per Open WebUI source): `model`, `prompt`,
`negative_prompt`, `width`, `height`, `n` (batch size), `steps`, `seed`,
and `image` (img2img / edit only). Notably **not** mappable: sampler,
scheduler, CFG, CLIP skip, prompt prefix.
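
The effect of a mapping can be pictured as a plain overwrite of node inputs. The Python sketch below illustrates the idea only. It is not Open WebUI's implementation, `apply_node_mappings` is a hypothetical name, and the workflow is assumed to be in ComfyUI's API format.

```python
import copy


def apply_node_mappings(workflow: dict, mappings: list[dict], values: dict) -> dict:
    """Return a copy of `workflow` with mapped inputs overwritten.

    `values` maps a mapping type (e.g. "prompt", "seed") to the value to
    inject. Mappings whose type has no entry in `values` are left untouched.
    """
    patched = copy.deepcopy(workflow)
    for mapping in mappings:
        kind = mapping["type"]
        if kind not in values:
            continue
        for node_id in mapping["node_ids"]:
            patched[node_id]["inputs"][mapping["key"]] = values[kind]
    return patched
```

Anything without a mapping entry, such as the sampler or CFG, keeps whatever value is baked into the workflow JSON.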

This means the static workflow JSONs are tuned for a single checkpoint
family at a time. The shipped defaults match
`CyberRealisticXLPlay_V8.0_FP16.safetensors`
(`dpmpp_2m_sde` / `karras` / CFG 4 / 28 steps / CLIP skip 1 / no prefix).
**If you change the admin's Default Model to a different checkpoint
family** (Pony, NoobAI, Illustrious, etc.), edit the workflow JSONs:

- `KSampler` node: change `sampler_name`, `scheduler`, `cfg`, `steps`
- For checkpoints needing CLIP skip 2: add a `CLIPSetLastLayer` node and
  rewire `CLIPTextEncode` nodes through it (see
  [openwebui-tools/smart_image_gen.py](openwebui-tools/smart_image_gen.py)
  for the exact graph).
- For Pony or NoobAI/Illustrious: the required quality-tag prefix
  (`score_9, score_8_up, ...` or `masterpiece, best quality, ...`) has
  to be typed by the user every time, since the workflow can't inject
  it. **For multi-checkpoint deployments, use the smart_image_gen Tool
  instead** — it handles per-checkpoint sampler / CFG / steps / CLIP
  skip / prefix automatically based on the LLM's `style` choice.
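
The first two edits above can be scripted instead of done by hand. The Python sketch below assumes API-format workflow JSON in which every `CLIPTextEncode` node reads from the same CLIP source. `retune_sampler`, `add_clip_skip`, and the `clip_skip` node id are illustrative names; `smart_image_gen.py` remains the reference for the exact graph.

```python
import copy


def retune_sampler(workflow: dict, **settings) -> dict:
    """Return a copy with `settings` written into every KSampler node."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("class_type") == "KSampler":
            node["inputs"].update(settings)
    return patched


def add_clip_skip(workflow: dict, skip: int = 2, new_id: str = "clip_skip") -> dict:
    """Return a copy where CLIPTextEncode nodes read CLIP via CLIPSetLastLayer.

    `new_id` is a hypothetical node id; it must not collide with existing ids.
    """
    patched = copy.deepcopy(workflow)
    if new_id in patched:
        raise ValueError(f"node id {new_id!r} already in use")
    encoders = [n for n in patched.values() if n.get("class_type") == "CLIPTextEncode"]
    if not encoders:
        return patched
    # Reuse whatever CLIP source the first encoder was wired to.
    source = encoders[0]["inputs"]["clip"]
    patched[new_id] = {
        "class_type": "CLIPSetLastLayer",
        # CLIP skip N corresponds to stop_at_clip_layer = -N.
        "inputs": {"clip": source, "stop_at_clip_layer": -skip},
    }
    for node in encoders:
        node["inputs"]["clip"] = [new_id, 0]
    return patched
```

A call such as `retune_sampler(wf, cfg=7, steps=30)` uses placeholder values; take the real sampler settings from the checkpoint's model card.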

If you swap in a fancier workflow (Flux, ControlNet, NL masking via
SAM nodes, etc.), update the matching `*.nodes.json` so the node IDs
and input keys still line up.

## Common gotchas

- **"Model not found" in Open WebUI's image panel.** ComfyUI lists
  models from `/opt/comfyui/models/checkpoints/`. Confirm the file is
  there and that **Default Model** matches the filename exactly
  (including extension).
- **Out-of-memory on first generation.** Lower `IMAGE_SIZE` in compose
  (e.g. `768x768`) or pass `--lowvram` / `--novram` in the Dockerfile
  CMD and rebuild (ComfyUI has no `--medvram` flag).
- **Custom nodes need extra pip packages.** Install via ComfyUI-Manager
  (it pip-installs into the container's venv). The `custom_nodes`
  volume persists, but `/opt/venv` does not, so packages installed by
  the manager survive container recreation only because the manager
  re-installs them on boot. For permanent custom-node deps, add a
  `RUN pip install …` to the Dockerfile and rebuild.
- **GPU not visible inside container.** Re-run the `nvidia-smi` test in
  step 1. If it fails, the toolkit is misconfigured.
- **Caddy can't get a cert.** First-boot ACME requires DNS A/AAAA
  records pointing at this host's public IP and ports 80+443 reachable
  from the internet. Check `docker compose logs caddy` for the specific
  challenge failure.