tool_choice: required (the thing that makes Image Studio reliably fire
its tools) also blocks Open WebUI's background text-only calls — title
generation, tag suggestions, autocomplete — because the model is
forced to produce a tool call instead of text. Result: chats stay
named 'New Chat' and tag suggestions go silent.

Documented the fix in two places:
- image_studio.md: dedicated 'Set a separate Task Model (required
  after install)' section explaining the cause and the fix path.
- deployment README §9: short follow-up note pointing at it so
  operators don't miss it during initial setup.

The fix is purely Open WebUI configuration — no code change. Pick any
non-Image-Studio model already pulled (mistral-nemo:12b is the
obvious default) for the Task Model slot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
363 lines
16 KiB
Markdown
# ai-stack — deployment

The full multi-service stack: Caddy (TLS + reverse proxy) in front of Open
WebUI (chat + image generation panel), Ollama (LLMs), and ComfyUI (image
generation), with an optional Anubis PoW anti-bot sidecar. One GPU host,
one bridge network, one TLS entry point.

This is the only supported deployment shape — sanitized snapshot of the
production `srvno.de` deployment.

## Files

| File                                    | Purpose                                                  |
| --------------------------------------- | -------------------------------------------------------- |
| `docker-compose.yml`                    | Service definitions, volumes, GPU reservations           |
| `Caddyfile`                             | TLS + reverse proxy config (one site block per hostname) |
| `init-models.sh`                        | LLMs to preseed into Ollama on first boot                |
| `mirror-ollama-model.sh`                | Helper — mirror an Ollama model into a tarball you can host on S3 |
| `comfyui-init-models.sh`                | Checkpoints/VAEs/LoRAs to preseed into ComfyUI on first boot |
| `openwebui-tools/smart_image_gen.py`    | Tool that auto-routes image generation, img2img, and text-targeted inpainting to the right SDXL checkpoint |
| `openwebui-models/image_studio.md`      | Dedicated chat-model preset — manual setup walkthrough   |
| `openwebui-models/image_studio.json`    | The same preset as an importable Open WebUI model JSON   |
| `.env.example`                          | Secrets and image-tag pins. Copy to `.env`               |

## 1. Host prerequisites

- Linux (or WSL2) with an NVIDIA GPU and a recent driver.
  - cu126 wheels (default Dockerfile): driver >= 545
  - cu130 wheels (swap in Dockerfile): driver >= 580
- Docker Engine + Compose v2.
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
  installed and the Docker runtime configured (`nvidia-ctk runtime configure
  --runtime=docker && systemctl restart docker`).
- DNS for the chat / ComfyUI hostnames already pointing at this host
  (Caddy needs working DNS to provision Let's Encrypt certs on first boot).

Confirm GPU passthrough works before bringing the stack up:

```sh
docker run --rm --gpus all nvidia/cuda:12.6.3-base-ubuntu24.04 nvidia-smi
```

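The driver floors listed above can also be checked on the host before any image is pulled. This is a minimal sketch, assuming `nvidia-smi` is on the host's `PATH`; `driver_at_least` is a hypothetical helper for illustration, not something shipped in this repo.

```sh
# Hypothetical helper (not part of the repo): succeeds when the major part
# of a driver version string is at least the required minimum.
driver_at_least() {
    major="${1%%.*}"        # "560.35.03" -> "560"
    [ "$major" -ge "$2" ]
}

# Usage on the GPU host:
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
#   driver_at_least "$ver" 545 && echo "driver OK for cu126 wheels"
#   driver_at_least "$ver" 580 && echo "driver OK for cu130 wheels"
```

Only the major version is compared, which is enough for the two thresholds given here.
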
## 2. Configure

```sh
cp .env.example .env
# generate the two keys with: openssl rand -hex 32
```

Then edit:

- **`.env`** — fill in:
  - `WEBUI_URL` (full URL with scheme) and `LLM_URL` (bare hostname). Both
    point at the same Open WebUI host; Open WebUI wants the URL form for
    auth redirects, Anubis wants the bare hostname for its cookie domain.
  - `WEBUI_SECRET_KEY` and (if using Anubis) `ANUBIS_OWUI_KEY` —
    `openssl rand -hex 32` for each.
  - Optionally pin `COMFYUI_IMAGE_TAG` to a specific `v*` release.
- **`Caddyfile`** — replace the `chat.example.com` and `comfyui.example.com`
  hostnames with yours; replace `REPLACE_WITH_BCRYPT_HASH` with a real
  bcrypt hash:

  ```sh
  docker run --rm caddy:latest caddy hash-password --plaintext 'your-password'
  ```
- **`init-models.sh`** — keep the LLMs you want preseeded, drop the rest.
  Check sizes at <https://ollama.com/library> first; the host needs disk
  for everything listed. Two pull paths are available:
  - `pull "<model:tag>"` — standard registry pull from
    `registry.ollama.ai`.
  - `s3_pull "<model:tag>" "<archive.tgz>"` — fetches from your own
    mirror set via `S3_OLLAMA_BASE` in `.env`. Falls back to
    `ollama pull` if the env var isn't set, so this is safe to enable
    incrementally. Create the tarballs once with
    `mirror-ollama-model.sh` (see [Mirroring models to S3](#mirroring-models-to-s3)).
- **`comfyui-init-models.sh`** — checkpoints/VAEs/LoRAs to preseed into
  ComfyUI. Ships empty (no active fetches) — uncomment the
  SDXL/Flux/upscaler examples or add your own. Whatever filename you pick
  should match the `ckpt_name` field in `workflows/*.json` (default expects
  `CyberRealisticXLPlay_V8.0_FP16.safetensors`). Set `HF_TOKEN` in
  `.env` if any are gated repos.

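Before the first `docker compose up`, it can help to confirm that the required `.env` keys named above are actually filled in. The following is a sketch only: `check_env` is a hypothetical helper, and it checks just the three keys this section lists as mandatory (`ANUBIS_OWUI_KEY` is left out because Anubis is optional).

```sh
# Hypothetical helper (not part of the repo): report required keys that are
# missing or empty in an env file. Key names come from the list above.
check_env() {
    env_file="$1"
    missing=""
    for key in WEBUI_URL LLM_URL WEBUI_SECRET_KEY; do
        # Match "KEY=<at least one character>" at the start of a line.
        if ! grep -Eq "^${key}=.+" "$env_file"; then
            missing="$missing $key"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing or empty:$missing"
        return 1
    fi
    echo "ok"
}

# Usage from the deployment directory:
#   check_env .env
```

The check only tests for a non-empty value, so a placeholder copied over from `.env.example` would still pass.
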
## 3. Bring it up

```sh
docker compose up -d
docker compose logs -f
```

First boot: Caddy provisions Let's Encrypt certs, the `model-init`
container pulls the LLMs in `init-models.sh` (slow — `mistral-nemo:12b`
alone is ~7 GB), and ComfyUI initialises empty volumes.

Health-check:

```sh
docker compose exec comfyui curl -sf http://127.0.0.1:8188/system_stats | head -c 200
docker compose exec open-webui curl -sf http://127.0.0.1:8080/health
```

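Because first boot is slow, the health checks above may fail for a while before they pass. One way to wait for them is a small retry loop. This is a sketch under stated assumptions: `wait_for` is a hypothetical helper, and the attempt count and delay in the usage lines are arbitrary.

```sh
# Hypothetical helper (not part of the repo): run a command repeatedly until
# it succeeds, giving up after a fixed number of attempts.
#   $1 = max attempts, $2 = seconds to sleep after a failure, rest = command
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage with the two endpoints above (-T disables TTY allocation for scripts):
#   wait_for 30 10 docker compose exec -T comfyui curl -sf http://127.0.0.1:8188/system_stats
#   wait_for 30 10 docker compose exec -T open-webui curl -sf http://127.0.0.1:8080/health
```
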
## 4. ComfyUI checkpoints

ComfyUI ships no models. Three ways to get one in:

1. **Preseed via the sidecar (default).** `comfyui-model-init` runs once
   on `compose up`, downloads everything `comfyui-init-models.sh` lists,
   and exits. The script ships empty — uncomment one of the examples or
   add your own `fetch` calls (SDXL, Flux, LoRAs, upscalers, etc.). At
   least one checkpoint should be named
   `CyberRealisticXLPlay_V8.0_FP16.safetensors` to match the workflow
   default, or update `ckpt_name` in `workflows/*.json` to whatever you
   pull. Re-run with `docker compose up -d comfyui-model-init` after
   script edits; already-present files are skipped.
2. **ComfyUI-Manager UI.** Open `https://comfyui.example.com` (after
   basic-auth login), click **Manager**, then **Model Manager**, install
   from the catalogue.
3. **Direct copy into the volume.** Useful if you already have the file
   locally:

   ```sh
   # Quote $PWD so paths containing spaces still mount correctly.
   docker run --rm -v ai-stack_comfyui-models:/models -v "$PWD":/src alpine \
     cp /src/your-model.safetensors /models/checkpoints/
   ```

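Since the checkpoint filename has to match `ckpt_name` in the workflow files, it is useful to see which names the workflows currently expect. A rough sketch follows, assuming each `"ckpt_name": "..."` pair sits on a single line of the JSON; `list_ckpt_names` is a hypothetical helper and deliberately avoids a `jq` dependency.

```sh
# Hypothetical helper (not part of the repo): print each distinct ckpt_name
# value found in the given workflow JSON files.
list_ckpt_names() {
    grep -ho '"ckpt_name"[[:space:]]*:[[:space:]]*"[^"]*"' "$@" \
        | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/' \
        | sort -u
}

# Usage from the repo root:
#   list_ckpt_names workflows/*.json
```

Compare the printed names against the files in the `checkpoints/` directory of the models volume.
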
## 5. First-user signup in Open WebUI

Open `https://chat.example.com`. The first account created becomes the
admin. Subsequent signups land in `pending` and need admin approval (set
by `DEFAULT_USER_ROLE: pending` in compose).

## 6. Wire Open WebUI to ComfyUI

Open WebUI ships the ComfyUI integration but won't know which workflow to
submit until you paste one in. Do this once for txt2img, once for img2img.

In Open WebUI: **Admin Panel -> Settings -> Images**.

1. **Image Generation Engine** -> `ComfyUI` (preselected via env var).
2. **ComfyUI Base URL** -> `http://comfyui:8188` (preselected).
3. **ComfyUI Workflow** -> paste the entire contents of
   [`../../workflows/txt2img.json`](../../workflows/txt2img.json).
4. **ComfyUI Workflow Nodes** -> paste the contents of
   [`../../workflows/txt2img.nodes.json`](../../workflows/txt2img.nodes.json).
5. **Default Model** -> the filename of the checkpoint you dropped in
   under section 4 (e.g. `CyberRealisticXLPlay_V8.0_FP16.safetensors`).
6. Save.

For image editing (img2img), scroll to the **Image Editing** section in
the same panel and repeat with
[`../../workflows/img2img.json`](../../workflows/img2img.json) and
[`../../workflows/img2img.nodes.json`](../../workflows/img2img.nodes.json).

## 7. Test it

In any chat, click the image-generation button and prompt for an image.
Open WebUI submits the workflow to ComfyUI; the result drops back into
the chat when KSampler finishes. To test img2img, attach an image and
use the edit action.

## 8. (Optional) Install the smart-routing Tool

The image-button path always uses the admin's **Default Model**. To get
per-prompt checkpoint routing — e.g. "draw me a cyberpunk city" picks
CyberRealistic, "anthro fox warrior" picks one of the furry checkpoints —
install the `smart_image_gen.py` Tool. It exposes two methods the LLM
calls:

- **`generate_image`** for new images from scratch (txt2img).
- **`edit_image`** for modifying an image the user attached to the
  chat. Two modes:
  - With `mask_text` — text-targeted inpainting via GroundingDINO+SAM
    (e.g. "the dog's collar"). Only the named region is repainted.
  - Without `mask_text` — full img2img which reimagines the whole
    image at the requested denoise.

Both auto-route to the right SDXL checkpoint per request.

> **First inpaint takes a few minutes**: SAM-HQ (~2.5 GB) and
> GroundingDINO (~700 MB) auto-download into the `comfyui-models`
> volume on the very first call to `edit_image` with `mask_text`.
> Subsequent inpaints skip the download.

1. **Workspace -> Tools -> +** (top-right).
2. Paste the contents of
   [`openwebui-tools/smart_image_gen.py`](openwebui-tools/smart_image_gen.py).
3. Save. Optionally adjust the Valves (ComfyUI URL, default steps, CFG,
   timeout) via the gear icon.
4. **Workspace -> Models** (or pick an existing chat model) -> edit ->
   under **Tools**, enable `smart_image_gen` -> save.
5. Make sure the model has **native function calling** enabled
   (Workspace -> Models -> the model -> Advanced Params -> Function
   Calling: Native). Mistral, Qwen, and Llama 3.1+ all support this.

In a chat with that model, ask for an image — "make me a photoreal
portrait of a cyberpunk samurai" — the LLM should call
`generate_image(prompt=..., style="photo")`. The status bar shows
"Routing to photo (CyberRealisticXLPlay…)" while it generates.

If the LLM responds in text instead of calling the tool, install the
**Image Studio** chat-model preset (next section) — a dedicated model
with a system prompt that removes the ambiguity.

## 9. (Recommended) Install the Image Studio model preset

General-purpose chat models often "describe" an image in text instead
of firing the `generate_image` tool, especially on conversational
phrasing ("can you draw me…", "I'd love a picture of…"). The
**Image Studio** preset wraps `mistral-nemo:12b` in a system prompt
that mandates tool use — every message is treated as an image request.

Setup — two paths:

- **Import the JSON** (fast): Workspace → Models → Import →
  [`openwebui-models/image_studio.json`](openwebui-models/image_studio.json).
- **Manual** (full control): walkthrough in
  [`openwebui-models/image_studio.md`](openwebui-models/image_studio.md).

Users then pick **Image Studio** from the chat-model dropdown when
they want to generate or edit images.

**One required follow-up** after either install path: set a separate
**Task Model** in Admin Settings → Interface → Task Model. Image
Studio uses `tool_choice: required` to force tool calls, which means
the same model can't produce the text responses Open WebUI needs for
chat-title generation, tag suggestions, and autocomplete. Pick any
non-Image-Studio model you have pulled (`mistral-nemo:12b`,
`llama3.1:8b`, etc.) — see the
[**Set a separate Task Model** section in image_studio.md](openwebui-models/image_studio.md#set-a-separate-task-model-required-after-install).

The preset ships with `vision: true` so users can attach images for
editing even though `mistral-nemo:12b` isn't a vision model — see the
[**Vision capability** section in image_studio.md](openwebui-models/image_studio.md#vision-capability)
for the trade-offs and the upgrade path to a real vision LLM
(`qwen2.5vl:7b`, `llama3.2-vision:11b`, etc.) if the LLM needs to
actually see the image to write smarter edit instructions.

To extend (new checkpoint, new style):

- Add the filename to `comfyui-init-models.sh` so it gets pulled.
- Add a key to the `CHECKPOINTS` dict in `smart_image_gen.py`.
- Optionally add style-specific negatives to `NEGATIVES`.
- Optionally add keyword routing rules to `ROUTING_RULES` for the
  auto-detect path.
- Re-paste the Tool source in Workspace -> Tools.

## Mirroring models to S3

For models you want to pin against upstream changes (or pull faster
from your own infra), mirror them to S3 once and have the
deployment fetch from there.

### Create the mirror tarball

Run [`mirror-ollama-model.sh`](mirror-ollama-model.sh) on any machine
that has the model pulled locally. It reads `~/.ollama/models/`,
collects the blobs the manifest references, and tars everything together:

```sh
./mirror-ollama-model.sh huihui_ai/qwen3.5-abliterated:9b qwen3.5-abliterated-9b.tgz
```

### Upload to S3

Whatever fits — `aws s3 cp`, `mc`, `rclone`, etc. The bucket needs
to expose the file over HTTPS (public-read ACL on the object, a
CloudFront distribution, R2 with public URLs, etc.):

```sh
aws s3 cp qwen3.5-abliterated-9b.tgz s3://your-bucket/ollama-models/ --acl public-read
```

### Wire the deployment to fetch from there

In `.env`:

```
S3_OLLAMA_BASE=https://your-bucket.s3.amazonaws.com/ollama-models
```

In `init-models.sh`, switch the affected models from `pull` to
`s3_pull`:

```sh
s3_pull "huihui_ai/qwen3.5-abliterated:9b" "qwen3.5-abliterated-9b.tgz"
```

`docker compose up -d model-init` re-runs the init container; the
script downloads the tarball, extracts into the `ollama-data` volume,
and the running Ollama daemon picks it up on its next manifest scan.

If `S3_OLLAMA_BASE` isn't set, `s3_pull` transparently falls back to
`ollama pull` — safe to commit `s3_pull` lines without S3 ready yet.

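The fallback behaviour described above can be pictured with a dry-run sketch. This is not the `s3_pull` from `init-models.sh`, which may differ; `s3_pull_sketch` is a hypothetical function that only prints what it would do, and it assumes the download URL is `S3_OLLAMA_BASE` joined to the archive name with a single slash.

```sh
# Illustrative dry run (hypothetical, not the repo's s3_pull): print the
# action that would be taken for a model/archive pair.
s3_pull_sketch() {
    model="$1"
    archive="$2"
    if [ -z "${S3_OLLAMA_BASE:-}" ]; then
        # No mirror configured: fall back to the standard registry pull.
        echo "ollama pull $model"
    else
        # Mirror configured: drop one trailing slash, then append the archive.
        echo "fetch ${S3_OLLAMA_BASE%/}/$archive"
    fi
}

# Usage:
#   S3_OLLAMA_BASE=https://your-bucket.s3.amazonaws.com/ollama-models
#   s3_pull_sketch "huihui_ai/qwen3.5-abliterated:9b" "qwen3.5-abliterated-9b.tgz"
```

Printing the decision rather than executing it makes it easy to confirm which path each `s3_pull` line would take before re-running `model-init`.
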
## Enabling Anubis (later)

The `anubis-owui` service is defined in compose but no Caddy site block
points at it yet. To activate:

1. Generate a key: `openssl rand -hex 32` and set `ANUBIS_OWUI_KEY` in `.env`.
2. In `Caddyfile`, change `reverse_proxy open-webui:8080` to
   `reverse_proxy anubis-owui:8923` for the chat hostname.
3. `docker compose up -d`.

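Step 2 can be previewed without touching the live file. The sketch below is one possible approach: `swap_upstream` is a hypothetical helper that writes the rewritten Caddyfile to stdout. It replaces every matching `reverse_proxy` line, so if the upstream appears outside the chat site block, edit by hand instead.

```sh
# Hypothetical helper (not part of the repo): print a copy of the given
# Caddyfile with the Open WebUI upstream swapped for the Anubis sidecar.
# The original file is left unmodified.
swap_upstream() {
    sed 's/reverse_proxy open-webui:8080/reverse_proxy anubis-owui:8923/' "$1"
}

# Usage: write the result to a new file, review it, then move it into place.
#   swap_upstream Caddyfile > Caddyfile.new
#   diff Caddyfile Caddyfile.new
```
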
## How the workflow node mappings work

Open WebUI doesn't introspect the workflow graph. The `*.nodes.json`
files tell it which node IDs and input fields to overwrite when the user
provides a prompt, image, seed, etc. Each entry:

```json
{ "type": "<placeholder>", "node_ids": ["<id>"], "key": "<input field>" }
```

Recognised `type` strings (per Open WebUI source): `model`, `prompt`,
`negative_prompt`, `width`, `height`, `n` (batch size), `steps`, `seed`,
and `image` (img2img / edit only). Notably **not** mappable: sampler,
scheduler, CFG, CLIP skip, prompt prefix.

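To make the entry format concrete, the sketch below prints a small mapping with three entries filled in. It is illustrative only: the node IDs `"3"`, `"4"`, and `"6"` are placeholders, and the input keys assume stock ComfyUI nodes (`ckpt_name` on `CheckpointLoaderSimple`, `text` on `CLIPTextEncode`, `seed` on `KSampler`). The shipped `txt2img.nodes.json` is the authoritative version.

```sh
# Hypothetical example (not the shipped file): print a three-entry mapping.
# Node IDs are placeholders and must match the IDs in your workflow JSON.
example_nodes_json() {
    cat <<'EOF'
[
  { "type": "model",  "node_ids": ["4"], "key": "ckpt_name" },
  { "type": "prompt", "node_ids": ["6"], "key": "text" },
  { "type": "seed",   "node_ids": ["3"], "key": "seed" }
]
EOF
}

# Usage: compare the shape against the shipped mapping.
#   example_nodes_json
#   cat ../../workflows/txt2img.nodes.json
```

Read each entry as: when the value named by `type` is available, write it into the `key` input of every node listed in `node_ids`.
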
This means the static workflow JSONs are tuned for a single checkpoint
family at a time. The shipped defaults match
`CyberRealisticXLPlay_V8.0_FP16.safetensors`
(`dpmpp_2m_sde` / `karras` / CFG 4 / 28 steps / CLIP skip 1 / no prefix).
**If you change the admin's Default Model to a different checkpoint
family** (Pony, NoobAI, Illustrious, etc.), edit the workflow JSONs:

- `KSampler` node: change `sampler_name`, `scheduler`, `cfg`, `steps`
- For checkpoints needing CLIP skip 2: add a `CLIPSetLastLayer` node and
  rewire `CLIPTextEncode` nodes through it (see
  [openwebui-tools/smart_image_gen.py](openwebui-tools/smart_image_gen.py)
  for the exact graph).
- For Pony or NoobAI/Illustrious: the required quality-tag prefix
  (`score_9, score_8_up, ...` or `masterpiece, best quality, ...`) has
  to be typed by the user every time, since the workflow can't inject
  it. **For multi-checkpoint deployments, use the smart_image_gen Tool
  instead** — it handles per-checkpoint sampler / CFG / steps / CLIP
  skip / prefix automatically based on the LLM's `style` choice.

If you swap in a fancier workflow (Flux, ControlNet, NL masking via
SAM nodes, etc.), update the matching `*.nodes.json` so the node IDs
and input keys still line up.

## Common gotchas

- **"Model not found" in Open WebUI's image panel.** ComfyUI lists
  models from `/opt/comfyui/models/checkpoints/`. Confirm the file is
  there and that **Default Model** matches the filename exactly
  (including extension).
- **Out-of-memory on first generate.** Lower `IMAGE_SIZE` in compose
  (e.g. `768x768`) or pass `--lowvram` / `--medvram` in the Dockerfile
  CMD and rebuild.
- **Custom nodes need extra pip packages.** Install via ComfyUI-Manager
  (it pip-installs into the container's venv). The `custom_nodes`
  volume persists, but `/opt/venv` does not — so packages installed by
  the manager survive container restarts only because the manager
  re-installs them on boot. For permanent custom-node deps, add a
  `RUN pip install …` to the Dockerfile and rebuild.
- **GPU not visible inside container.** Re-run the `nvidia-smi` test in
  section 1. If it fails, the toolkit is misconfigured.
- **Caddy can't get a cert.** First-boot ACME requires DNS A/AAAA
  records pointing at this host's public IP and ports 80+443 reachable
  from the internet. Check `docker compose logs caddy` for the specific
  challenge failure.