William Gill d4e2058859 smart_image_gen v0.3: add edit_image (img2img) method
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:

  generate_image — txt2img (existing, unchanged behavior)
  edit_image     — img2img on the most recently attached image

edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.

Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.

Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:59:13 -05:00

ai-stack — deployment

The full multi-service stack: Caddy (TLS + reverse proxy) in front of Open WebUI (chat + image generation panel), Ollama (LLMs), and ComfyUI (image generation), with an optional Anubis PoW anti-bot sidecar. One GPU host, one bridge network, one TLS entry point.

This is the only supported deployment shape: a sanitized snapshot of the production srvno.de deployment.

Files

File                                Purpose
docker-compose.yml                  Service definitions, volumes, GPU reservations
Caddyfile                           TLS + reverse proxy config (one site block per hostname)
init-models.sh                      LLMs to preseed into Ollama on first boot
comfyui-init-models.sh              Checkpoints/VAEs/LoRAs to preseed into ComfyUI on first boot
openwebui-tools/smart_image_gen.py  Tool that auto-routes image generation and editing to the right SDXL checkpoint
openwebui-models/image_studio.md    Dedicated chat-model preset: a system prompt that forces tool use
.env.example                        Secrets and image-tag pins. Copy to .env

1. Host prerequisites

  • Linux (or WSL2) with an NVIDIA GPU and a recent driver.
    • cu126 wheels (default Dockerfile): driver >= 545
    • cu130 wheels (swap in Dockerfile): driver >= 580
  • Docker Engine + Compose v2.
  • NVIDIA Container Toolkit installed and the Docker runtime configured (nvidia-ctk runtime configure --runtime=docker && systemctl restart docker).
  • DNS for the chat / ComfyUI hostnames already pointing at this host (Caddy needs working DNS to provision Let's Encrypt certs on first boot).

Confirm GPU passthrough works before bringing the stack up:

docker run --rm --gpus all nvidia/cuda:12.6.3-base-ubuntu24.04 nvidia-smi
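
The driver minimums listed above can also be checked in a script. The following is a minimal sketch, not part of the repo: the function names are made up for illustration, the thresholds are copied from the prerequisites list, and it assumes nvidia-smi is on the host PATH.

```python
import subprocess

# Minimum driver major versions per wheel flavour, copied from the
# prerequisites list in this README.
MIN_DRIVER = {"cu126": 545, "cu130": 580}


def driver_major(version: str) -> int:
    """Return the major component of a driver string such as '580.65.06'."""
    return int(version.strip().split(".")[0])


def driver_ok(version: str, wheel: str = "cu126") -> bool:
    """True if the driver's major version meets the minimum for the wheel flavour."""
    return driver_major(version) >= MIN_DRIVER[wheel]


def host_driver_version() -> str:
    """Ask nvidia-smi for the driver version of the first GPU on the host."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a GPU host, driver_ok(host_driver_version(), "cu130") reports whether the cu130 wheels are an option before you touch the Dockerfile.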

2. Configure

cp .env.example .env
# generate the two keys with: openssl rand -hex 32

Then edit:

  • .env — fill in:

    • WEBUI_URL (full URL with scheme) and LLM_URL (bare hostname). Both point at the same Open WebUI host; Open WebUI wants the URL form for auth redirects, Anubis wants the bare hostname for its cookie domain.
    • WEBUI_SECRET_KEY and (if using Anubis) ANUBIS_OWUI_KEY: run openssl rand -hex 32 for each.
    • Optionally pin COMFYUI_IMAGE_TAG to a specific v* release.
  • Caddyfile — replace the chat.example.com and comfyui.example.com hostnames with yours; replace REPLACE_WITH_BCRYPT_HASH with a real bcrypt hash:

    docker run --rm caddy:latest caddy hash-password --plaintext 'your-password'
    
  • init-models.sh — keep the LLMs you want preseeded, drop the rest. Check sizes at https://ollama.com/library first; the host needs disk for everything listed.

  • comfyui-init-models.sh: checkpoints/VAEs/LoRAs to preseed into ComfyUI. Ships empty (no active fetches); uncomment the SDXL/Flux/upscaler examples or add your own. Whichever filename you pick must match the ckpt_name field in workflows/*.json (the default expects CyberRealisticXLPlay_V8.0_FP16.safetensors). Set HF_TOKEN in .env if any of the models live in gated Hugging Face repos.
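
Mixing up the two URL forms in .env is an easy mistake, so a quick sanity check before the first docker compose up can help. This is a hedged sketch rather than anything the stack ships: parse_env and check_env are hypothetical helpers, and the 64-hex-character test only verifies that a key looks like openssl rand -hex 32 output (neither Open WebUI nor Anubis is known to require that exact shape).

```python
import re
from urllib.parse import urlparse


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are ignored."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def check_env(env: dict[str, str], use_anubis: bool = False) -> list[str]:
    """Return a list of human-readable problems; an empty list means it looks sane."""
    problems = []
    url = urlparse(env.get("WEBUI_URL", ""))
    if url.scheme not in ("http", "https") or not url.netloc:
        problems.append("WEBUI_URL must be a full URL with scheme")
    llm = env.get("LLM_URL", "")
    if not llm or "/" in llm:
        # Catches both 'https://host' and 'host/path'.
        problems.append("LLM_URL must be a bare hostname")
    elif url.hostname and url.hostname != llm.lower():
        problems.append("WEBUI_URL and LLM_URL should name the same host")
    keys = ["WEBUI_SECRET_KEY"] + (["ANUBIS_OWUI_KEY"] if use_anubis else [])
    for key in keys:
        if not re.fullmatch(r"[0-9a-f]{64}", env.get(key, "")):
            problems.append(f"{key} should look like `openssl rand -hex 32` output")
    return problems
```

Typical use would be check_env(parse_env(open(".env").read()), use_anubis=True), printing whatever comes back.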

3. Bring it up

docker compose up -d
docker compose logs -f

First boot: Caddy provisions Let's Encrypt certs, the model-init container pulls the LLMs in init-models.sh (slow — mistral-nemo:12b alone is ~7 GB), and ComfyUI initialises empty volumes.

Health-check:

docker compose exec comfyui curl -sf http://127.0.0.1:8188/system_stats | head -c 200
docker compose exec open-webui curl -sf http://127.0.0.1:8080/health

4. ComfyUI checkpoints

ComfyUI ships no models. Three ways to get one in:

  1. Preseed via the sidecar (default). comfyui-model-init runs once on compose up, downloads everything comfyui-init-models.sh lists, and exits. The script ships empty — uncomment one of the examples or add your own fetch calls (SDXL, Flux, LoRAs, upscalers, etc.). At least one checkpoint should be named CyberRealisticXLPlay_V8.0_FP16.safetensors to match the workflow default, or update ckpt_name in workflows/*.json to whatever you pull. Re-run with docker compose up -d comfyui-model-init after script edits; already-present files are skipped.

  2. ComfyUI-Manager UI. Open https://comfyui.example.com (after basic-auth login), click Manager, then Model Manager, and install from the catalogue.

  3. Direct copy into the volume. Useful if you already have the file locally:

    docker run --rm -v ai-stack_comfyui-models:/models -v "$PWD":/src alpine \
        cp /src/your-model.safetensors /models/checkpoints/
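
Whichever route you use, the checkpoint filename has to line up with ckpt_name in the workflow files. The sketch below shows one way to verify that. It assumes the workflows are in ComfyUI's API export format (a JSON object keyed by node ID, each node carrying class_type and inputs); the helper names are hypothetical.

```python
import json
from pathlib import Path


def load_workflow(path: str) -> dict:
    """Read an API-format workflow JSON file from disk."""
    return json.loads(Path(path).read_text())


def checkpoint_names(workflow: dict) -> set[str]:
    """Collect every ckpt_name input referenced by the workflow's nodes."""
    names = set()
    for node in workflow.values():
        if not isinstance(node, dict):
            continue
        ckpt = node.get("inputs", {}).get("ckpt_name")
        if isinstance(ckpt, str):
            names.add(ckpt)
    return names


def missing_checkpoints(workflow: dict, available: set[str]) -> set[str]:
    """Checkpoint filenames the workflow expects that are not in `available`."""
    return checkpoint_names(workflow) - available
```

Here available would be the set of filenames found in the checkpoints directory of the models volume; an empty result means the workflow default resolves.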
    

5. First-user signup in Open WebUI

Open https://chat.example.com. The first account created becomes the admin. Subsequent signups land in pending and need admin approval (set by DEFAULT_USER_ROLE: pending in compose).

6. Wire Open WebUI to ComfyUI

Open WebUI ships the ComfyUI integration but won't know which workflow to submit until you paste one in. Do this once for txt2img, once for img2img.

In Open WebUI: Admin Panel -> Settings -> Images.

  1. Image Generation Engine -> ComfyUI (preselected via env var).
  2. ComfyUI Base URL -> http://comfyui:8188 (preselected).
  3. ComfyUI Workflow -> paste the entire contents of ../../workflows/txt2img.json.
  4. ComfyUI Workflow Nodes -> paste the contents of ../../workflows/txt2img.nodes.json.
  5. Default Model -> the filename of the checkpoint you dropped in step 4 (e.g. CyberRealisticXLPlay_V8.0_FP16.safetensors).
  6. Save.

For image editing (img2img), scroll to the Image Editing section in the same panel and repeat with ../../workflows/img2img.json and ../../workflows/img2img.nodes.json.

7. Test it

In any chat, click the image-generation button and prompt for an image. Open WebUI submits the workflow to ComfyUI; the result drops back into the chat when KSampler finishes. To test img2img, attach an image and use the edit action.
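
To debug outside Open WebUI, the same submit-then-wait round trip can be sketched against ComfyUI's HTTP API. This is an illustration under assumptions, not Open WebUI's own client (which may work differently): it posts an API-format workflow to /prompt and polls /history/<prompt_id> until images show up. The function names, timeout, and poll interval are hypothetical.

```python
import json
import time
import urllib.request


def output_images(history: dict, prompt_id: str) -> list[dict]:
    """Pull image descriptors (filename, subfolder, type) out of a history response."""
    entry = history.get(prompt_id, {})
    images = []
    for node_output in entry.get("outputs", {}).values():
        images.extend(node_output.get("images", []))
    return images


def submit_and_wait(base_url: str, workflow: dict,
                    timeout: float = 300.0, poll: float = 1.0) -> list[dict]:
    """Queue a workflow on ComfyUI and block until it reports output images."""
    body = json.dumps({"prompt": workflow}).encode()
    req = urllib.request.Request(
        f"{base_url}/prompt", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        prompt_id = json.load(resp)["prompt_id"]

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The history entry stays empty until the job has finished.
        with urllib.request.urlopen(f"{base_url}/history/{prompt_id}") as resp:
            images = output_images(json.load(resp), prompt_id)
        if images:
            return images
        time.sleep(poll)
    raise TimeoutError(f"ComfyUI did not finish prompt {prompt_id} in {timeout}s")
```

From a container on the same bridge network, the base URL would be http://comfyui:8188, matching the value Open WebUI is configured with.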

8. (Optional) Install the smart-routing Tool

The image-button path always uses the admin's Default Model. To get per-prompt checkpoint routing — e.g. "draw me a cyberpunk city" picks CyberRealistic, "anthro fox warrior" picks one of the furry checkpoints — install the smart_image_gen.py Tool. It exposes two methods the LLM calls:

  • generate_image for new images from scratch (txt2img).
  • edit_image for modifying an image the user attached to the chat (img2img).

Both auto-route to the right SDXL checkpoint per request.

  1. Workspace -> Tools -> + (top-right).
  2. Paste the contents of openwebui-tools/smart_image_gen.py.
  3. Save. Optionally adjust the Valves (ComfyUI URL, default steps, CFG, timeout) via the gear icon.
  4. Workspace -> Models (or pick an existing chat model) -> edit -> under Tools, enable smart_image_gen -> save.
  5. Make sure the model has native function calling enabled (Workspace -> Models -> the model -> Advanced Params -> Function Calling: Native). Mistral, Qwen, and Llama 3.1+ all support this.

In a chat with that model, ask for an image — "make me a photoreal portrait of a cyberpunk samurai" — the LLM should call generate_image(prompt=..., style="photo"). The status bar shows "Routing to photo (CyberRealisticXLPlay…)" while it generates.

If the LLM responds in text instead of calling the tool, install the Image Studio chat-model preset (described below), a dedicated model with a system prompt that removes the ambiguity.

General-purpose chat models often "describe" an image in text instead of firing the generate_image tool, especially on conversational phrasing ("can you draw me…", "I'd love a picture of…"). The Image Studio preset wraps mistral-nemo:12b in a system prompt that mandates tool use — every message is treated as an image request.

Setup (under 5 minutes): see openwebui-models/image_studio.md. Users then pick Image Studio from the chat-model dropdown when they want to generate.

To extend (new checkpoint, new style):

  • Add the filename to comfyui-init-models.sh so it gets pulled.
  • Add a key to the CHECKPOINTS dict in smart_image_gen.py.
  • Optionally add style-specific negatives to NEGATIVES.
  • Optionally add keyword routing rules to ROUTING_RULES for the auto-detect path.
  • Re-paste the Tool source in Workspace -> Tools.

Enabling Anubis (later)

The anubis-owui service is defined in compose but no Caddy site block points at it yet. To activate:

  1. Generate a key: openssl rand -hex 32 and set ANUBIS_OWUI_KEY in .env.
  2. In Caddyfile, change reverse_proxy open-webui:8080 to reverse_proxy anubis-owui:8923 for the chat hostname.
  3. docker compose up -d.

How the workflow node mappings work

Open WebUI doesn't introspect the workflow graph. The *.nodes.json files tell it which node IDs and input fields to overwrite when the user provides a prompt, image, seed, etc. Each entry:

{ "type": "<placeholder>", "node_ids": ["<id>"], "key": "<input field>" }

Recognised type strings (per Open WebUI source): model, prompt, negative_prompt, width, height, n (batch size), steps, seed, and image (img2img / edit only). Notably not mappable: sampler, scheduler, CFG, CLIP skip, prompt prefix.
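
To make the mapping concrete, the sketch below applies a list of such entries to an API-format workflow. It is an illustration of the idea, not Open WebUI's actual code; apply_mappings and the sample node IDs are hypothetical.

```python
import copy


def apply_mappings(workflow: dict, mappings: list[dict], values: dict) -> dict:
    """Return a copy of the workflow with mapped inputs overwritten.

    `values` is keyed by mapping type, e.g. {"prompt": "a cat", "seed": 42}.
    Entries whose type has no value are left untouched.
    """
    patched = copy.deepcopy(workflow)
    for entry in mappings:
        if entry["type"] not in values:
            continue
        for node_id in entry["node_ids"]:
            patched[node_id]["inputs"][entry["key"]] = values[entry["type"]]
    return patched


# Example: node "6" is the positive CLIPTextEncode, node "3" the KSampler.
workflow = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 28}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 1]}},
}
mappings = [
    {"type": "prompt", "node_ids": ["6"], "key": "text"},
    {"type": "seed", "node_ids": ["3"], "key": "seed"},
]
patched = apply_mappings(workflow, mappings, {"prompt": "a cat", "seed": 42})
```

Anything without a mapping entry, such as the KSampler's steps here, keeps whatever value is baked into the workflow JSON.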

This means the static workflow JSONs are tuned for a single checkpoint family at a time. The shipped defaults match CyberRealisticXLPlay_V8.0_FP16.safetensors (dpmpp_2m_sde / karras / CFG 4 / 28 steps / CLIP skip 1 / no prefix). If you change the admin's Default Model to a different checkpoint family (Pony, NoobAI, Illustrious, etc.), edit the workflow JSONs:

  • KSampler node: change sampler_name, scheduler, cfg, steps
  • For checkpoints needing CLIP skip 2: add a CLIPSetLastLayer node and rewire CLIPTextEncode nodes through it (see openwebui-tools/smart_image_gen.py for the exact graph).
  • For Pony or NoobAI/Illustrious: the required quality-tag prefix (score_9, score_8_up, ... or masterpiece, best quality, ...) has to be typed by the user every time, since the workflow can't inject it. For multi-checkpoint deployments, use the smart_image_gen Tool instead — it handles per-checkpoint sampler / CFG / steps / CLIP skip / prefix automatically based on the LLM's style choice.
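
The CLIP skip rewiring mentioned above can be sketched programmatically. This is an assumption-laden illustration rather than the graph smart_image_gen.py builds: it assumes an API-format workflow with numeric node IDs, a single CheckpointLoaderSimple whose CLIP is output index 1, and CLIPTextEncode nodes that should all be rerouted. add_clip_skip is a hypothetical helper.

```python
import copy


def add_clip_skip(workflow: dict, skip: int = 2) -> dict:
    """Insert a CLIPSetLastLayer node and route every CLIPTextEncode through it."""
    patched = copy.deepcopy(workflow)

    # Find the checkpoint loader; its outputs are MODEL (0), CLIP (1), VAE (2).
    loader_id = next(
        node_id for node_id, node in patched.items()
        if node.get("class_type") == "CheckpointLoaderSimple"
    )

    # Pick an unused node ID (assumes the existing IDs are numeric strings).
    new_id = str(max(int(node_id) for node_id in patched) + 1)
    patched[new_id] = {
        "class_type": "CLIPSetLastLayer",
        # CLIP skip N corresponds to stop_at_clip_layer = -N.
        "inputs": {"clip": [loader_id, 1], "stop_at_clip_layer": -skip},
    }

    # Point every text encoder at the new node's single output.
    for node in patched.values():
        if node.get("class_type") == "CLIPTextEncode":
            node["inputs"]["clip"] = [new_id, 0]
    return patched
```

The result can be written back with json.dump and pasted into the ComfyUI Workflow field; the *.nodes.json mapping is unaffected because the existing node IDs do not change.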

If you swap in a fancier workflow (Flux, ControlNet, NL masking via SAM nodes, etc.), update the matching *.nodes.json so the node IDs and input keys still line up.

Common gotchas

  • "Model not found" in Open WebUI's image panel. ComfyUI lists models from /opt/comfyui/models/checkpoints/. Confirm the file is there and that Default Model matches the filename exactly (including extension).
  • Out-of-memory on first generate. Lower IMAGE_SIZE in compose (e.g. 768x768) or add --lowvram to the ComfyUI launch arguments in the Dockerfile CMD and rebuild.
  • Custom nodes need extra pip packages. Install via ComfyUI-Manager (it pip-installs into the container's venv). The custom_nodes volume persists but /opt/venv does not, so manager-installed packages are available after a restart only because the manager re-installs them on boot. For permanent custom-node deps, add a RUN pip install … to the Dockerfile and rebuild.
  • GPU not visible inside container. Re-run the nvidia-smi test in step 1. If it fails, the toolkit is misconfigured.
  • Caddy can't get a cert. First-boot ACME requires DNS A/AAAA records pointing at this host's public IP and ports 80+443 reachable from the internet. Check docker compose logs caddy for the specific challenge failure.