19 Commits

Author SHA1 Message Date
d935e24624 Add text-targeted inpainting via GroundingDINO+SAM (mask_text param)
Five pieces:

1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO
   + SAM-HQ in one bundle) into custom_nodes and pip-installs its
   requirements at build time. Model weights auto-download to the
   comfyui-models volume on first inpaint (~3 GB one-time cost).

2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs
   requirements.txt for any custom_node present at startup. Lets users
   add custom nodes via ComfyUI-Manager (or by git-cloning into the
   volume) and have the deps picked up on the next restart, without
   editing the Dockerfile.

3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When
   set, builds an inpainting workflow (LoadImage → GroundingDinoSAM
   Segment → SetLatentNoiseMask → KSampler) so only the named region
   is repainted. When unset, falls through to the existing img2img
   path. Denoise default switches: 1.0 with mask_text (full repaint
   within mask), 0.7 without. (The masked graph is sketched just after
   this list.)

4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL
   distinction — set mask_text whenever the user names a specific
   object/region ('the ball', 'the dog', 'the sky'); leave it unset
   only for whole-image style/lighting transformations.

5. Deployment README documents the new mode + the first-inpaint
   weight-download caveat.
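A minimal sketch of the masked graph from item 3, in the flat node-dict format ComfyUI's `/prompt` API accepts. The segment-anything class names, input keys, and model-name strings are assumptions about the storyicon bundle rather than values lifted from the shipped Tool; LoadImage, VAEEncode, SetLatentNoiseMask, and KSampler are core ComfyUI nodes.

```python
def build_masked_edit_graph(image_name: str, positive: str, negative: str,
                            mask_text: str, ckpt: str, denoise: float = 1.0) -> dict:
    """Repaint only the region GroundingDINO+SAM selects from `mask_text`."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt}},
        "2": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        # Assumed loader/segment class names and model strings from the storyicon pack:
        "3": {"class_type": "GroundingDinoModelLoader (segment anything)",
              "inputs": {"model_name": "GroundingDINO_SwinT_OGC (694MB)"}},
        "4": {"class_type": "SAMModelLoader (segment anything)",
              "inputs": {"model_name": "sam_hq_vit_h (2.57GB)"}},
        "5": {"class_type": "GroundingDinoSAMSegment (segment anything)",
              "inputs": {"grounding_dino_model": ["3", 0], "sam_model": ["4", 0],
                         "image": ["2", 0], "prompt": mask_text, "threshold": 0.3}},
        "6": {"class_type": "VAEEncode", "inputs": {"pixels": ["2", 0], "vae": ["1", 2]}},
        # Restrict sampling to the mask (output 1 of the segment node):
        "7": {"class_type": "SetLatentNoiseMask",
              "inputs": {"samples": ["6", 0], "mask": ["5", 1]}},
        "8": {"class_type": "CLIPTextEncode", "inputs": {"clip": ["1", 1], "text": positive}},
        "9": {"class_type": "CLIPTextEncode", "inputs": {"clip": ["1", 1], "text": negative}},
        "10": {"class_type": "KSampler",
               "inputs": {"model": ["1", 0], "positive": ["8", 0], "negative": ["9", 0],
                          "latent_image": ["7", 0], "seed": 0, "steps": 28, "cfg": 4.0,
                          "sampler_name": "dpmpp_2m_sde", "scheduler": "karras",
                          "denoise": denoise}},
        "11": {"class_type": "VAEDecode", "inputs": {"samples": ["10", 0], "vae": ["1", 2]}},
        "12": {"class_type": "SaveImage",
               "inputs": {"images": ["11", 0], "filename_prefix": "inpaint"}},
    }
```

When `mask_text` is unset, the segment and noise-mask nodes drop out and the encoded latent feeds the KSampler directly at the lower default denoise.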

Image rebuild required — bump tag to pick up the Dockerfile change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:43:52 -05:00
7c7897818e Image Studio: bake tool_choice=required into the preset
Without it, abliterated/reasoning models like huihui_ai/qwen3.5-
abliterated:9b reliably choose to write a planning response instead
of calling the tool — even with /no_think and a terse imperative
system prompt. tool_choice=required is passed through to Ollama's
chat API and removes the model's option to respond in text at all,
forcing exactly one tool call per turn.
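Roughly the request shape that preset param produces, assuming an OpenAI-compatible chat payload; the URL, model tag, and single-tool schema below are placeholders for illustration, not lifted from Open WebUI's internals:

```python
import requests  # illustration only; Open WebUI makes this call internally

tools = [{
    "type": "function",
    "function": {
        "name": "generate_image",
        "description": "Create an image and show it to the user.",
        "parameters": {"type": "object",
                       "properties": {"prompt": {"type": "string"}},
                       "required": ["prompt"]},
    },
}]
payload = {
    "model": "huihui_ai/qwen3.5-abliterated:9b",
    "messages": [{"role": "user", "content": "draw me a cyberpunk city at dusk"}],
    "tools": tools,
    "tool_choice": "required",  # model must answer with a tool call, never prose
}
resp = requests.post("http://ollama:11434/v1/chat/completions", json=payload, timeout=120)
print(resp.json()["choices"][0]["message"].get("tool_calls"))
```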

Confirmed working with the abliterated Qwen 3.5 9B base.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:30:49 -05:00
a1fca4d5d9 Image Studio: default base model → huihui_ai/qwen3.5-abliterated:9b
Re-imports of image_studio.json kept reverting the base model back to
mistral-nemo:12b because that was still hard-coded in the JSON.
Updated the JSON, the markdown setup table, and the vision-capability
section to lead with the Qwen 3.5 abliterated 9B preset.

Re-ordered the markdown's vision section: shipped default first
(Qwen 3.5 abliterated, with the /no_think + enable_thinking caveat
called out explicitly), alternatives (qwen2.5vl:7b, llama3.2-vision,
minicpm-v) second, non-vision fallback third.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:22:13 -05:00
0b1c2ee5b5 Image Studio: tighten system prompt, add /no_think for Qwen 3.x
User reported the model writing a multi-paragraph 'editing plan'
instead of calling edit_image, only firing the tool when explicitly
told to. Two underlying causes:

  1. The previous system prompt was conversational ('ALWAYS / NEVER'
     lists with discussion) — Qwen-style models read that as topics
     to think about rather than rules to obey. Replaced with terse,
     imperative dispatcher framing: 'You do not respond in prose.
     Every user message MUST result in exactly one tool call.'

  2. Qwen 3.x ships with thinking mode on by default. Reasoning
     models almost universally degrade native function calling — they
     plan how to use a tool instead of just calling it. Prepended
     /no_think (Qwen 3.x recognises this token and skips reasoning).
     No-op for non-Qwen-3 base models.

Removed the long after-action paragraph that encouraged elaborate
follow-ups; replaced with 'at most one short sentence'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:18:49 -05:00
5a34ced8f1 Add S3 mirror path for Ollama models + mirror-ollama-model.sh helper
Three pieces:

1. mirror-ollama-model.sh — run on any machine that has the model
   pulled. Parses the manifest at
   ~/.ollama/models/manifests/registry.ollama.ai/<ns>/<name>/<tag>,
   greps every sha256:* digest, tars manifest + referenced blobs into
   one .tgz. Output is portable — extract over any other Ollama
   data dir and the model is immediately visible.

2. init-models.sh gains an s3_pull function that curls a tarball from
   $S3_OLLAMA_BASE and extracts into /root/.ollama/models/. Falls back
   to ollama pull when S3_OLLAMA_BASE is unset, so s3_pull lines are
   safe to commit before the bucket is ready. huihui_ai/qwen3.5-
   abliterated:9b promoted to s3_pull as the example.

3. docker-compose.yml model-init service propagates S3_OLLAMA_BASE
   from .env. Curl auto-installs at script start because ollama/ollama
   doesn't always ship it.

README documents the mirror workflow under "Mirroring models to S3".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:43:26 -05:00
f77f5993fb Image Studio: enable vision capability + document upgrade path
Open WebUI was blocking image attachments to the Image Studio model
because mistral-nemo:12b isn't vision-capable. Two changes:

  - capabilities.vision flipped to true in the preset JSON. The Tool
    only needs the image to make it through __messages__ / __files__
    to call edit_image; the actual visual processing happens in
    ComfyUI's img2img, not in the LLM. Setting the flag unlocks the
    attach-image UI without lying about what mistral-nemo can do.

  - System prompt now tells the LLM explicitly: "you may not be able
    to visually inspect the attached image — that is fine. Trust the
    user's description and call edit_image." Prevents the LLM from
    refusing or hedging when it gets an image it can't see.

Documented the upgrade path in image_studio.md for users who want
real vision (qwen2.5vl:7b, llama3.2-vision:11b, minicpm-v:8b — pick
one, add to init-models.sh, swap base_model_id in the preset). The
vision LLM can then write smarter edit_image calls from the image
content rather than the user's description alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:31:17 -05:00
b604e3f509 smart_image_gen v0.5: surface images via files event (canonical path)
The data-URI message-event approach didn't render — Open WebUI's chat
frontend ignores data URIs from tool-emitted message events because
the markdown-base64 rewriter (utils/files.py convert_markdown_base64_images)
only runs on assistant streaming content, not on tool emits.

Switched to the path Open WebUI's own image-generation flow uses
(backend/open_webui/utils/middleware.py ~1325):

  1. Upload image bytes via open_webui.routers.files.upload_file_handler
     (gets back a file_item with id)
  2. Resolve the served URL via request.app.url_path_for(
     "get_file_content_by_id", id=file_item.id) → /api/v1/files/{id}/content
  3. Emit a `files` event:
        {"type": "files", "data": {"files": [{"type": "image", "url": ...}]}}

Tools now take __request__, __user__, __metadata__ params for the
upload (Open WebUI auto-injects these). Falls back to data-URI
message event if the runtime imports aren't available (e.g. running
the file standalone for tests). The internal upload bypasses
get_verified_user via the user= kwarg, so no token plumbing.
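A condensed sketch of that path. The event shape, the upload_file_handler import, and the url_path_for route name match the description above; the exact keyword set of upload_file_handler (and whether it needs awaiting) plus the Users lookup are assumptions about Open WebUI's internals.

```python
import io
import uuid
from fastapi import UploadFile
from starlette.datastructures import Headers
from open_webui.models.users import Users
from open_webui.routers.files import upload_file_handler

async def push_image(png_bytes: bytes, __request__, __user__, __event_emitter__):
    # Wrap the raw bytes the way FastAPI expects for an upload.
    upload = UploadFile(
        file=io.BytesIO(png_bytes),
        filename=f"generated-{uuid.uuid4().hex}.png",
        headers=Headers({"content-type": "image/png"}),
    )
    # Signature assumed; the key point is passing user= so no auth token is needed.
    file_item = upload_file_handler(
        request=__request__,
        file=upload,
        metadata={"mime_type": "image/png"},
        user=Users.get_user_by_id(__user__["id"]),
    )
    url = __request__.app.url_path_for("get_file_content_by_id", id=file_item.id)
    # The `files` event is what Open WebUI's own image-gen middleware emits.
    await __event_emitter__({
        "type": "files",
        "data": {"files": [{"type": "image", "url": url}]},
    })
```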

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:21:48 -05:00
4d996e1205 smart_image_gen v0.4: emit image to chat, return only confirmation
The data URI returned from the tool was being given to the LLM as the
tool result — the LLM then either echoed the base64 to the user as
plain text (screenshot 1) or hallucinated a description of what it
thought the image looked like (screenshot 2 — "an image of a cat
sitting on a windowsill" for a fox-warrior prompt).

Fix: push the markdown image into the chat directly via
__event_emitter__ as a "message" event, and return a short text
confirmation as the function value. The confirmation is worded to
prevent the LLM from describing the image or repeating the markdown
(both common failure modes for tool-using LLMs).

Both generate_image and edit_image fixed.
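Rough shape of this v0.4 approach (superseded by the `files` event in v0.5, above); the message-event payload format is assumed from Open WebUI's event-emitter convention:

```python
async def _finish(data_uri: str, style: str, __event_emitter__):
    # Push the image into the chat directly; the LLM never sees the base64.
    await __event_emitter__({
        "type": "message",
        "data": {"content": f"![generated image]({data_uri})\n"},
    })
    # Worded so the LLM neither re-emits the markdown nor invents a description.
    return (f"Image generated with the '{style}' checkpoint and already shown "
            "to the user above. Do not describe it or repeat any markdown.")
```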

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:14:59 -05:00
6adf133558 Ship Image Studio as importable JSON in addition to markdown walkthrough
Open WebUI accepts a JSON file at Workspace → Models → Import that
seeds a new model preset in one click instead of the manual table-
driven setup. The new image_studio.json mirrors the Open WebUI bulk-
export schema (array wrapper around the model object with id, name,
base_model_id, params, meta) and pre-fills system prompt, native
function calling, temperature 0.5, top_p 0.9, smart_image_gen tool
attachment, suggestion prompts.

The markdown walkthrough stays as the source of truth for the system
prompt content and as the fallback when import fails (e.g. tool ID
mismatch, unfamiliar field, schema drift across Open WebUI versions).
README points at both paths.

Caveat doc'd in the markdown: if the imported preset doesn't actually
have smart_image_gen attached, the tool ID in the JSON didn't match
what Open WebUI assigned — re-attach manually in the model edit
screen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:04:49 -05:00
d4e2058859 smart_image_gen v0.3: add edit_image (img2img) method
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:

  generate_image — txt2img (existing, unchanged behavior)
  edit_image     — img2img on the most recently attached image

edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.

Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.

Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.
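A sketch of the "messages first, then files" extraction described above. The image_url content-block shape follows Open WebUI's chat format; the helper name and the exact __files__ field layout are illustrative assumptions.

```python
import base64
import re

def _find_source_image(__messages__: list, __files__: list):
    # Newest first: the user usually means the most recently attached image.
    for msg in reversed(__messages__ or []):
        content = msg.get("content")
        if isinstance(content, list):
            for block in content:
                if block.get("type") == "image_url":
                    url = block["image_url"]["url"]
                    m = re.match(r"data:image/\w+;base64,(.+)", url)
                    if m:
                        return base64.b64decode(m.group(1))
    for f in __files__ or []:
        path = (f.get("file") or {}).get("path") or f.get("url")
        if path:
            return path  # local path or URL — caller uploads it to ComfyUI /upload/image
    return None  # caller returns a clear "no image attached" message
```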

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:59:13 -05:00
41d571d8d1 Add Image Studio model preset — forces smart_image_gen tool use
A documented Open WebUI custom-model preset wrapping mistral-nemo:12b
with: aggressive system prompt that mandates calling generate_image,
only the smart_image_gen tool attached, native function calling,
lower temperature for tool-call reliability. Users pick "Image Studio"
from the chat-model dropdown when they want images.

Solves the common case where general-purpose chat models describe an
image in text instead of firing the tool — usually on conversational
phrasings like "can you draw me…". The preset removes the ambiguity
by giving the LLM exactly one job and one tool.

Setup walkthrough in openwebui-models/image_studio.md; deployment
README §9 points users at it as the recommended path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:54:13 -05:00
9e22de0328 smart_image_gen: tighten docstring + Literal style enum
Two changes to make the LLM more likely to call the tool:

1. Lead the docstring with an unambiguous directive — "Create an image
   and show it to the user. Use this whenever the user asks you to
   draw, generate, ..." plus a hard "do not say you cannot generate
   images" line. Open WebUI feeds the docstring straight to the LLM as
   the tool description; first line carries the most weight.

2. `style: Optional[StyleName]` where StyleName is a Literal enum of
   the seven values. Native function-calling models read the type
   annotation and present the seven valid values to the LLM as a
   strict choice instead of a free-text param.

If the LLM still doesn't fire the tool, the install is probably wrong:
Workspace → Models → the model → Advanced Params → Function Calling
must be set to Native (not Default).
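The pattern in miniature — the StyleName Literal matches the Tool's seven values; the docstring wording and the rest of the method's parameter list are abbreviated here, not copied from the shipped file:

```python
from typing import Literal, Optional

StyleName = Literal[
    "photo", "juggernaut", "pony", "general",
    "furry-nai", "furry-noob", "furry-il",
]

class Tools:
    async def generate_image(
        self,
        prompt: str,
        style: Optional[StyleName] = None,   # strict 7-value choice, not free text
        negative_prompt: Optional[str] = None,
    ) -> str:
        """
        Create an image and show it to the user. Use this whenever the user
        asks you to draw, generate, paint, render, or imagine something.
        Do not say you cannot generate images.
        """
        ...
```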

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:52:26 -05:00
b815cd6a5f Tune static workflows to CyberRealisticXL recommended settings
The static workflow JSONs default to CyberRealisticXLPlay (set in an
earlier commit), but the KSampler still had euler/normal/CFG7/20 — the
generic settings I scaffolded with. Updated to the creator-published
defaults: dpmpp_2m_sde / karras / CFG 4 / 28 steps. CLIP skip 1
already correct (no node needed; default behavior).

Added a section to the deployment README spelling out the trade-off:
static workflows are locked to one checkpoint family at a time because
Open WebUI's nodes mapping doesn't expose sampler/CFG/scheduler/CLIP
skip/prefix. For multi-checkpoint use, the smart_image_gen Tool path is
the only one that gets these right per-prompt.

Re-paste workflows into Open WebUI Settings → Images to pick up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:47:46 -05:00
45d5541be0 smart_image_gen v0.2: per-style sampler/CFG/steps/CLIP-skip + prompt prefixes
Researched each of the seven SDXL checkpoints on Civitai and encoded the
creator-recommended generation defaults per style instead of one global
set. Material differences:

  - photo (CyberRealistic): dpmpp_2m_sde / karras / CFG 4 / 28 steps / CLIP 1
  - juggernaut: dpmpp_2m_sde / karras / CFG 4.5 / 35 steps / CLIP 1
  - pony: euler_a / normal / CFG 7.5 / 25 steps / CLIP 2
  - general (Talmendo): dpmpp_2m / karras / CFG 8 / 30 steps / CLIP 2
  - furry-nai (Reed): euler_a / normal / CFG 5 / 30 steps / CLIP 2
  - furry-noob (IndigoVoid): euler_a-only / normal / CFG 4.5 / 20 steps / CLIP 2
  - furry-il (NovaFurry): euler_a / normal / CFG 4 / 30 steps / CLIP 2

Three prompt-prefix dialects auto-prepended (NEVER cross-contaminated):
photoreal models get nothing, Pony gets the full
score_9..score_4_up chain (mandatory), and the NoobAI/Illustrious
furry models get their booru quality + year-tag prefixes
(masterpiece/best quality/absurdres/newest/etc). Workflow now includes
a CLIPSetLastLayer node so per-style CLIP skip works.

Routing default for generic "furry" flipped from Reed (NAI) to NovaFurry
(Illustrious) — current sweet-spot consensus. Removed global
DEFAULT_STEPS/DEFAULT_CFG valves; per-style values are canonical.

Sources: each model's Civitai page (CyberRealisticXL, Juggernaut,
Pony V6 XL, TalmendoXL, Reed FurryMix, IndigoVoid FurryFused,
NovaFurryXL) and Pony/Illustrious prompting guides.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:45:34 -05:00
cd0034cd99 Flesh out per-style negatives in smart_image_gen Tool
Each style now gets a proper baseline covering quality, anatomy, and
watermark/signature suppression — plus the appropriate style-leak guards
(no-cartoon for photo, no-human for furry, score_4–6 suppression for
pony). Quality terms only; no NSFW filtering by default since several
checkpoints in this set are commonly used for adult work and would
fight a baked-in content filter. If SFW-by-default is wanted, add an
explicit safe-mode flag rather than expanding NEGATIVES.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:39:24 -05:00
c585e53ed4 Bake quality-focused default negatives into the static workflows
Open WebUI overwrites node 7's text when the request supplies a
negative_prompt, so the default only takes effect when one isn't
provided — which is the common case for the image-button path since the
chat UI doesn't expose the field. Generic quality terms only (no style
or content restrictions) so the default is safe across SD/SDXL/Flux
swaps and doesn't fight whichever checkpoint is loaded.

The smart_image_gen Tool already had per-style defaults; this only
affects the non-Tool image-gen path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:36:44 -05:00
392b26167f Add smart_image_gen Tool for per-prompt checkpoint routing
Open WebUI Tool the LLM invokes instead of the built-in image action.
Auto-routes among the seven SDXL checkpoints (photo / juggernaut /
pony / general / furry-{nai,noob,il}) based on either an explicit
`style` arg or first-match-wins regex over the prompt. Constructs the
ComfyUI workflow inline, submits via /prompt, polls /history, returns
the result as a base64 data-URI markdown image so no extra hosting is
needed. Per-style default negatives. ComfyUI URL / steps / CFG /
timeout are admin-tunable Valves.

Filters can't see image-gen requests in Open WebUI (the routers skip
the filter chain), so the LLM-driven Tool is the only path that
gives intent-aware routing without changing the chat UX.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:17:02 -05:00
704bcfdf13 Default workflows to SDXL CyberRealistic; ship empty model preseed
Drops the SD 1.5 placeholder. The shipped txt2img/img2img workflows now
reference CyberRealisticXLPlay_V8.0_FP16.safetensors (the checkpoint
figment used in production), and comfyui-init-models.sh ships with no
active fetches — operators uncomment examples or add their own URLs.

The script + workflow filenames have to line up; README explains.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:02:26 -05:00
0ad99b6199 Add comfyui-model-init sidecar for ComfyUI model preseeding
Mirrors the Ollama model-init pattern: a one-shot Alpine container that
mounts the comfyui-models volume and runs comfyui-init-models.sh, which
curls direct download URLs (HuggingFace by default) into the right
subdirectories. Idempotent — already-present files are skipped.

HF_TOKEN is plumbed through for gated repos (Flux-dev, SD3, etc.) and is
opt-in via .env. The default list ships SD 1.5 only, matching the
placeholder filename in workflows/*.json. Examples for SDXL, Flux, and
upscalers are commented in the script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:57:24 -05:00
13 changed files with 1476 additions and 47 deletions

View File

@@ -52,8 +52,24 @@ RUN git clone --depth 1 https://github.com/ltdrdata/ComfyUI-Manager.git \
${COMFYUI_HOME}/custom_nodes/ComfyUI-Manager && \
pip install -r ${COMFYUI_HOME}/custom_nodes/ComfyUI-Manager/requirements.txt
# comfyui_segment_anything — GroundingDINO + SAM-HQ in one bundle. Required
# by the smart_image_gen Tool's text-targeted inpainting (edit_image with the
# mask_text parameter). Model weights auto-download on first use into
# /opt/comfyui/models/{sams,grounding-dino}/ — first inpaint takes ~3 GB of
# downloads, subsequent runs are instant.
RUN git clone --depth 1 https://github.com/storyicon/comfyui_segment_anything.git \
${COMFYUI_HOME}/custom_nodes/comfyui_segment_anything && \
pip install -q -r ${COMFYUI_HOME}/custom_nodes/comfyui_segment_anything/requirements.txt
# Entrypoint wrapper — auto-installs requirements.txt for any custom_node
# present at startup (covers Manager-installed nodes and nodes cloned
# directly into the comfyui-custom-nodes volume).
COPY install-custom-node-deps.sh /usr/local/bin/install-custom-node-deps.sh
RUN chmod +x /usr/local/bin/install-custom-node-deps.sh
EXPOSE 8188
# --listen 0.0.0.0 binds to every interface so the Open WebUI container on the
# shared compose network can reach it. --port is explicit for clarity.
ENTRYPOINT ["/usr/local/bin/install-custom-node-deps.sh"]
CMD ["python", "main.py", "--listen", "0.0.0.0", "--port", "8188"]

View File

@@ -22,3 +22,16 @@ ANUBIS_OWUI_KEY=replace-with-32-byte-hex
# ComfyUI image tag to deploy. `latest` tracks whatever the release workflow
# last pushed; pin to a v* tag (e.g. 0.1.0) for reproducible deploys.
COMFYUI_IMAGE_TAG=latest
# HuggingFace access token. Only needed if comfyui-init-models.sh references
# gated repos (Flux-dev, SD3, etc.). Generate a read token at
# https://huggingface.co/settings/tokens. Leave empty for public-only.
HF_TOKEN=
# HTTPS base URL of an S3 bucket / CDN that hosts mirrored Ollama model
# tarballs (created by mirror-ollama-model.sh). Files under this base are
# fetched by init-models.sh's s3_pull instead of registry.ollama.ai —
# faster and immune to upstream rate-limiting / removal. Example:
# S3_OLLAMA_BASE=https://your-bucket.s3.amazonaws.com/ollama-models
# Leave empty to fall back to plain `ollama pull` for everything.
S3_OLLAMA_BASE=

View File

@@ -10,12 +10,17 @@ production `srvno.de` deployment.
## Files
| File | Purpose |
| ------------------- | -------------------------------------------------------- |
| `docker-compose.yml`| Service definitions, volumes, GPU reservations |
| `Caddyfile` | TLS + reverse proxy config (one site block per hostname) |
| `init-models.sh` | Models to preseed into Ollama on first boot |
| `.env.example` | Secrets and image-tag pins. Copy to `.env` |
| File | Purpose |
| --------------------------------------- | -------------------------------------------------------- |
| `docker-compose.yml` | Service definitions, volumes, GPU reservations |
| `Caddyfile` | TLS + reverse proxy config (one site block per hostname) |
| `init-models.sh` | LLMs to preseed into Ollama on first boot |
| `mirror-ollama-model.sh` | Helper — mirror an Ollama model into a tarball you can host on S3 |
| `comfyui-init-models.sh` | Checkpoints/VAEs/LoRAs to preseed into ComfyUI on first boot |
| `openwebui-tools/smart_image_gen.py` | Tool that auto-routes image generation, img2img, and text-targeted inpainting to the right SDXL checkpoint |
| `openwebui-models/image_studio.md` | Dedicated chat-model preset — manual setup walkthrough |
| `openwebui-models/image_studio.json` | The same preset as an importable Open WebUI model JSON |
| `.env.example` | Secrets and image-tag pins. Copy to `.env` |
## 1. Host prerequisites
@@ -60,7 +65,20 @@ Then edit:
```
- **`init-models.sh`** — keep the LLMs you want preseeded, drop the rest.
Check sizes at <https://ollama.com/library> first; the host needs disk
for everything listed.
for everything listed. Two pull paths are available:
- `pull "<model:tag>"` — standard registry pull from
`registry.ollama.ai`.
- `s3_pull "<model:tag>" "<archive.tgz>"` — fetches from your own
mirror set via `S3_OLLAMA_BASE` in `.env`. Falls back to
`ollama pull` if the env var isn't set, so this is safe to enable
incrementally. Create the tarballs once with
`mirror-ollama-model.sh` (see [Mirroring models to S3](#mirroring-models-to-s3)).
- **`comfyui-init-models.sh`** — checkpoints/VAEs/LoRAs to preseed into
ComfyUI. Ships empty (no active fetches) — uncomment the SDXL/Flux/
upscaler examples or add your own. Whatever filename you pick should
match the `ckpt_name` field in `workflows/*.json` (default expects
`CyberRealisticXLPlay_V8.0_FP16.safetensors`). Set `HF_TOKEN` in
`.env` if any are gated repos.
## 3. Bring it up
@@ -80,22 +98,29 @@ docker compose exec comfyui curl -sf http://127.0.0.1:8188/system_stats | head -
docker compose exec open-webui curl -sf http://127.0.0.1:8080/health
```
## 4. Drop in at least one ComfyUI checkpoint
## 4. ComfyUI checkpoints
ComfyUI ships no models. The shipped workflow templates reference
`v1-5-pruned-emaonly.safetensors` as a placeholder; drop any
SD/SDXL/Flux checkpoint into the `comfyui-models` volume under
`checkpoints/`:
ComfyUI ships no models. Three ways to get one in:
```sh
docker run --rm -v ai-stack_comfyui-models:/models -w /models/checkpoints \
curlimages/curl:latest -L -O \
https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
```
1. **Preseed via the sidecar (default).** `comfyui-model-init` runs once
on `compose up`, downloads everything `comfyui-init-models.sh` lists,
and exits. The script ships empty — uncomment one of the examples or
add your own `fetch` calls (SDXL, Flux, LoRAs, upscalers, etc.). At
least one checkpoint should be named
`CyberRealisticXLPlay_V8.0_FP16.safetensors` to match the workflow
default, or update `ckpt_name` in `workflows/*.json` to whatever you
pull. Re-run with `docker compose up -d comfyui-model-init` after
script edits; already-present files are skipped.
2. **ComfyUI-Manager UI.** Open `https://comfyui.example.com` (after
basic-auth login), click **Manager**, then **Model Manager**, install
from the catalogue.
3. **Direct copy into the volume.** Useful if you already have the file
locally:
Or open the ComfyUI native UI at `https://comfyui.example.com` (after
basic-auth login), use the **Manager** button (added by ComfyUI-Manager),
and install one through **Model Manager**.
```sh
docker run --rm -v ai-stack_comfyui-models:/models -v $PWD:/src alpine \
cp /src/your-model.safetensors /models/checkpoints/
```
## 5. First-user signup in Open WebUI
@@ -117,7 +142,7 @@ In Open WebUI: **Admin Panel -> Settings -> Images**.
4. **ComfyUI Workflow Nodes** -> paste the contents of
[`../../workflows/txt2img.nodes.json`](../../workflows/txt2img.nodes.json).
5. **Default Model** -> the filename of the checkpoint you dropped in
step 4 (e.g. `v1-5-pruned-emaonly.safetensors`).
step 4 (e.g. `CyberRealisticXLPlay_V8.0_FP16.safetensors`).
6. Save.
For image editing (img2img), scroll to the **Image Editing** section in
@@ -132,6 +157,131 @@ Open WebUI submits the workflow to ComfyUI; the result drops back into
the chat when KSampler finishes. To test img2img, attach an image and
use the edit action.
## 8. (Optional) Install the smart-routing Tool
The image-button path always uses the admin's **Default Model**. To get
per-prompt checkpoint routing — e.g. "draw me a cyberpunk city" picks
CyberRealistic, "anthro fox warrior" picks one of the furry checkpoints —
install the `smart_image_gen.py` Tool. It exposes two methods the LLM
calls:
- **`generate_image`** for new images from scratch (txt2img).
- **`edit_image`** for modifying an image the user attached to the
chat. Two modes:
- With `mask_text` — text-targeted inpainting via GroundingDINO+SAM
(e.g. "the dog's collar"). Only the named region is repainted.
- Without `mask_text` — full img2img which reimagines the whole
image at the requested denoise.
Both auto-route to the right SDXL checkpoint per request.
> **First inpaint takes a few minutes**: SAM-HQ (~2.5 GB) and
> GroundingDINO (~700 MB) auto-download into the `comfyui-models`
> volume on the very first call to `edit_image` with `mask_text`.
> Subsequent inpaints are instant.
1. **Workspace -> Tools -> +** (top-right).
2. Paste the contents of
[`openwebui-tools/smart_image_gen.py`](openwebui-tools/smart_image_gen.py).
3. Save. Optionally adjust the Valves (ComfyUI URL, default steps, CFG,
timeout) via the gear icon.
4. **Workspace -> Models** (or pick an existing chat model) -> edit ->
under **Tools**, enable `smart_image_gen` -> save.
5. Make sure the model has **native function calling** enabled
(Workspace -> Models -> the model -> Advanced Params -> Function
Calling: Native). Mistral, Qwen, and Llama 3.1+ all support this.
In a chat with that model, ask for an image — "make me a photoreal
portrait of a cyberpunk samurai" — the LLM should call
`generate_image(prompt=..., style="photo")`. The status bar shows
"Routing to photo (CyberRealisticXLPlay…)" while it generates.
If the LLM responds in text instead of calling the tool, install the
**Image Studio** chat-model preset (next section) — a dedicated model
with a system prompt that removes the ambiguity.
## 9. (Recommended) Install the Image Studio model preset
General-purpose chat models often "describe" an image in text instead
of firing the `generate_image` tool, especially on conversational
phrasing ("can you draw me…", "I'd love a picture of…"). The
**Image Studio** preset wraps `mistral-nemo:12b` in a system prompt
that mandates tool use — every message is treated as an image request.
Setup — two paths:
- **Import the JSON** (fast): Workspace → Models → Import →
[`openwebui-models/image_studio.json`](openwebui-models/image_studio.json).
- **Manual** (full control): walkthrough in
[`openwebui-models/image_studio.md`](openwebui-models/image_studio.md).
Users then pick **Image Studio** from the chat-model dropdown when
they want to generate or edit images.
The preset ships with `vision: true` so users can attach images for
editing even though `mistral-nemo:12b` isn't a vision model — see the
[**Vision capability** section in image_studio.md](openwebui-models/image_studio.md#vision-capability)
for the trade-offs and the upgrade path to a real vision LLM
(`qwen2.5vl:7b`, `llama3.2-vision:11b`, etc.) if the LLM needs to
actually see the image to write smarter edit instructions.
To extend (new checkpoint, new style):
- Add the filename to `comfyui-init-models.sh` so it gets pulled.
- Add a key to the `CHECKPOINTS` dict in `smart_image_gen.py`.
- Optionally add style-specific negatives to `NEGATIVES`.
- Optionally add keyword routing rules to `ROUTING_RULES` for the
auto-detect path.
- Re-paste the Tool source in Workspace -> Tools.
## Mirroring models to S3
For models you want to pin against upstream changes (or pull faster
from your own infra), mirror them to S3 once and have the
deployment fetch from there.
### Create the mirror tarball
Run [`mirror-ollama-model.sh`](mirror-ollama-model.sh) on any machine
that has the model pulled locally. It reads `~/.ollama/models/`,
pulls the manifest's referenced blobs, and tars everything together:
```sh
./mirror-ollama-model.sh huihui_ai/qwen3.5-abliterated:9b qwen3.5-abliterated-9b.tgz
```
### Upload to S3
Whatever fits — `aws s3 cp`, `mc`, `rclone`, etc. The bucket needs
to expose the file over HTTPS (public-read ACL on the object, a
CloudFront distribution, R2 with public URLs, etc.):
```sh
aws s3 cp qwen3.5-abliterated-9b.tgz s3://your-bucket/ollama-models/ --acl public-read
```
### Wire the deployment to fetch from there
In `.env`:
```
S3_OLLAMA_BASE=https://your-bucket.s3.amazonaws.com/ollama-models
```
In `init-models.sh`, switch the affected models from `pull` to
`s3_pull`:
```sh
s3_pull "huihui_ai/qwen3.5-abliterated:9b" "qwen3.5-abliterated-9b.tgz"
```
`docker compose up -d model-init` re-runs the init container; the
script downloads the tarball, extracts into the `ollama-data` volume,
and the running Ollama daemon picks it up on its next manifest scan.
If `S3_OLLAMA_BASE` isn't set, `s3_pull` transparently falls back to
`ollama pull` — safe to commit `s3_pull` lines without S3 ready yet.
## Enabling Anubis (later)
The `anubis-owui` service is defined in compose but no Caddy site block
@@ -154,11 +304,31 @@ provides a prompt, image, seed, etc. Each entry:
Recognised `type` strings (per Open WebUI source): `model`, `prompt`,
`negative_prompt`, `width`, `height`, `n` (batch size), `steps`, `seed`,
and `image` (img2img / edit only).
and `image` (img2img / edit only). Notably **not** mappable: sampler,
scheduler, CFG, CLIP skip, prompt prefix.
If you swap in a fancier workflow (SDXL, Flux, ControlNet, custom
samplers, NL masking via SAM nodes, etc.), update the matching
`*.nodes.json` so the node IDs and input keys still line up.
This means the static workflow JSONs are tuned for a single checkpoint
family at a time. The shipped defaults match
`CyberRealisticXLPlay_V8.0_FP16.safetensors`
(`dpmpp_2m_sde` / `karras` / CFG 4 / 28 steps / CLIP skip 1 / no prefix).
**If you change the admin's Default Model to a different checkpoint
family** (Pony, NoobAI, Illustrious, etc.), edit the workflow JSONs:
- `KSampler` node: change `sampler_name`, `scheduler`, `cfg`, `steps`
- For checkpoints needing CLIP skip 2: add a `CLIPSetLastLayer` node and
rewire `CLIPTextEncode` nodes through it (see
[openwebui-tools/smart_image_gen.py](openwebui-tools/smart_image_gen.py)
for the exact graph).
- For Pony or NoobAI/Illustrious: the required quality-tag prefix
(`score_9, score_8_up, ...` or `masterpiece, best quality, ...`) has
to be typed by the user every time, since the workflow can't inject
it. **For multi-checkpoint deployments, use the smart_image_gen Tool
instead** — it handles per-checkpoint sampler / CFG / steps / CLIP
skip / prefix automatically based on the LLM's `style` choice.
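For concreteness, a minimal sketch of the CLIP-skip-2 rewiring in ComfyUI's flat API node format (shown as a Python dict; the workflow JSONs use the same structure, node IDs are arbitrary, and `stop_at_clip_layer: -2` is ComfyUI's encoding of "CLIP skip 2"):

```python
clip_skip_2_fragment = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "ponyDiffusionV6XL_v6StartWithThisOne.safetensors"}},
    "20": {"class_type": "CLIPSetLastLayer",
           "inputs": {"clip": ["1", 1], "stop_at_clip_layer": -2}},
    "6": {"class_type": "CLIPTextEncode",   # positive — now fed from node 20, not node 1
          "inputs": {"clip": ["20", 0], "text": "score_9, score_8_up, ... your prompt"}},
    "7": {"class_type": "CLIPTextEncode",   # negative
          "inputs": {"clip": ["20", 0], "text": "score_6, score_5, score_4, ..."}},
}
```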
If you swap in a fancier workflow (Flux, ControlNet, NL masking via
SAM nodes, etc.), update the matching `*.nodes.json` so the node IDs
and input keys still line up.
## Common gotchas

View File

@@ -0,0 +1,68 @@
#!/bin/sh
# Preseed ComfyUI's models volume with checkpoints, VAEs, LoRAs, etc.
# Runs once via the comfyui-model-init service (see docker-compose.yml).
# Safe to re-run — already-present files are skipped.
#
# ComfyUI doesn't have a "pull" command of its own, so this is plain curl
# against direct download URLs. For HuggingFace, the direct URL is:
# https://huggingface.co/<repo>/resolve/main/<file>
# For gated HF repos (Flux-dev, SD3, etc.), set HF_TOKEN in .env — the
# script attaches it as a bearer token automatically.
set -e
apk add --no-cache curl >/dev/null
mkdir -p /models/checkpoints /models/vae /models/loras /models/controlnet \
/models/clip /models/clip_vision /models/upscale_models /models/embeddings
fetch() {
dest="$1"; name="$2"; url="$3"
target="/models/$dest/$name"
if [ -f "$target" ]; then
echo "$dest/$name already present"
return
fi
echo "→ Downloading $dest/$name"
mkdir -p "/models/$dest"
if [ -n "$HF_TOKEN" ] && echo "$url" | grep -q huggingface.co; then
curl -fL -C - --retry 3 -H "Authorization: Bearer $HF_TOKEN" \
-o "$target.partial" "$url"
else
curl -fL -C - --retry 3 -o "$target.partial" "$url"
fi
mv "$target.partial" "$target"
}
# ─── Edit the list below to choose what gets preseeded ──────────────────────
# Format: fetch <subdir under /models> <filename to save as> <direct URL>
#
# No checkpoints are downloaded by default — the deployment ships expecting
# you to point at your own model mirror or the public examples below.
# Whatever filename you pick should match the `ckpt_name` field in
# workflows/txt2img.json and workflows/img2img.json (the shipped default
# is CyberRealisticXLPlay_V8.0_FP16.safetensors); update either the
# script or the workflows so they line up.
# Examples — uncomment what you want.
# SDXL Base 1.0 (~6.9 GB)
# fetch checkpoints sd_xl_base_1.0.safetensors \
# https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
# SDXL VAE (fixes washed-out colours on some SDXL checkpoints)
# fetch vae sdxl_vae.safetensors \
# https://huggingface.co/stabilityai/sdxl-vae/resolve/main/sdxl_vae.safetensors
# Flux.1-dev (~23 GB, gated — needs HF_TOKEN with access to black-forest-labs)
# fetch checkpoints flux1-dev.safetensors \
# https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors
# 4x-UltraSharp upscaler
# fetch upscale_models 4x-UltraSharp.pth \
# https://huggingface.co/lokCX/4x-Ultrasharp/resolve/main/4x-UltraSharp.pth
echo "Done."

View File

@@ -79,6 +79,10 @@ services:
# One-shot model puller. Runs after ollama is healthy, pulls whatever
# init-models.sh lists, exits. `restart: "no"` keeps it from looping.
#
# Models can come from registry.ollama.ai (default) or your own S3
# mirror (set S3_OLLAMA_BASE in .env; create tarballs with
# mirror-ollama-model.sh).
model-init:
image: ollama/ollama:latest
container_name: ollama-model-init
@@ -90,6 +94,7 @@ services:
- ./init-models.sh:/init-models.sh:ro
environment:
- OLLAMA_HOST=ollama:11434
- S3_OLLAMA_BASE=${S3_OLLAMA_BASE:-}
entrypoint: ["/bin/sh", "/init-models.sh"]
restart: "no"
@@ -129,6 +134,23 @@ services:
retries: 5
start_period: 120s
# One-shot model puller for ComfyUI. Mounts the same models volume,
# downloads whatever comfyui-init-models.sh lists, exits. ComfyUI doesn't
# need to be running for this — files just land on the volume; ComfyUI
# picks them up next time it scans (or on a restart).
comfyui-model-init:
image: alpine:latest
container_name: comfyui-model-init
volumes:
- comfyui-models:/models
- ./comfyui-init-models.sh:/init.sh:ro
environment:
# Optional — set in .env to download from gated HuggingFace repos
# (Flux-dev, SD3, etc.). Leave empty for public-only.
HF_TOKEN: "${HF_TOKEN:-}"
entrypoint: ["/bin/sh", "/init.sh"]
restart: "no"
# ---------------------------------------------------------------------------
# Open WebUI — multi-user chat.
# ---------------------------------------------------------------------------

View File

@@ -3,20 +3,61 @@
# Runs once via the model-init service (see docker-compose.yml). Safe to
# re-run — already-present models are skipped.
#
# Add or remove tags to taste. The host needs enough disk for everything
# listed; check sizes at https://ollama.com/library before adding.
# Two pull paths:
# - s3_pull — fetches a tarball from $S3_OLLAMA_BASE (your own mirror,
# created by mirror-ollama-model.sh) and extracts into
# Ollama's data dir. Faster + immune to upstream changes.
# Falls back to ollama pull if S3_OLLAMA_BASE is unset.
# - pull — standard `ollama pull` against registry.ollama.ai.
set -e
MODELS="dolphin3:8b llama3.1:8b ministral-3:8b mistral-nemo:12b qwen3.6:latest"
# Make sure curl is available — ollama/ollama:latest doesn't always include
# it, and s3_pull needs it. tar is in the base image.
if ! command -v curl >/dev/null 2>&1; then
apt-get update -qq && apt-get install -y -qq curl ca-certificates >/dev/null
fi
for model in $MODELS; do
if ollama list | awk 'NR>1 {print $1}' | grep -qx "$model"; then
echo "$model already present"
else
echo "→ Pulling $model"
ollama pull "$model"
fi
S3_OLLAMA_BASE="${S3_OLLAMA_BASE:-}"
OLLAMA_DATA="/root/.ollama"
s3_pull() {
name="$1"; archive="$2"
if ollama list 2>/dev/null | awk 'NR>1 {print $1}' | grep -qx "$name"; then
echo "$name already present"
return
fi
if [ -z "$S3_OLLAMA_BASE" ]; then
echo "$name: S3_OLLAMA_BASE unset, falling back to ollama pull"
ollama pull "$name"
return
fi
url="${S3_OLLAMA_BASE%/}/$archive"
echo "→ Downloading $name from $url"
curl -fL -C - --retry 3 -o "/tmp/$archive" "$url"
tar -xzf "/tmp/$archive" -C "$OLLAMA_DATA/models/"
rm -f "/tmp/$archive"
echo "$name installed (mirror)"
}
pull() {
name="$1"
if ollama list 2>/dev/null | awk 'NR>1 {print $1}' | grep -qx "$name"; then
echo "$name already present"
else
echo "→ Pulling $name from registry.ollama.ai…"
ollama pull "$name"
fi
}
# ─── S3-mirrored models ─────────────────────────────────────────────────────
# These live in your own bucket. Create the tarballs once with
# mirror-ollama-model.sh, upload to S3, then list them here.
s3_pull "huihui_ai/qwen3.5-abliterated:9b" "qwen3.5-abliterated-9b.tgz"
# ─── Direct registry pulls ──────────────────────────────────────────────────
for model in dolphin3:8b llama3.1:8b ministral-3:8b mistral-nemo:12b qwen3.6:latest; do
pull "$model"
done
echo "Done."

View File

@@ -0,0 +1,66 @@
#!/bin/bash
# Mirror an Ollama model into a portable tarball you can upload to S3
# (or any HTTPS host) and re-fetch via init-models.sh's s3_pull.
#
# Run on any machine that already has the model pulled locally — the
# script reads ~/.ollama/models/, parses the manifest to find the
# referenced blobs, and tars them together.
#
# Usage: ./mirror-ollama-model.sh <model:tag> <output.tgz>
# Example: ./mirror-ollama-model.sh huihui_ai/qwen3.5-abliterated:9b qwen3.5-abliterated-9b.tgz
#
# Upload the tarball to S3, then add to init-models.sh:
# s3_pull "huihui_ai/qwen3.5-abliterated:9b" "qwen3.5-abliterated-9b.tgz"
# and set S3_OLLAMA_BASE in .env to your bucket's HTTPS base URL.
set -euo pipefail
MODEL="${1:?Usage: $0 <model:tag> <output.tgz>}"
OUT="${2:?Usage: $0 <model:tag> <output.tgz>}"
OLLAMA_HOME="${OLLAMA_HOME:-$HOME/.ollama}"
MODELS="$OLLAMA_HOME/models"
if ! ollama list | awk 'NR>1 {print $1}' | grep -qx "$MODEL"; then
echo "Model $MODEL not found locally; pulling first..."
ollama pull "$MODEL"
fi
# huihui_ai/qwen3.5-abliterated:9b → manifests/registry.ollama.ai/huihui_ai/qwen3.5-abliterated/9b
ns_and_name="${MODEL%:*}"
tag="${MODEL##*:}"
manifest_rel="manifests/registry.ollama.ai/$ns_and_name/$tag"
manifest_abs="$MODELS/$manifest_rel"
if [ ! -f "$manifest_abs" ]; then
echo "ERROR: manifest not found at $manifest_abs" >&2
exit 1
fi
# Pull every sha256:* digest out of the manifest JSON. Each maps to
# blobs/sha256-<hex>.
blob_files=""
for digest in $(grep -oE 'sha256:[a-f0-9]+' "$manifest_abs" | sort -u); do
blob_rel="blobs/${digest/:/-}"
if [ ! -f "$MODELS/$blob_rel" ]; then
echo "WARNING: missing blob $blob_rel — skipping" >&2
continue
fi
blob_files="$blob_files $blob_rel"
done
count=$(echo "$blob_files" | wc -w | tr -d ' ')
echo "Archiving manifest + $count blob(s)..."
tar -czf "$OUT" -C "$MODELS" "$manifest_rel" $blob_files
size=$(du -h "$OUT" | cut -f1)
echo "Done: $OUT ($size)"
echo
echo "Next:"
echo " 1. Upload to your bucket, e.g."
echo " aws s3 cp $OUT s3://YOUR-BUCKET/ollama-models/ --acl public-read"
echo " (or whatever exposes it over HTTPS)"
echo " 2. Set S3_OLLAMA_BASE in .env to the bucket's HTTPS base, e.g."
echo " S3_OLLAMA_BASE=https://YOUR-BUCKET.s3.amazonaws.com/ollama-models"
echo " 3. Add to init-models.sh:"
echo " s3_pull \"$MODEL\" \"$(basename "$OUT")\""

View File

@@ -0,0 +1,35 @@
[
{
"id": "image-studio",
"base_model_id": "huihui_ai/qwen3.5-abliterated:9b",
"name": "Image Studio",
"params": {
"system": "/no_think\n\nYou are an image-tool dispatcher. You do not respond in prose. Every user message MUST result in exactly one tool call.\n\nROUTING:\n- If the user attached an image → call edit_image\n- Otherwise → call generate_image\n\nFire the tool on the FIRST message, with no preamble. Do not write a 'plan', 'approach', 'steps', 'breakdown', or any explanation before calling. Do not ask clarifying questions. Do not say what you are about to do. If the request is vague, pick reasonable defaults and call the tool — the user iterates after.\n\nSTYLES (pick one):\n photo photorealistic photo / portrait / cinematic\n juggernaut alternate photoreal — sharper, more saturated\n pony anime, cartoon, manga, stylised illustration\n general catch-all when nothing else fits\n furry-nai anthropomorphic, NAI-trained mix\n furry-noob anthropomorphic, NoobAI base\n furry-il anthropomorphic, Illustrious base (default for any furry/anthro request)\n\nedit_image has TWO MODES — pick based on whether the change is local or global:\n- LOCAL change (\"change the ball to a basketball\", \"add a hat to the dog\", \"remove the bird\", \"recolor the car red\") → set `mask_text` to a brief noun phrase naming the region (\"the ball\", \"the dog\", \"the bird\", \"the car\"). Only that region is repainted; rest stays pixel-perfect.\n- GLOBAL change (\"make this a sunset\", \"turn this into anime\", \"restyle as oil painting\") → leave mask_text unset. The whole image is reimagined.\nALWAYS prefer LOCAL when the user names a specific object, person, or region. GLOBAL is only for whole-image style/lighting transformations.\n\nDenoise:\n- LOCAL (mask_text set): default 1.0. Drop to 0.60.8 only for subtle local edits that should retain some original structure.\n- GLOBAL (no mask_text): default 0.7. Use 0.30.5 for subtle restyle, 0.851.0 for radical reimagining.\n\nPick style for the DESIRED OUTPUT, not the input image.\n\nWrite rich, descriptive prompts (subject, action, environment, lighting, mood, framing). Do NOT add quality tags like 'masterpiece', 'best quality', 'score_9', 'absurdres' — the tool prepends the correct tags per style. Do NOT set sampler, CFG, steps, scheduler — the tool picks them.\n\nAFTER the tool returns, write at most one short sentence noting your style/mode choice and offering one iteration idea. The image is already shown to the user; do not describe it.",
"temperature": 0.5,
"top_p": 0.9,
"function_calling": "native",
"tool_choice": "required"
},
"meta": {
"profile_image_url": "/static/favicon.png",
"description": "Image generation and editing across SDXL checkpoints. Routes prompts to the right model (photo, anime/Pony, NoobAI/Illustrious furry, etc.) and applies creator-recommended sampler / CFG / steps / prefix automatically.",
"capabilities": {
"vision": true,
"usage": false,
"citations": false
},
"tags": [
{ "name": "image-gen" },
{ "name": "comfyui" }
],
"toolIds": ["smart_image_gen"],
"suggestion_prompts": [
{ "content": "Generate a photorealistic portrait of a cyberpunk samurai at dusk." },
{ "content": "Draw an anthropomorphic fox warrior in stylised anime art." },
{ "content": "Make a pony-style illustration of a starry forest at night." }
]
},
"access_control": null,
"is_active": true
}
]

View File

@@ -0,0 +1,185 @@
# Image Studio — dedicated image-generation chat model
A custom Open WebUI model preset that wraps a base LLM with a system
prompt heavily biased toward calling the `smart_image_gen` tool. Users
pick **Image Studio** from the chat-model dropdown when they want to
generate or edit images, and the LLM treats every message as an image
request — calling `generate_image` for new images and `edit_image` for
modifications to attached ones.
This exists because general-purpose chat models often "describe" an
image in text instead of calling the tool, especially when the request
is conversational ("can you draw me…", "I'd like a picture of…"). A
dedicated preset removes the ambiguity.
## Two ways to install
### Option A: Import the JSON (fast)
Workspace → Models → **Import** (top right) → upload
[`image_studio.json`](image_studio.json).
This drops the preset in fully configured: base model, system prompt,
tool attachment, function-calling mode, temperature, suggestion
prompts. Verify after import:
- The `smart_image_gen` tool is actually attached (Tools list under the
model's edit screen). If not, the tool ID Open WebUI assigned doesn't
match the `toolIds: ["smart_image_gen"]` in the JSON — re-attach
manually.
- Base Model is set to `huihui_ai/qwen3.5-abliterated:9b` (the shipped
default). Adjust if you want a different LLM (Qwen3.6 or Llama 3.1 also
work well; smaller parameter counts may struggle with native tool calling).
### Option B: Create manually (table below)
**Workspace → Models → +** (top right).
| Field | Value |
| ----- | ----- |
| Name | `Image Studio` |
| Base Model | `huihui_ai/qwen3.5-abliterated:9b` (vision-capable, 256K context, abliterated). Pull via `init-models.sh` first. |
| Description | `Image generation and routing across SDXL checkpoints.` |
| System Prompt | Paste the block from [System prompt](#system-prompt) below. |
| Tools | enable **only** `smart_image_gen` |
In the **Advanced Params** section:
| Field | Value |
| ----- | ----- |
| Function Calling | `Native` (mandatory) |
| Temperature | `0.5` (lower = more reliable tool-calling) |
| Top P | `0.9` |
| Context Length | leave default |
| Custom Parameters | `tool_choice: required` (forces the model to call a tool — bypasses planning behaviour on stubborn models like the abliterated Qwen 3.5) |
Save. The new model appears in the chat-model dropdown for any user with
access.
## System prompt
```
/no_think
You are an image-tool dispatcher. You do not respond in prose. Every
user message MUST result in exactly one tool call.
ROUTING:
- If the user attached an image → call edit_image
- Otherwise → call generate_image
Fire the tool on the FIRST message, with no preamble. Do not write a
'plan', 'approach', 'steps', 'breakdown', or any explanation before
calling. Do not ask clarifying questions. Do not say what you are
about to do. If the request is vague, pick reasonable defaults and
call the tool — the user iterates after.
STYLES (pick one):
photo photorealistic photo / portrait / cinematic
juggernaut alternate photoreal — sharper, more saturated
pony anime, cartoon, manga, stylised illustration
general catch-all when nothing else fits
furry-nai anthropomorphic, NAI-trained mix
furry-noob anthropomorphic, NoobAI base
furry-il anthropomorphic, Illustrious base (default for any
furry/anthro request)
edit_image has TWO MODES — pick based on whether the change is local
or global:
- LOCAL ("change the ball to a basketball", "add a hat to the dog",
"remove the bird", "recolor the car red") → set `mask_text` to a
brief noun phrase naming the region ("the ball", "the dog", "the
bird", "the car"). Only that region is repainted; rest stays
pixel-perfect.
- GLOBAL ("make this a sunset", "turn this into anime", "restyle as
oil painting") → leave mask_text unset. The whole image is
reimagined.
ALWAYS prefer LOCAL when the user names a specific object, person,
or region. GLOBAL is only for whole-image style/lighting
transformations.
Denoise:
- LOCAL (mask_text set): default 1.0. Drop to 0.6–0.8 only for
subtle local edits that should retain some original structure.
- GLOBAL (no mask_text): default 0.7. Use 0.3–0.5 for subtle
restyle, 0.85–1.0 for radical reimagining.
Pick style for the DESIRED OUTPUT, not the input image.
Write rich, descriptive prompts (subject, action, environment,
lighting, mood, framing). Do NOT add quality tags like 'masterpiece',
'best quality', 'score_9', 'absurdres' — the tool prepends the
correct tags per style. Do NOT set sampler, CFG, steps, scheduler —
the tool picks them.
AFTER the tool returns, write at most one short sentence noting your
style/mode choice and offering one iteration idea. The image is
already shown to the user; do not describe it.
```
The first line `/no_think` disables Qwen 3.x's reasoning phase. If
your base model isn't Qwen 3, leaving it in is a no-op (other models
ignore it). Drop it only if it actually causes problems.
## Vision capability
The shipped preset sets `meta.capabilities.vision: true` so Open WebUI
allows users to attach images to chats with this model. Two paths:
### Default — `huihui_ai/qwen3.5-abliterated:9b`
The shipped preset uses Qwen 3.5 abliterated 9B as the base — vision-
capable, 256K context, no censorship hedging. Preseed via
`init-models.sh` (an `s3_pull` line is already in place; see
[Mirroring models to S3](../README.md#mirroring-models-to-s3) for the
mirror workflow).
**Important Qwen 3.x quirk:** thinking mode is on by default and
breaks native function calling — the model "thinks" about how to use
the tool instead of just calling it. The shipped system prompt starts
with `/no_think` to suppress this. If the model still plans instead
of firing the tool, also set `enable_thinking: false` in **Advanced
Params → Custom Parameters** (API-level enforcement).
### Alternatives
If Qwen 3.5 isn't a fit (size, language preferences, abliteration
caveats), other vision-capable Ollama tags worth trying:
- `qwen2.5vl:7b` — smaller, no thinking mode, very reliable tool-caller
- `llama3.2-vision:11b` — Meta's vision variant, ~7 GB
- `minicpm-v:8b` — fast, capable
To swap, change `base_model_id` in `image_studio.json` (or the Base
Model field if you imported manually) and pull the model via
`init-models.sh` or the Open WebUI model UI.
### Non-vision base model
If you'd rather use a text-only LLM (e.g. `mistral-nemo:12b`),
keep `vision: true` in the preset so Open WebUI still permits image
attachments; the image flows through to `edit_image` via
`__messages__` / `__files__` and ComfyUI does the visual work. The
LLM can't see the image, but for explicit edit instructions ("change
the background to a sunset") that doesn't matter.
## Why this works when a generic chat model didn't
- **The system prompt is unambiguous.** No room for the model to
decide "I'll just describe it in text instead."
- **Only one tool is attached.** No competing tools to choose between.
- **Native function calling is mandatory.** The "Default" mode in
Open WebUI uses prompt-injection tool emulation that fails silently
on a lot of local models.
- **Lower temperature.** Tool calling is more reliable with less
sampling randomness.
## Iterating on the system prompt
If users ask for things you didn't anticipate (specific aspect ratios,
multi-image batches, particular checkpoints not in the routing rules),
edit the system prompt above and re-paste into the Workspace → Models
entry. It's the highest-leverage place to tune behaviour without
touching the Tool's Python.

View File

@@ -0,0 +1,792 @@
"""
title: Smart Image Generator & Editor (ComfyUI)
author: ai-stack
version: 0.6.0
description: Generate or edit images via ComfyUI with automatic SDXL
checkpoint routing. Two methods — generate_image (txt2img) and
edit_image (img2img on the user's most recently attached image). The
LLM picks (or auto-detects) the right model — photoreal, Pony
score-tag, NoobAI/Illustrious furry, etc. — and each style ships
with the creator-recommended sampler, scheduler, CFG, steps, CLIP
skip, prompt-prefix dialect, and negatives. The image is uploaded
to Open WebUI's file store and surfaced via a `files` event (the
canonical pattern used by Open WebUI's own image-gen path); the
function return is a short confirmation so the LLM doesn't try to
describe or re-emit the image.
required_open_webui_version: 0.5.0
"""
import asyncio
import base64
import inspect
import io
import re
import time
import uuid
from typing import Awaitable, Callable, Literal, Optional
import aiohttp
from pydantic import BaseModel, Field
# Open WebUI's runtime — only available when the tool is loaded inside the
# Open WebUI process. Guarded so the module still imports for standalone
# linting/testing; if the imports fail at runtime, _push_image_to_chat
# falls back to emitting a markdown data-URI message.
try:
from fastapi import UploadFile
from open_webui.models.users import Users
from open_webui.routers.files import upload_file_handler
_OPENWEBUI_RUNTIME = True
except ImportError:
_OPENWEBUI_RUNTIME = False
StyleName = Literal[
"photo", "juggernaut", "pony", "general",
"furry-nai", "furry-noob", "furry-il",
]
# ─────────────────────────────────────────────────────────────────────────────
# Per-style settings — sampler/scheduler/cfg/steps/clip_skip/prefix/negatives
# come from each model's creator page on Civitai. Three prefix dialects in
# play: photoreal (no prefix, natural language), Pony score chain (REQUIRED
# for any Pony-derived checkpoint), and Booru quality tags (NoobAI /
# Illustrious lineage). Never cross-contaminate.
# ─────────────────────────────────────────────────────────────────────────────
STYLES = {
"photo": {
"ckpt": "CyberRealisticXLPlay_V8.0_FP16.safetensors",
"sampler": "dpmpp_2m_sde",
"scheduler": "karras",
"cfg": 4.0,
"steps": 28,
"clip_skip": 1,
"prefix": "", # natural language only — no quality tags
"negative": (
"cartoon, drawing, illustration, anime, manga, painting, sketch, "
"render, 3d, cgi, watercolor, plastic skin, doll-like, oversaturated, "
"lowres, blurry, jpeg artifacts, noisy, grainy, low quality, worst quality, "
"bad anatomy, deformed, mutated, extra limbs, extra fingers, missing fingers, "
"fused fingers, malformed hands, asymmetric face, "
"watermark, signature, text, logo, label, username"
),
},
"juggernaut": {
"ckpt": "Juggernaut-XL_v9_RunDiffusionPhoto_v2.safetensors",
"sampler": "dpmpp_2m_sde",
"scheduler": "karras",
"cfg": 4.5,
"steps": 35,
"clip_skip": 1,
"prefix": "", # natural language only
"negative": (
"cartoon, drawing, illustration, anime, manga, painting, sketch, "
"render, 3d, cgi, plastic skin, washed out, oversaturated, "
"lowres, blurry, jpeg artifacts, low quality, worst quality, "
"bad anatomy, deformed, mutated, extra limbs, extra fingers, missing fingers, "
"fused fingers, malformed hands, "
"watermark, signature, text, logo, username"
),
},
"pony": {
"ckpt": "ponyDiffusionV6XL_v6StartWithThisOne.safetensors",
"sampler": "euler_ancestral",
"scheduler": "normal",
"cfg": 7.5,
"steps": 25,
"clip_skip": 2,
# REQUIRED — the full chain. Just `score_9` alone is much weaker.
"prefix": "score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up, ",
# Pony's creator notes negatives are usually unnecessary; conservative
# baseline only. Source-toggle tags (source_pony/furry/anime/cartoon)
# are intentionally omitted — they exclude entire content domains.
"negative": (
"score_6, score_5, score_4, "
"worst quality, low quality, lowres, blurry, jpeg artifacts, noisy, "
"bad anatomy, bad proportions, bad hands, extra digit, fewer digits, "
"fused fingers, malformed limbs, deformed, ugly, "
"censored, monochrome, "
"watermark, signature, text, logo, artist name, patreon username, twitter username"
),
},
"general": {
"ckpt": "talmendoxlSDXL_v11Beta.safetensors",
"sampler": "dpmpp_2m",
"scheduler": "karras",
"cfg": 8.0, # Talmendo wants notably higher CFG than the others
"steps": 30,
"clip_skip": 2,
"prefix": "", # creator says don't push "masterpiece" — fights the amateur aesthetic
"negative": (
"lowres, blurry, jpeg artifacts, noisy, grainy, low quality, worst quality, "
"bad anatomy, deformed, mutated, extra limbs, missing fingers, fused fingers, "
"malformed hands, ugly, "
"watermark, signature, text, logo"
),
},
"furry-nai": {
"ckpt": "reedFURRYMixSDXL_v23nai.safetensors",
"sampler": "euler_ancestral",
"scheduler": "normal",
"cfg": 5.0,
"steps": 30,
"clip_skip": 2,
"prefix": (
"masterpiece, best quality, high quality, good quality, "
"detailed eyes, highres, absurdres, furry, "
),
"negative": (
"human, realistic, photorealistic, 3d, cgi, "
"worst quality, bad_quality, normal quality, lowres, "
"anatomical nonsense, bad anatomy, interlocked fingers, extra fingers, "
"bad_feet, bad_hands, deformed anatomy, bad proportions, "
"censored, simple background, transparent, face backlighting, "
"watermark, signature, text, logo, username, jpeg artifacts"
),
},
"furry-noob": {
"ckpt": "indigoVoidFurryFusedXL_noobaiV32.safetensors",
"sampler": "euler_ancestral", # creator: other samplers won't work
"scheduler": "normal",
"cfg": 4.5,
"steps": 20,
"clip_skip": 2,
"prefix": (
"masterpiece, best quality, perfect quality, absurdres, newest, "
"very aesthetic, vibrant colors, "
),
"negative": (
"human, realistic, photorealistic, 3d, cgi, "
"shiny skin, shiny clothing, "
"worst quality, low quality, lowres, blurry, jpeg artifacts, noisy, "
"bad anatomy, bad hands, mutated hands, bad proportions, "
"extra digit, fewer digits, fused fingers, malformed limbs, deformed, ugly, "
"watermark, signature, text, logo, username, artist signature"
),
},
"furry-il": {
"ckpt": "novaFurryXL_ilV170.safetensors",
"sampler": "euler_ancestral",
"scheduler": "normal",
"cfg": 4.0,
"steps": 30,
"clip_skip": 2,
# Illustrious wants `newest` in positive and `old`/`oldest` in negative
# — these are year-bucket tags from the training set. `furry` and
# `anthro` are universally helpful here.
"prefix": (
"masterpiece, best quality, amazing quality, very aesthetic, "
"high resolution, ultra-detailed, absurdres, newest, furry, anthro, "
),
"negative": (
"human, multiple tails, modern, recent, old, oldest, "
"graphic, cartoon, painting, crayon, graphite, abstract, glitch, "
"deformed, mutated, ugly, disfigured, long body, conjoined, "
"lowres, bad anatomy, bad hands, missing fingers, extra digits, fewer digits, "
"cropped, very displeasing, worst quality, bad quality, sketch, "
"jpeg artifacts, signature, watermark, username, text, simple background, "
"bad ai-generated"
),
},
}
DEFAULT_STYLE = "general"
# First-match-wins keyword router used when the caller didn't pick a style.
# Order matters — narrower patterns above broader ones.
ROUTING_RULES = [
# Pony score chain is the single strongest signal — Pony only
(re.compile(r"\bscore_\d", re.I), "pony"),
(re.compile(r"\bpony\b", re.I), "pony"),
# NoobAI / Illustrious explicit mentions
(re.compile(r"\b(noobai|noob)\b", re.I), "furry-noob"),
(re.compile(r"\b(illustrious|ilxl)\b", re.I), "furry-il"),
# Generic furry — defaults to NovaFurry (Illustrious lineage, current sweet spot)
(re.compile(r"\b(furry|anthro|feral|kemono|fursona|species)\b", re.I), "furry-il"),
# Photo / photoreal
(re.compile(r"\b(juggernaut)\b", re.I), "juggernaut"),
(re.compile(r"\b(photo|photograph|realistic|portrait|selfie|cinematic)\b", re.I), "photo"),
# Generic anime / illustration → Pony covers anime well
(re.compile(r"\b(anime|manga|2d|illustration)\b", re.I), "pony"),
]
def _route_style(prompt: str) -> str:
for pattern, style in ROUTING_RULES:
if pattern.search(prompt):
return style
return DEFAULT_STYLE
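# Illustrative routing: "a cinematic portrait of a knight" hits the photo
# rule; "score_9, anthro wolf playing guitar" routes to pony because the
# score_ pattern is checked before the furry keywords; "a cozy mountain
# cabin at dusk" matches nothing and falls back to DEFAULT_STYLE ("general").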
def _seed_value(seed: int) -> int:
return seed if seed > 0 else int(time.time() * 1000) % (2**31)
def _build_txt2img(positive: str, negative: str, settings: dict,
width: int, height: int, seed: int) -> dict:
"""
SDXL txt2img workflow. CLIP skip via CLIPSetLastLayer so the same graph
handles skip 1 (-1) and skip 2 (-2).
"""
return {
"3": {"class_type": "KSampler", "inputs": {
"seed": _seed_value(seed),
"steps": settings["steps"], "cfg": settings["cfg"],
"sampler_name": settings["sampler"], "scheduler": settings["scheduler"],
"denoise": 1.0,
"model": ["4", 0], "positive": ["6", 0],
"negative": ["7", 0], "latent_image": ["5", 0],
}},
"4": {"class_type": "CheckpointLoaderSimple",
"inputs": {"ckpt_name": settings["ckpt"]}},
"5": {"class_type": "EmptyLatentImage",
"inputs": {"width": width, "height": height, "batch_size": 1}},
"6": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["10", 0]}},
"7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
"8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smartgen", "images": ["8", 0]}},
"10": {"class_type": "CLIPSetLastLayer",
"inputs": {"stop_at_clip_layer": -settings["clip_skip"],
"clip": ["4", 1]}},
}
def _build_inpaint(positive: str, negative: str, settings: dict,
image_filename: str, mask_text: str,
denoise: float, seed: int) -> dict:
"""
SDXL inpainting workflow with text-driven masking. Uses
comfyui_segment_anything (GroundingDINO + SAM-HQ — installed by the
Dockerfile) to derive a mask from `mask_text` (a noun phrase like
"the dog's collar"), then SetLatentNoiseMask + KSampler repaint
only that region. Everything outside the mask is preserved, apart from the
small drift introduced by the VAE encode/decode round-trip.
First inpaint downloads ~3 GB of SAM/GroundingDINO weights into
/opt/comfyui/models/{sams,grounding-dino}/ — subsequent runs reuse
them.
"""
return {
"3": {"class_type": "KSampler", "inputs": {
"seed": _seed_value(seed),
"steps": settings["steps"], "cfg": settings["cfg"],
"sampler_name": settings["sampler"], "scheduler": settings["scheduler"],
"denoise": denoise,
"model": ["4", 0], "positive": ["6", 0],
"negative": ["7", 0], "latent_image": ["13", 0],
}},
"4": {"class_type": "CheckpointLoaderSimple",
"inputs": {"ckpt_name": settings["ckpt"]}},
"6": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["10", 0]}},
"7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
"8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smartinpaint", "images": ["8", 0]}},
"10": {"class_type": "CLIPSetLastLayer",
"inputs": {"stop_at_clip_layer": -settings["clip_skip"],
"clip": ["4", 1]}},
"11": {"class_type": "VAEEncode", "inputs": {"pixels": ["12", 0], "vae": ["4", 2]}},
"12": {"class_type": "LoadImage", "inputs": {"image": image_filename}},
"13": {"class_type": "SetLatentNoiseMask",
"inputs": {"samples": ["11", 0], "mask": ["16", 1]}},
"14": {"class_type": "SAMModelLoader (segment anything)",
"inputs": {"model_name": "sam_hq_vit_h (2.57GB)"}},
"15": {"class_type": "GroundingDinoModelLoader (segment anything)",
"inputs": {"model_name": "GroundingDINO_SwinT_OGC (694MB)"}},
"16": {"class_type": "GroundingDinoSAMSegment (segment anything)",
"inputs": {
"sam_model": ["14", 0],
"grounding_dino_model": ["15", 0],
"image": ["12", 0],
"prompt": mask_text,
"threshold": 0.3,
}},
}
def _build_img2img(positive: str, negative: str, settings: dict,
image_filename: str, denoise: float, seed: int) -> dict:
"""
SDXL img2img workflow. Loads `image_filename` (already uploaded to
ComfyUI's /input/), VAE-encodes it to latent, and feeds that into the
sampler at the requested denoise. Resolution is whatever the source
image is — no resize.
"""
return {
"3": {"class_type": "KSampler", "inputs": {
"seed": _seed_value(seed),
"steps": settings["steps"], "cfg": settings["cfg"],
"sampler_name": settings["sampler"], "scheduler": settings["scheduler"],
"denoise": denoise,
"model": ["4", 0], "positive": ["6", 0],
"negative": ["7", 0], "latent_image": ["11", 0],
}},
"4": {"class_type": "CheckpointLoaderSimple",
"inputs": {"ckpt_name": settings["ckpt"]}},
"6": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["10", 0]}},
"7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
"8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smartedit", "images": ["8", 0]}},
"10": {"class_type": "CLIPSetLastLayer",
"inputs": {"stop_at_clip_layer": -settings["clip_skip"],
"clip": ["4", 1]}},
"11": {"class_type": "VAEEncode", "inputs": {"pixels": ["12", 0], "vae": ["4", 2]}},
"12": {"class_type": "LoadImage", "inputs": {"image": image_filename}},
}
async def _extract_attached_image(
files: Optional[list],
messages: Optional[list],
session: aiohttp.ClientSession,
) -> Optional[bytes]:
"""
Find the most recent image the user attached to the chat. Tries three
sources in order: (1) base64 data URIs in `image_url` content blocks
of the recent messages (works for vision-capable models), (2) a local
filesystem path on the file dict (open-webui stores uploads under
/app/backend/data/uploads/), (3) the file's url field, fetched over
HTTP. Returns raw image bytes, or None if nothing matched.
"""
# Messages: standard OpenAI image_url content blocks.
for msg in reversed(messages or []):
content = msg.get("content") if isinstance(msg, dict) else None
if isinstance(content, list):
for block in content:
if not isinstance(block, dict) or block.get("type") != "image_url":
continue
url = (block.get("image_url") or {}).get("url", "")
if url.startswith("data:image"):
try:
return base64.b64decode(url.split(",", 1)[1])
except Exception:
pass
# Files: try local path, then URL.
for f in files or []:
if not isinstance(f, dict):
continue
ftype = (f.get("type") or "").lower()
fname = (f.get("name") or f.get("filename") or "").lower()
is_image = "image" in ftype or fname.endswith((".png", ".jpg", ".jpeg", ".webp"))
if not is_image:
continue
for path_key in ("path", "filepath", "file_path"):
path = f.get(path_key)
if path:
try:
with open(path, "rb") as fh:
return fh.read()
except OSError:
pass
url = f.get("url")
if url:
full = url if url.startswith("http") else f"http://localhost:8080{url}"
try:
async with session.get(full) as resp:
if resp.status == 200:
return await resp.read()
except aiohttp.ClientError:
pass
return None
async def _upload_to_comfyui(
session: aiohttp.ClientSession, base: str, raw: bytes
) -> Optional[str]:
"""POST raw bytes to ComfyUI /upload/image and return the saved name."""
name = f"smartedit_{uuid.uuid4().hex[:12]}.png"
form = aiohttp.FormData()
form.add_field("image", raw, filename=name, content_type="image/png")
form.add_field("overwrite", "true")
async with session.post(f"{base}/upload/image", data=form) as resp:
if resp.status != 200:
return None
return (await resp.json()).get("name", name)
async def _push_image_to_chat(
raw: bytes,
filename_prefix: str,
request,
user_dict: Optional[dict],
metadata: Optional[dict],
event_emitter: Optional[Callable[[dict], Awaitable[None]]],
) -> bool:
"""
Surface a generated image in the chat using Open WebUI's canonical
pattern: upload the bytes via the internal file store, then emit a
`files` event referencing the served URL. This is the same path Open
WebUI's own image-generation code uses (utils/middleware.py ~1325).
Returns True if the image was uploaded and emitted via the files
event. Returns False if anything is missing — caller should fall
back to a data-URI markdown message in that case.
"""
if not (_OPENWEBUI_RUNTIME and request and user_dict and event_emitter):
return False
try:
user = Users.get_user_by_id(user_dict.get("id"))
if not user:
return False
upload = UploadFile(
file=io.BytesIO(raw),
filename=f"{filename_prefix}_{uuid.uuid4().hex[:8]}.png",
headers={"content-type": "image/png"},
)
meta = metadata or {}
result = upload_file_handler(
request=request,
file=upload,
metadata={
"chat_id": meta.get("chat_id"),
"message_id": meta.get("message_id"),
},
process=False,
user=user,
)
# upload_file_handler may be sync or async depending on the Open
# WebUI version — handle either.
if inspect.iscoroutine(result):
file_item = await result
else:
file_item = result
url = request.app.url_path_for(
"get_file_content_by_id", id=file_item.id
)
await event_emitter({
"type": "files",
"data": {"files": [{"type": "image", "url": url}]},
})
return True
except Exception:
# Any failure (signature drift, missing route, etc.) falls back
# to the data-URI path in the caller.
return False
async def _submit_and_fetch(
session: aiohttp.ClientSession,
base: str,
workflow: dict,
timeout_seconds: int,
emit: Callable[[str, bool], Awaitable[None]],
settings: dict,
) -> tuple[Optional[bytes], Optional[str]]:
"""Submit a workflow, poll history, fetch the first output image. Returns
(image_bytes, error_message)."""
client_id = str(uuid.uuid4())
async with session.post(
f"{base}/prompt", json={"prompt": workflow, "client_id": client_id}
) as resp:
if resp.status != 200:
return None, f"ComfyUI rejected the prompt: {resp.status} {await resp.text()}"
prompt_id = (await resp.json()).get("prompt_id")
if not prompt_id:
return None, "ComfyUI didn't return a prompt_id."
await emit(
f"Sampling — {settings['sampler']}/{settings['scheduler']}, "
f"CFG {settings['cfg']}, {settings['steps']} steps", False
)
deadline = time.time() + timeout_seconds
output_images: list = []
while time.time() < deadline:
await asyncio.sleep(1.5)
async with session.get(f"{base}/history/{prompt_id}") as resp:
if resp.status != 200:
continue
history = await resp.json()
if prompt_id in history:
for node_out in history[prompt_id].get("outputs", {}).values():
output_images.extend(node_out.get("images", []))
if output_images:
break
if not output_images:
return None, f"Timed out after {timeout_seconds}s waiting for image."
img = output_images[0]
params = {
"filename": img["filename"],
"subfolder": img.get("subfolder", ""),
"type": img.get("type", "output"),
}
async with session.get(f"{base}/view", params=params) as resp:
if resp.status != 200:
return None, f"Failed to fetch image: {resp.status}"
return await resp.read(), None
class Tools:
class Valves(BaseModel):
COMFYUI_BASE_URL: str = Field(
default="http://comfyui:8188",
description="ComfyUI server URL reachable from the open-webui container.",
)
TIMEOUT_SECONDS: int = Field(
default=240,
description="Maximum wait for a single generation to complete.",
)
def __init__(self):
self.valves = self.Valves()
async def generate_image(
self,
prompt: str,
style: Optional[StyleName] = None,
negative_prompt: Optional[str] = None,
width: int = 1024,
height: int = 1024,
seed: int = 0,
__request__=None,
__user__: Optional[dict] = None,
__metadata__: Optional[dict] = None,
__event_emitter__: Optional[Callable[[dict], Awaitable[None]]] = None,
) -> str:
"""
Create a NEW image from scratch and show it to the user. Use this
whenever the user asks you to draw, generate, create, make, paint,
render, or imagine any visual content — photographs, portraits,
characters, scenes, illustrations, anime, drawings — and they have
NOT attached an existing image. If they did attach an image and
want it modified, use edit_image instead.
Pick `style` to match what the user wants:
- "photo" — photorealistic photographs, portraits, cinematic shots.
- "juggernaut" — alternate photoreal style (sharper, more saturated).
- "pony" — anime / illustration / cartoon (Pony Diffusion).
- "general" — fallback for anything that doesn't fit the others.
- "furry-nai" — anthropomorphic characters (NAI-trained mix).
- "furry-noob" — anthropomorphic characters (NoobAI base).
- "furry-il" — anthropomorphic characters (Illustrious base, default
for any "furry" / "anthro" request unless specified otherwise).
Each style auto-prepends the right quality tags and picks the right
sampler / CFG / steps / CLIP skip. Do NOT add tags like
"masterpiece" or "score_9" to `prompt` yourself; the tool handles
that.
:param prompt: Plain description of the image (subject, scene,
style notes, lighting, etc.). No quality tags.
:param style: One of the values above. Omit to auto-detect.
:param negative_prompt: Extra terms to exclude. Usually unneeded.
:param width: Pixels (default 1024 — SDXL native). For portraits
use 832 with height 1216; for landscapes 1216 with height 832.
:param height: Pixels (default 1024).
:param seed: 0 to randomize, otherwise a specific seed for repeats.
:return: Markdown image of the result.
"""
chosen = style or _route_style(prompt)
settings = STYLES.get(chosen)
if not settings:
return f"Unknown style '{chosen}'. Available: {', '.join(STYLES.keys())}"
async def emit(msg: str, done: bool = False):
if __event_emitter__:
await __event_emitter__({
"type": "status",
"data": {"description": msg, "done": done},
})
await emit(f"Routing to {chosen} ({settings['ckpt']})")
positive = f"{settings['prefix']}{prompt}"
negative = settings["negative"]
if negative_prompt:
negative = f"{negative}, {negative_prompt}"
workflow = _build_txt2img(positive, negative, settings, width, height, seed)
base = self.valves.COMFYUI_BASE_URL.rstrip("/")
async with aiohttp.ClientSession() as session:
raw, err = await _submit_and_fetch(
session, base, workflow, self.valves.TIMEOUT_SECONDS, emit, settings,
)
if err:
return err
# Surface the image in the chat. Preferred path uploads to Open
# WebUI's file store and emits a `files` event (matches the built-
# in image-gen flow). Fallback inlines a data-URI markdown via a
# `message` event for environments where the file API isn't
# reachable from the tool process.
pushed = await _push_image_to_chat(
raw, "smartgen", __request__, __user__, __metadata__, __event_emitter__,
)
if not pushed and __event_emitter__:
b64 = base64.b64encode(raw).decode("ascii")
await __event_emitter__({
"type": "message",
"data": {"content": f"![{chosen}](data:image/png;base64,{b64})"},
})
await emit(f"Done — {chosen}", done=True)
return (
f"Image generated and shown to the user above (style: {chosen}, "
f"checkpoint: {settings['ckpt']}). Do NOT describe the image, "
f"do NOT repeat any base64 or markdown — the user can see it. "
f"You may briefly note your style choice and offer one or two "
f"iteration ideas (different style, tighter framing, etc)."
)
async def edit_image(
self,
edit_instruction: str,
style: Optional[StyleName] = None,
mask_text: Optional[str] = None,
denoise: Optional[float] = None,
negative_prompt: Optional[str] = None,
seed: int = 0,
__request__=None,
__user__: Optional[dict] = None,
__metadata__: Optional[dict] = None,
__files__: Optional[list] = None,
__messages__: Optional[list] = None,
__event_emitter__: Optional[Callable[[dict], Awaitable[None]]] = None,
) -> str:
"""
Edit, modify, transform, or restyle an image the user has ATTACHED
to the chat. Use whenever the user uploads an image and asks to
change it. If no image is attached, use generate_image instead.
TWO MODES — choose based on whether the change is local or global:
- LOCAL change ("change the ball to a basketball", "make the dog
wear a hat", "remove the bird") → set `mask_text` to a brief
noun phrase describing the region ("the ball", "the dog", "the
bird"). The tool uses GroundingDINO+SAM to find that region
automatically and only that area is repainted; the rest of the
image stays pixel-perfect.
- GLOBAL change ("make this a sunset", "turn this into anime",
"restyle this as oil painting") → leave `mask_text` unset. The
whole image is reimagined via img2img.
Always prefer LOCAL mode when the user names a specific object,
person, or region. GLOBAL mode is for whole-image style/lighting
transformations.
Denoise tuning:
- LOCAL (mask_text set): default 1.0 — full repaint within mask.
Drop to 0.6–0.8 for subtle local edits that should retain some
original structure.
- GLOBAL (no mask_text): default 0.7 — moderate edit. Use 0.3–0.5
for subtle restyling, 0.85–1.0 for radical reimagining.
Pick `style` for the DESIRED OUTPUT, not the input image.
:param edit_instruction: What the changed area should look like.
Tool auto-prepends quality tags — don't include those.
:param style: One of the StyleName values. Omit to auto-detect.
:param mask_text: Noun phrase describing the region to edit. Set
for LOCAL changes; omit for GLOBAL.
:param denoise: 0.0 = no change, 1.0 = ignore source. Defaults to
1.0 with mask_text, 0.7 without.
:param negative_prompt: Extra terms to exclude. Usually unneeded.
:param seed: 0 to randomize, otherwise specific.
:return: Markdown image of the result, or an error if no image is attached.
"""
chosen = style or _route_style(edit_instruction)
settings = STYLES.get(chosen)
if not settings:
return f"Unknown style '{chosen}'. Available: {', '.join(STYLES.keys())}"
# Denoise default depends on mode: 1.0 (full repaint within mask)
# for inpainting, 0.7 for img2img.
if denoise is None:
denoise = 1.0 if mask_text else 0.7
denoise = max(0.0, min(1.0, denoise))
async def emit(msg: str, done: bool = False):
if __event_emitter__:
await __event_emitter__({
"type": "status",
"data": {"description": msg, "done": done},
})
base = self.valves.COMFYUI_BASE_URL.rstrip("/")
async with aiohttp.ClientSession() as session:
await emit("Looking for attached image…")
raw_in = await _extract_attached_image(__files__, __messages__, session)
if raw_in is None:
return (
"No image found in the chat. Ask the user to attach the "
"image they want edited (paperclip / drag-drop), or call "
"generate_image instead if they want a new image."
)
await emit("Uploading source to ComfyUI…")
uploaded_name = await _upload_to_comfyui(session, base, raw_in)
if not uploaded_name:
return "Failed to upload source image to ComfyUI."
mode = "inpaint" if mask_text else "img2img"
await emit(
f"Routing to {chosen} ({settings['ckpt']}), {mode}, denoise {denoise:.2f}"
+ (f", mask='{mask_text}'" if mask_text else "")
)
positive = f"{settings['prefix']}{edit_instruction}"
negative = settings["negative"]
if negative_prompt:
negative = f"{negative}, {negative_prompt}"
if mask_text:
workflow = _build_inpaint(
positive=positive,
negative=negative,
settings=settings,
image_filename=uploaded_name,
mask_text=mask_text,
denoise=denoise,
seed=seed,
)
else:
workflow = _build_img2img(
positive=positive,
negative=negative,
settings=settings,
image_filename=uploaded_name,
denoise=denoise,
seed=seed,
)
raw_out, err = await _submit_and_fetch(
session, base, workflow, self.valves.TIMEOUT_SECONDS, emit, settings,
)
if err:
return err
pushed = await _push_image_to_chat(
raw_out, "smartedit", __request__, __user__, __metadata__, __event_emitter__,
)
if not pushed and __event_emitter__:
b64 = base64.b64encode(raw_out).decode("ascii")
await __event_emitter__({
"type": "message",
"data": {"content": f"![edit:{chosen}](data:image/png;base64,{b64})"},
})
await emit(f"Done — {chosen} (denoise {denoise:.2f})", done=True)
return (
f"Edited image shown to the user above (style: {chosen}, "
f"checkpoint: {settings['ckpt']}, denoise: {denoise:.2f}). Do NOT "
f"describe the image, do NOT repeat any base64 or markdown — the "
f"user can see it. You may briefly note your choice and offer "
f"iterations (different denoise, alternate style, etc)."
)

View File

@@ -0,0 +1,21 @@
#!/bin/sh
# Entrypoint wrapper. Pip-installs requirements.txt for any custom_node
# present in /opt/comfyui/custom_nodes/, then exec's the CMD.
#
# This makes the container self-healing for custom nodes that get added
# at runtime — either via ComfyUI-Manager from the web UI, or by
# git-cloning directly into the comfyui-custom-nodes volume. Pip skips
# already-satisfied requirements quickly, so the boot-time cost on
# subsequent restarts is negligible.
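#
# Illustrative wiring (exact paths/flags are assumptions, not copied from
# the Dockerfile):
#   ENTRYPOINT ["/opt/comfyui/install-custom-node-deps.sh"]
#   CMD ["python", "main.py", "--listen", "0.0.0.0", "--port", "8188"]
# Because the script ends with `exec "$@"`, the CMD replaces the wrapper
# process, so ComfyUI receives container signals directly.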
set -e
if [ -d /opt/comfyui/custom_nodes ]; then
for req in /opt/comfyui/custom_nodes/*/requirements.txt; do
[ -f "$req" ] || continue
echo "[entrypoint] installing $req"
pip install -q -r "$req" || echo " (install failed — continuing)"
done
fi
exec "$@"

View File

@@ -3,10 +3,10 @@
"class_type": "KSampler",
"inputs": {
"seed": 0,
"steps": 20,
"cfg": 7,
"sampler_name": "euler",
"scheduler": "normal",
"steps": 28,
"cfg": 4.0,
"sampler_name": "dpmpp_2m_sde",
"scheduler": "karras",
"denoise": 0.75,
"model": ["4", 0],
"positive": ["6", 0],
@@ -17,7 +17,7 @@
"4": {
"class_type": "CheckpointLoaderSimple",
"inputs": {
"ckpt_name": "v1-5-pruned-emaonly.safetensors"
"ckpt_name": "CyberRealisticXLPlay_V8.0_FP16.safetensors"
}
},
"6": {
@@ -30,7 +30,7 @@
"7": {
"class_type": "CLIPTextEncode",
"inputs": {
"text": "",
"text": "lowres, blurry, jpeg artifacts, watermark, text, signature, bad anatomy, extra limbs, missing fingers, deformed, ugly, low quality, worst quality",
"clip": ["4", 1]
}
},

View File

@@ -3,10 +3,10 @@
"class_type": "KSampler",
"inputs": {
"seed": 0,
"steps": 20,
"cfg": 7,
"sampler_name": "euler",
"scheduler": "normal",
"steps": 28,
"cfg": 4.0,
"sampler_name": "dpmpp_2m_sde",
"scheduler": "karras",
"denoise": 1,
"model": ["4", 0],
"positive": ["6", 0],
@@ -17,7 +17,7 @@
"4": {
"class_type": "CheckpointLoaderSimple",
"inputs": {
"ckpt_name": "v1-5-pruned-emaonly.safetensors"
"ckpt_name": "CyberRealisticXLPlay_V8.0_FP16.safetensors"
}
},
"5": {
@@ -38,7 +38,7 @@
"7": {
"class_type": "CLIPTextEncode",
"inputs": {
"text": "",
"text": "lowres, blurry, jpeg artifacts, watermark, text, signature, bad anatomy, extra limbs, missing fingers, deformed, ugly, low quality, worst quality",
"clip": ["4", 1]
}
},