18 Commits
v0.2.1 ... main

Author SHA1 Message Date
c07e962cae Image tools: migrate to OWUI 0.9.0 async model accessors
Open WebUI 0.9.0 made every model-class accessor (Users.get_user_by_id,
Chats.get_chat_by_id, Files.get_file_by_id, …) a coroutine. Both tools
were still calling them synchronously, so the calls returned coroutines
instead of model objects; the first downstream attribute access threw,
the bare `except Exception: return False` swallowed it, and uploads
silently fell through to the data-URI fallback. The data-URI markdown
rendered during streaming but didn't survive post-stream commit, which
looked like "image flashes in, then disappears."

Add await to the six call sites; promote `_read_file_dict` to async
since it now contains an await; restore `_push_image_to_chat` to the
canonical `files` event so the file-attachment chrome (thumbnail +
download) comes back.

This supersedes commit d034700, which mis-diagnosed the symptom as a
virtualization regression and switched to a `message`-event markdown
workaround. The workaround didn't help (same flash-and-vanish) because
the upload pre-check still failed for the same async-migration reason
and the data-URI fallback path still ran.

smart_image_gen.py 0.7.9 -> 0.7.10
smart_image_pipe.py 0.1.1 -> 0.1.2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:16:02 -05:00
d034700af9 Image tools: work around OWUI 0.9.x files-event regression
Open WebUI 0.9.0 introduced chat-history virtualization that
unmounts off-screen assistant messages and reconstructs them from
persisted shape; `files` attached mid-stream by a tool don't
survive the round-trip — the image flashes in during streaming
and disappears the moment the message commits.

Both image tools now upload via Open WebUI's file store as before
but surface the result as a markdown image injected into the
assistant message via a `message` event, which is part of the
persisted shape and renders reliably across remounts. Trade-off:
loses the file-attachment chrome (thumbnail + download button).
Each tool has a TODO marking the swap site with the original
`files` payload inlined for one-line revert once upstream fixes
the regression.
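Roughly, the two delivery paths this contrasts (the event names come from these
commit messages; the payload shapes below are an assumption, not taken from Open
WebUI docs):

```python
async def _push_markdown_image(__event_emitter__, file_url: str):
    # This commit's workaround: append markdown via a `message` event.
    # It's part of the persisted message shape, so it survives remounts,
    # but there is no thumbnail / download chrome.
    await __event_emitter__({
        "type": "message",
        "data": {"content": f"\n![generated image]({file_url})\n"},
    })

async def _push_file_attachment(__event_emitter__, file_url: str):
    # Original path (restored in c07e962): attach via a `files` event,
    # which gives the attachment chrome but, per this commit, is dropped
    # when the message remounts after streaming.
    await __event_emitter__({
        "type": "files",
        "data": {"files": [{"type": "image", "url": file_url}]},
    })
```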

smart_image_gen.py 0.7.8 -> 0.7.9
smart_image_pipe.py 0.1.0 -> 0.1.1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:05:12 -05:00
28f370a80b Update deployments/ai-stack/openwebui-tools/smart_image_gen.py 2026-04-20 17:08:32 +00:00
02a4bece5d Ollama: keep loaded models resident until evicted (KEEP_ALIVE=-1)
Was 30m, which evicts after 30 minutes of inactivity and forces a
reload penalty on the next request. Setting -1 holds models in VRAM
indefinitely; MAX_LOADED_MODELS=3 caps how many can stay resident
simultaneously (vs the previous 2). Tune MAX higher if you're
rotating between more than three models AND your GPU has the VRAM
for it — a comment in the compose file explains the trade-off.

For the live srvno.de stack: OLLAMA_KEEP_ALIVE=-1 takes effect on
the next `docker compose up -d ollama`. Models loaded at the time of
the restart don't survive the container recreate; they reload on their
next request.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 19:25:42 -05:00
a1af88a632 smart_image_gen v0.7.8: per-job filename_prefix + source-image diagnostic
User reported a wrong image being returned. Two hardenings:

1. _job_prefix() generates a per-submission filename_prefix
   ('smartedit_<10hex>', 'smartinpaint_<10hex>', 'smartgen_<10hex>')
   so SaveImage outputs from concurrent jobs sit in their own
   namespace and ComfyUI's auto-incrementing counter can never
   produce filenames that overlap across jobs. With a shared prefix,
   if a queued job's history-fetch ever raced past its own SaveImage
   record there was a theoretical (if unlikely) path to picking up
   another job's _00001_.png. Per-job prefix kills that vector.

2. edit_image now emits the source image's SHA-1 and byte count in a
   status event before uploading to ComfyUI. If a future 'wrong
   image' report comes in, that hash should match the prior
   generation's output — if it doesn't, we know
   _extract_attached_image picked up the wrong source rather than
   ComfyUI returning the wrong file. Hashlib import is local so the
   module's import surface stays clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:22:45 -05:00
d8c8421361 Image Studio: lock in working config — Native + enable_thinking=false
User confirmed end-to-end working stack:
  - base_model_id: huihui_ai/qwen3-vl-abliterated:8b
  - function_calling: native (gives 'View Result from edit_image'
    blocks, structured tool call traces)
  - custom_params:
      tool_choice: required        (forces tool call every turn)
      enable_thinking: false       (server-side disable; abliterated
                                    Qwen ignores the /no_think system
                                    prompt directive — when thinking
                                    is on the tool call leaks inside
                                    a thinking block as text)

Updated image_studio.json + the markdown setup table + the
'Qwen 3.x quirk' explainer to match. The /no_think line in the
system prompt stays in for non-abliterated Qwen variants but is now
documented as best-effort backup; enable_thinking=false is the
authoritative kill-switch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:12:24 -05:00
f5a222fe6f Image Studio: default base model → huihui_ai/qwen3-vl-abliterated:8b
User confirmed this model works end-to-end after the multi-base-model
search. Settled on it because Qwen 3 VL's fine-tune lineage isn't
damaged by abliteration the way Qwen 3.5's is, so it both calls tools
reliably AND won't refuse to dispatch on NSFW edit prompts.

Updated:
  - image_studio.json base_model_id → huihui_ai/qwen3-vl-abliterated:8b
  - init-models.sh: pulls the abliterated VL model in place of the
    non-working standard qwen3.5:9b
  - image_studio.md: setup table base-model row + vision-section
    'why this and not the alternatives' explanation

function_calling stays default and tool_choice stays required. The
operator can flip to native + drop tool_choice once they've verified
the new
base behaves with structured tool calls (which would also remove the
need for a separate Task Model for title generation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:56:16 -05:00
ec6108888a smart_image_gen v0.7.7: enforce style inheritance for edit_image
Vision-capable LLMs misclassify rendered subjects when picking a
style — observed: the model called juggernaut for an edit on a
furry-il generation because the rendered character looked
'photoreal-ish' to its vision encoder. Each visual judgment is
independent, so styles flip mid-chat.

Flipped resolution order in edit_image so inheritance from the prior
generate_image / edit_image call DOMINATES the LLM's explicit style
arg. The LLM's choice only wins when there's nothing to inherit
(first edit in a chat, fresh user upload). The workaround for
legitimate style changes is to start a new chat.

System prompt updated to match: tells the LLM that style inheritance
is enforced, that passing style on follow-up calls is ignored, and
that user requests for style change require a new chat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:53:32 -05:00
20d4bd5b72 Image Studio: switch base model to qwen3.5:9b (non-abliterated)
The abliterated 9B was the source of the tool-call format mangling
(both Native XML leaks and Default Python-syntax leaks). Standard
qwen3.5:9b is the same family and the same 9B size (6.6 GB); it's
vision-capable, and native tool calling actually works.

Uncensored image content was always going to come from the SDXL
checkpoints in ComfyUI — the LLM is just a dispatcher. Picking
a well-behaved tool-caller for that role doesn't compromise output
content.

Updated:
  - image_studio.json base_model_id → qwen3.5:9b
  - init-models.sh: pulls qwen3.5:9b as a standard registry pull,
    in addition to the existing abliterated 9B (which stays for
    other chat models)
  - image_studio.md setup table + vision section explaining why
    we chose standard over abliterated for the dispatcher role

function_calling stays as 'default' and tool_choice as 'required'
for now — they don't hurt with a reliable tool-caller and operators
can flip back to native + drop tool_choice once they verify it
works for them (which also removes the need for a separate Task
Model for title generation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:09:42 -05:00
1ae451ad5f Add smart_image_pipe.py — deterministic Pipe for image gen / edit / inpaint
Registers as 'Image Studio (Pipe)' in Open WebUI's chat-model dropdown.
No LLM in the loop — the Pipe parses the user's message with regex,
finds attached images via the same multi-source extractor as the
Tool, routes deterministically to txt2img / img2img / inpaint, calls
ComfyUI, returns the image. Bypasses the abliterated Qwen 3.5 ×
Open WebUI tool-call format interop bug entirely.

Style resolution: explicit FORCE_STYLE valve > inherited from prior
assistant message ('style: X' marker) > keyword regex on user text >
default (furry-il).
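A sketch of that priority chain (illustrative only; `_resolve_style` and the
`force_style` parameter are hypothetical names, while STYLES, DEFAULT_STYLE,
_inherited_style, and _route_style appear in the file below):

```python
def _resolve_style(force_style: str, messages, user_text: str) -> str:
    # explicit valve > inherited marker > keyword regex > default
    if force_style and force_style in STYLES:
        return force_style
    return (
        _inherited_style(messages)   # "style: X" marker in a prior assistant message
        or _route_style(user_text)   # keyword regex; returns DEFAULT_STYLE ("furry-il") on no match
    )
```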

Inpaint trigger: regex picks up phrasings like 'change the X',
'remove the Y', 'make the Z bigger/red/etc' and pulls the noun
phrase as mask_text. No match → full img2img.

Reuses the same per-style settings, prefix dialects, negatives,
GrowMask feathering, file extraction (with chat-DB fallback) and
files-event push as smart_image_gen.py — code is duplicated rather
than shared because Open WebUI loads each plugin file standalone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:58:01 -05:00
011fade024 Image Studio prompt: forbid post-tool echo of the function call
User saw the LLM's chat response include a literal
'edit_image(prompt="...", mask_text="...", style="furry-il",
denoise=0.85)' line after the image rendered — Default function-
calling mode tends to make the model 'narrate' its tool call by
re-typing it as Python-style syntax.

Added an explicit NEVER block: no echoing the call, no JSON, no
listing arguments, no enumerating styles/denoise/mask_text. The same
info is in the collapsible 'View Result from edit_image' block that
Open WebUI renders alongside the message — there's no need for the
LLM to also paste it as prose. Follow-up text is for human
conversation, not bookkeeping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:41:51 -05:00
63917709c1 smart_image_gen v0.7.6: feather inpaint mask edges via GrowMask
The raw SAM mask is a hard binary edge — KSampler repaints right up
to it, and SDXL has no surrounding-pixel context inside the mask to
blend with. Result: the inpainted region looks pasted-on with visible
seams (the artifact the user reported on the werewolf-groin edit).

Inserted a stock GrowMask node (id 17) between
GroundingDinoSAMSegment and SetLatentNoiseMask:
  - expand=12 grows the mask outward by 12 px so the new content
    overlaps a strip of original pixels for blending
  - tapered_corners=True softens the edge so the noise transition
    isn't a step function

GrowMask is built into stock ComfyUI; no extra custom node install.
KSampler still uses the caller-supplied denoise (default 1.0 in
inpaint mode).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:40:39 -05:00
1ed2e7293e Image Studio: ship function_calling=default — Native leaks Qwen 3.5 XML
Qwen 3.5 abliterated emits its native tool-call format
(<function=...><parameter=...>) wrapped in <tool_call> tags that the
current Open WebUI / Ollama parser does not reliably round-trip — the
XML leaks to chat as plain text instead of executing. Switching the
preset to Function Calling: Default, which uses Open WebUI's own
prompt-injection wrapper, fires the tool reliably.

Native is documented as the right choice only when the operator has
swapped the base model to one with proven OWUI-side parser support
(mistral-nemo:12b, qwen2.5vl:7b). For the shipped Qwen 3.5
abliterated default, Default is the working setting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:36:05 -05:00
18a205d69d smart_image_gen v0.7.5: fix black-image bug — fetch from SaveImage explicitly
_submit_and_fetch was iterating history[prompt_id]['outputs'].values()
and grabbing the first image it saw. The inpaint workflow includes
nodes other than SaveImage that emit IMAGE outputs (the
GroundingDinoSAMSegment node returns an overlay/mask-applied image
in addition to the mask), and the iteration order of that outputs
dict isn't guaranteed to put SaveImage first — sometimes we'd return
the overlay (which can render mostly black) instead of the actual
SaveImage result.

Fix: prefer outputs from the SaveImage node id ('9' in every workflow
the tool builds) explicitly. Fall back to scanning all outputs only
if SaveImage didn't appear (workflow drift, manual edit, etc).

User reported seeing the correct inpaint in ComfyUI's native UI but
black in chat — this is the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:30:04 -05:00
0fa8040251 Eliminate first-inpaint timeout: preseed SAM/GroundingDINO + 600s default
Two changes to address the timeout-and-retry loop the user hit on the
first edit_image call:

  1. comfyui-init-models.sh now fetches the three weights inpaint
     needs into /models/sams and /models/grounding-dino:
       - sam_hq_vit_h.pth                (~2.5 GB)
       - groundingdino_swint_ogc.pth     (~700 MB)
       - GroundingDINO_SwinT_OGC.cfg.py  (~1 KB)
     Without preseeding, these auto-download on the first inpaint,
     which takes minutes and times out the tool call. The mkdir line
     also gains the two new subdirs.

  2. Tool TIMEOUT_SECONDS valve default bumped 240s → 600s as
     defense-in-depth — even with weights preseeded, BERT-base
     auto-downloads via transformers on first GroundingDINO load
     (~30s) and a slow KSampler on a contended GPU can push past
     4 minutes occasionally. Steady-state runs still finish in under
     a minute; the valve only matters for first-call latency.

After comfyui-model-init re-runs (`docker compose up -d
comfyui-model-init`), first inpaint should be near-instant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:39:23 -05:00
e77666ea0f Image Studio docs: require setting a separate Task Model after install
tool_choice: required (the thing that makes Image Studio reliably fire
its tools) also blocks Open WebUI's background text-only calls — title
generation, tag suggestions, autocomplete — because the model is
forced to produce a tool call instead of text. Result: chats stay
named 'New Chat' and tag suggestions go silent.

Documented the fix in two places:
  - image_studio.md: dedicated 'Set a separate Task Model (required
    after install)' section explaining the cause and the fix path.
  - deployment README §9: short follow-up note pointing at it so
    operators don't miss it during initial setup.

The fix is purely Open WebUI configuration — no code change. Pick any
non-Image-Studio model already pulled (mistral-nemo:12b is the
obvious default) for the Task Model slot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:37:21 -05:00
6700f6ce33 smart_image_gen v0.7.3: edit_image inherits style from prior tool call
User reported edit_image picking 'juggernaut' (photoreal) for an edit
on a furry image — the LLM didn't carry context, and the tool's
fallback _route_style only sees the edit instruction text, which for
neutral edits ('bigger', 'glowing eyes') has no furry keywords.

Fix in two places:

  1. Tool: _inherited_style scans __messages__ in reverse for prior
     generate_image / edit_image tool calls and returns the style arg
     they used. edit_image now resolves: explicit style → inherited →
     keyword fallback. Deterministic, no LLM cooperation needed for
     follow-up edits on previously-generated images.

  2. System prompt: explicit three-step style resolution for
     edit_image. Generated by you → omit style and auto-inherit.
     Uploaded by user → INSPECT visually and pick a matching style
     (the LLM is the only thing with vision; the tool can't see
     pixels). Then keep that style for subsequent edits.

Both paths matter — the tool fix handles the common case
deterministically, the prompt fix handles the upload case where
there's nothing to inherit from.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:31:57 -05:00
def27087c1 Make every image tag in compose pinnable via .env
Floating tags (`latest`, `main`) made deploys non-deterministic — a
container recreate could pull a newer Open WebUI, Ollama, or Anubis at
any time. Wrapped every `image:` value in a ${VAR:-default}
substitution, surfaced the full set in .env.example with a header
explaining where to find current versions, and bumped the
COMFYUI_IMAGE_TAG default to 0.2.1 (the just-tagged version with the
transformers pin).

Vars added: CADDY_TAG, OLLAMA_TAG, OPEN_WEBUI_TAG, ALPINE_TAG,
ANUBIS_TAG (COMFYUI_IMAGE_TAG already existed). Defaults keep the
previous floating-tag behaviour for the images where I'm not confident
which specific version to pin (Ollama, Open WebUI, Anubis) — operators
should update those to verified versions for production deploys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:27:08 -05:00
9 changed files with 905 additions and 63 deletions

View File

@@ -19,9 +19,26 @@ WEBUI_SECRET_KEY=replace-with-32-byte-hex
# Only needed if you uncomment the anubis-owui service in docker-compose.yml.
ANUBIS_OWUI_KEY=replace-with-32-byte-hex
# ComfyUI image tag to deploy. `latest` tracks whatever the release workflow
# last pushed; pin to a v* tag (e.g. 0.1.0) for reproducible deploys.
COMFYUI_IMAGE_TAG=latest
# ─── Image tags ─────────────────────────────────────────────────────────────
# Pin to specific versions for reproducible deploys. The defaults below are
# the last set verified to work end-to-end for this stack — change only when
# you've tested a newer combination. `latest` / `main` is fine for local
# experimentation but means deploys are non-deterministic.
#
# Find current tags at:
# ComfyUI git.anomalous.dev/alphacentri/comfyui-nvidia/-/tags
# Caddy https://hub.docker.com/_/caddy/tags
# Ollama https://hub.docker.com/r/ollama/ollama/tags
# Open WebUI https://github.com/open-webui/open-webui/pkgs/container/open-webui
# Alpine https://hub.docker.com/_/alpine/tags
# Anubis https://github.com/TecharoHQ/anubis/pkgs/container/anubis
COMFYUI_IMAGE_TAG=0.2.1
CADDY_TAG=2-alpine
OLLAMA_TAG=latest
OPEN_WEBUI_TAG=main
ALPINE_TAG=3.20
ANUBIS_TAG=latest
# HuggingFace access token. Only needed if comfyui-init-models.sh references
# gated repos (Flux-dev, SD3, etc.). Generate a read token at

View File

@@ -218,6 +218,15 @@ Setup — two paths:
Users then pick **Image Studio** from the chat-model dropdown when
they want to generate or edit images.
**One required follow-up** after either install path: set a separate
**Task Model** in Admin Settings → Interface → Task Model. Image
Studio uses `tool_choice: required` to force tool calls, which means
the same model can't produce the text responses Open WebUI needs for
chat-title generation, tag suggestions, and autocomplete. Pick any
non-Image-Studio model you have pulled (`mistral-nemo:12b`,
`llama3.1:8b`, etc.) — see the
[**Set a separate Task Model** section in image_studio.md](openwebui-models/image_studio.md#set-a-separate-task-model-required-after-install).
The preset ships with `vision: true` so users can attach images for
editing even though `mistral-nemo:12b` isn't a vision model — see the
[**Vision capability** section in image_studio.md](openwebui-models/image_studio.md#vision-capability)

View File

@@ -14,7 +14,8 @@ set -e
apk add --no-cache curl >/dev/null
mkdir -p /models/checkpoints /models/vae /models/loras /models/controlnet \
/models/clip /models/clip_vision /models/upscale_models /models/embeddings
/models/clip /models/clip_vision /models/upscale_models /models/embeddings \
/models/sams /models/grounding-dino
fetch() {
dest="$1"; name="$2"; url="$3"
@@ -65,4 +66,19 @@ fetch() {
# fetch upscale_models 4x-UltraSharp.pth \
# https://huggingface.co/lokCX/4x-Ultrasharp/resolve/main/4x-UltraSharp.pth
# ─── Inpainting models (SAM-HQ + GroundingDINO) ─────────────────────────────
# Required by the smart_image_gen Tool's edit_image with mask_text. ComfyUI
# would auto-download these on first use, but that takes minutes and tends
# to time out in-flight tool calls — preseeding here makes the first inpaint
# instant.
fetch sams sam_hq_vit_h.pth \
https://huggingface.co/lkeab/hq-sam/resolve/main/sam_hq_vit_h.pth
fetch grounding-dino groundingdino_swint_ogc.pth \
https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swint_ogc.pth
fetch grounding-dino GroundingDINO_SwinT_OGC.cfg.py \
https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/GroundingDINO_SwinT_OGC.cfg.py
echo "Done."

View File

@@ -24,7 +24,7 @@ services:
# Encrypt), reverse-proxies to the in-compose services by name.
# ---------------------------------------------------------------------------
caddy:
image: caddy:2-alpine
image: caddy:${CADDY_TAG:-2-alpine}
container_name: caddy
restart: unless-stopped
ports:
@@ -49,7 +49,7 @@ services:
# Ollama — LLM daemon, GPU-backed.
# ---------------------------------------------------------------------------
ollama:
image: ollama/ollama:latest
image: ollama/ollama:${OLLAMA_TAG:-latest}
container_name: ollama
restart: unless-stopped
# 11434 only published if you want direct access from the VM host.
@@ -60,8 +60,12 @@ services:
- ollama-data:/root/.ollama
environment:
- OLLAMA_HOST=0.0.0.0:11434
- OLLAMA_KEEP_ALIVE=30m
- OLLAMA_MAX_LOADED_MODELS=2
# KEEP_ALIVE=-1 holds loaded models in VRAM until evicted by another
# load (vs the default 5m / our previous 30m which forces a reload
# penalty on every cold use). Pair with MAX_LOADED_MODELS sized to
# whatever fits in your GPU's VRAM — see README "VRAM sizing".
- OLLAMA_KEEP_ALIVE=-1
- OLLAMA_MAX_LOADED_MODELS=3
- OLLAMA_FLASH_ATTENTION=1
deploy:
resources:
@@ -84,7 +88,7 @@ services:
# mirror (set S3_OLLAMA_BASE in .env; create tarballs with
# mirror-ollama-model.sh).
model-init:
image: ollama/ollama:latest
image: ollama/ollama:${OLLAMA_TAG:-latest}
container_name: ollama-model-init
depends_on:
ollama:
@@ -108,7 +112,7 @@ services:
# (install via ComfyUI-Manager) instead of as a separate sidecar.
# ---------------------------------------------------------------------------
comfyui:
image: git.anomalous.dev/alphacentri/comfyui-nvidia:${COMFYUI_IMAGE_TAG:-latest}
image: git.anomalous.dev/alphacentri/comfyui-nvidia:${COMFYUI_IMAGE_TAG:-0.2.1}
pull_policy: always
container_name: comfyui
restart: unless-stopped
@@ -139,7 +143,7 @@ services:
# need to be running for this — files just land on the volume; ComfyUI
# picks them up next time it scans (or on a restart).
comfyui-model-init:
image: alpine:latest
image: alpine:${ALPINE_TAG:-3.20}
container_name: comfyui-model-init
volumes:
- comfyui-models:/models
@@ -155,7 +159,7 @@ services:
# Open WebUI — multi-user chat.
# ---------------------------------------------------------------------------
open-webui:
image: ghcr.io/open-webui/open-webui:main
image: ghcr.io/open-webui/open-webui:${OPEN_WEBUI_TAG:-main}
container_name: open-webui
restart: unless-stopped
# ports: not published; Caddy fronts it
@@ -197,7 +201,7 @@ services:
# `open-webui:8080` → `anubis-owui:8923`.
# ---------------------------------------------------------------------------
anubis-owui:
image: ghcr.io/techarohq/anubis:latest
image: ghcr.io/techarohq/anubis:${ANUBIS_TAG:-latest}
container_name: anubis-owui
restart: unless-stopped
environment:

View File

@@ -55,6 +55,13 @@ pull() {
# mirror-ollama-model.sh, upload to S3, then list them here.
s3_pull "huihui_ai/qwen3.5-abliterated:9b" "qwen3.5-abliterated-9b.tgz"
# huihui_ai/qwen3-vl-abliterated — Qwen 3 VL base abliteration (different
# fine-tune lineage than Qwen 3.5, so its tool-call template stays intact).
# Used as the Image Studio dispatcher: vision-capable, calls tools
# reliably, and doesn't refuse to dispatch on NSFW edit prompts. Pulled
# from registry; no S3 mirror entry yet.
pull "huihui_ai/qwen3-vl-abliterated:8b"
# ─── Direct registry pulls ──────────────────────────────────────────────────
for model in dolphin3:8b llama3.1:8b ministral-3:8b mistral-nemo:12b qwen3.6:latest; do
pull "$model"

View File

@@ -1,15 +1,16 @@
[
{
"id": "image-studio",
"base_model_id": "huihui_ai/qwen3.5-abliterated:9b",
"base_model_id": "huihui_ai/qwen3-vl-abliterated:8b",
"name": "Image Studio",
"params": {
"system": "/no_think\n\nYou are an image-tool dispatcher. You do not respond in prose. Every user message MUST result in exactly one tool call.\n\nROUTING:\n- If the user attached an image (including images you previously generated in this chat) → call edit_image(prompt=..., ...)\n- Otherwise → call generate_image(prompt=..., ...)\nBoth tools take `prompt` as the first argument — same name on both. Do NOT invent `edit_instruction`.\n\nFire the tool on the FIRST message, with no preamble. Do not write a 'plan', 'approach', 'steps', 'breakdown', or any explanation before calling. Do not ask clarifying questions. Do not say what you are about to do. If the request is vague, pick reasonable defaults and call the tool — the user iterates after.\n\nSTYLES (pick one):\n photo photorealistic photo / portrait / cinematic\n juggernaut alternate photoreal — sharper, more saturated\n pony anime, cartoon, manga, stylised illustration\n general catch-all when nothing else fits\n furry-nai anthropomorphic, NAI-trained mix\n furry-noob anthropomorphic, NoobAI base\n furry-il anthropomorphic, Illustrious base (default for any furry/anthro request)\n\nedit_image has TWO MODES — pick based on whether the change is local or global:\n- LOCAL change (\"change the ball to a basketball\", \"add a hat to the dog\", \"remove the bird\", \"recolor the car red\") → set `mask_text` to a brief noun phrase naming the region (\"the ball\", \"the dog\", \"the bird\", \"the car\"). Only that region is repainted; rest stays pixel-perfect.\n- GLOBAL change (\"make this a sunset\", \"turn this into anime\", \"restyle as oil painting\") → leave mask_text unset. The whole image is reimagined.\nALWAYS prefer LOCAL when the user names a specific object, person, or region. GLOBAL is only for whole-image style/lighting transformations.\n\nDenoise:\n- LOCAL (mask_text set): default 1.0. Drop to 0.60.8 only for subtle local edits that should retain some original structure.\n- GLOBAL (no mask_text): default 0.7. Use 0.30.5 for subtle restyle, 0.851.0 for radical reimagining.\n\nPick style for the DESIRED OUTPUT, not the input image.\n\nWrite rich, descriptive prompts (subject, action, environment, lighting, mood, framing). Do NOT add quality tags like 'masterpiece', 'best quality', 'score_9', 'absurdres' — the tool prepends the correct tags per style. Do NOT set sampler, CFG, steps, scheduler — the tool picks them.\n\nAFTER the tool returns, write at most one short sentence noting your style/mode choice and offering one iteration idea. The image is already shown to the user; do not describe it.",
"system": "/no_think\n\nYou are an image-tool dispatcher. You do not respond in prose. Every user message MUST result in exactly one tool call.\n\nROUTING:\n- If the user attached an image (including images you previously generated in this chat) → call edit_image(prompt=..., ...)\n- Otherwise → call generate_image(prompt=..., ...)\nBoth tools take `prompt` as the first argument — same name on both. Do NOT invent `edit_instruction`.\n\nFire the tool on the FIRST message, with no preamble. Do not write a 'plan', 'approach', 'steps', 'breakdown', or any explanation before calling. Do not ask clarifying questions. Do not say what you are about to do. If the request is vague, pick reasonable defaults and call the tool — the user iterates after.\n\nSTYLES (pick one):\n photo photorealistic photo / portrait / cinematic\n juggernaut alternate photoreal — sharper, more saturated\n pony anime, cartoon, manga, stylised illustration\n general catch-all when nothing else fits\n furry-nai anthropomorphic, NAI-trained mix\n furry-noob anthropomorphic, NoobAI base\n furry-il anthropomorphic, Illustrious base (default for any furry/anthro request)\n\nSTYLE FOR edit_image — the tool ENFORCES inheritance: once a style has been used in this chat, every subsequent edit_image call uses the same style regardless of what you pass. Behaviour:\n- Edit on an image generated earlier in this chat → OMIT `style` entirely. The tool will use the established style. Passing it is harmless but ignored.\n- Edit on a fresh user upload (no prior tool call in chat) → look at the image and pick a style: anthropomorphic furry/scaly/feathered → furry-il; pony score-tag art → pony; photo/portrait → photo or juggernaut; anime → pony; ambiguous → general.\n- Style cannot be changed mid-chat. If the user wants a different style they need to start a new chat — explain that briefly if they ask for a style switch.\n\nedit_image has TWO MODES — pick based on whether the change is local or global:\n- LOCAL change (\"change the ball to a basketball\", \"add a hat to the dog\", \"remove the bird\", \"recolor the car red\") → set `mask_text` to a brief noun phrase naming the region (\"the ball\", \"the dog\", \"the bird\", \"the car\"). Only that region is repainted; rest stays pixel-perfect.\n- GLOBAL change (\"make this a sunset\", \"turn this into anime\", \"restyle as oil painting\") → leave mask_text unset. The whole image is reimagined.\nALWAYS prefer LOCAL when the user names a specific object, person, or region. GLOBAL is only for whole-image style/lighting transformations.\n\nDenoise:\n- LOCAL (mask_text set): default 1.0. Drop to 0.60.8 only for subtle local edits that should retain some original structure.\n- GLOBAL (no mask_text): default 0.7. Use 0.30.5 for subtle restyle, 0.851.0 for radical reimagining.\n\nPick style for the DESIRED OUTPUT, not the input image.\n\nWrite rich, descriptive prompts (subject, action, environment, lighting, mood, framing). Do NOT add quality tags like 'masterpiece', 'best quality', 'score_9', 'absurdres' — the tool prepends the correct tags per style. Do NOT set sampler, CFG, steps, scheduler — the tool picks them.\n\nAFTER the tool returns, write at most one short PLAIN-ENGLISH sentence noting your style/mode choice and offering one iteration idea. 
The image is already shown to the user.\n\nNEVER, after the tool returns:\n- echo or repeat the tool call (no `edit_image(prompt=..., ...)`, no `<function=...>`, no JSON, no parameter listings)\n- describe what's in the image\n- list the arguments you used\n- enumerate styles, denoise, mask_text, etc.\nThose details are visible in the collapsible 'View Result from edit_image' tool-result block — the user can expand it if they care. Your follow-up message is for HUMAN conversation, not bookkeeping.",
"temperature": 0.5,
"top_p": 0.9,
"function_calling": "native",
"custom_params": {
"tool_choice": "required"
"tool_choice": "required",
"enable_thinking": false
}
},
"meta": {

View File

@@ -38,7 +38,7 @@ prompts. Verify after import:
| Field | Value |
| ----- | ----- |
| Name | `Image Studio` |
| Base Model | `huihui_ai/qwen3.5-abliterated:9b` (vision-capable, 256K context, abliterated). Pull via `init-models.sh` first. |
| Base Model | `huihui_ai/qwen3-vl-abliterated:8b` (Qwen 3 VL base, abliterated, vision + tools). Pull via `init-models.sh` first. The Qwen 3 VL fine-tune lineage isn't damaged by abliteration the way Qwen 3.5 is, so it both calls tools reliably AND won't refuse to dispatch on NSFW edit prompts. |
| Description | `Image generation and routing across SDXL checkpoints.` |
| System Prompt | Paste the block from [System prompt](#system-prompt) below. |
| Tools | enable **only** `smart_image_gen` |
@@ -47,11 +47,11 @@ In the **Advanced Params** section:
| Field | Value |
| ----- | ----- |
| Function Calling | `Native` (mandatory) |
| Function Calling | `Native` — works cleanly on `huihui_ai/qwen3-vl-abliterated:8b` once thinking is disabled (see Custom Parameters). Native gives you the structured "View Result from edit_image" blocks and "Thought for X seconds" tracing in the UI. |
| Temperature | `0.5` (lower = more reliable tool-calling) |
| Top P | `0.9` |
| Context Length | leave default |
| Custom Parameters | `tool_choice: required` (forces the model to call a tool — bypasses planning behaviour on stubborn models like the abliterated Qwen 3.5) |
| Custom Parameters | `tool_choice: required` (forces the model to call a tool every turn) **and** `enable_thinking: false` (disables Qwen's thinking mode at the API level — the `/no_think` system-prompt directive isn't honored by abliterated Qwen builds, but this server-side flag is). Both required for reliable behaviour on `huihui_ai/qwen3-vl-abliterated:8b`. |
Save. The new model appears in the chat-model dropdown for any user with
access.
@@ -87,6 +87,21 @@ STYLES (pick one):
furry-il anthropomorphic, Illustrious base (default for any
furry/anthro request)
STYLE FOR edit_image — the tool ENFORCES inheritance: once a style
has been used in this chat, every subsequent edit_image call uses
the same style regardless of what you pass. Behaviour:
- Edit on an image generated earlier in this chat → OMIT `style`
entirely. The tool will use the established style. Passing it is
harmless but ignored.
- Edit on a fresh user upload (no prior tool call in chat) → look at
the image and pick a style: anthropomorphic furry/scaly/feathered
→ furry-il; pony score-tag art → pony; photo / portrait → photo
or juggernaut; anime → pony; ambiguous → general.
- Style cannot be changed mid-chat. If the user wants a different
style, tell them they need to start a new chat — the tool ignores
style overrides on follow-up calls.
edit_image has TWO MODES — pick based on whether the change is local
or global:
@@ -117,34 +132,80 @@ lighting, mood, framing). Do NOT add quality tags like 'masterpiece',
correct tags per style. Do NOT set sampler, CFG, steps, scheduler —
the tool picks them.
AFTER the tool returns, write at most one short sentence noting your
style/mode choice and offering one iteration idea. The image is
already shown to the user; do not describe it.
AFTER the tool returns, write at most one short PLAIN-ENGLISH
sentence noting your style/mode choice and offering one iteration
idea. The image is already shown to the user.
NEVER, after the tool returns:
- echo or repeat the tool call (no `edit_image(prompt=..., ...)`,
no `<function=...>`, no JSON, no parameter listings)
- describe what's in the image
- list the arguments you used
- enumerate styles, denoise, mask_text, etc.
Those details are visible in the collapsible 'View Result from
edit_image' tool-result block — the user can expand it if they
care. Your follow-up message is for HUMAN conversation, not
bookkeeping.
```
The first line `/no_think` disables Qwen 3.x's reasoning phase. If
your base model isn't Qwen 3, leaving it in is a no-op (other models
ignore it). Drop it only if it actually causes problems.
## Set a separate Task Model (required after install)
`tool_choice: required` is what makes Image Studio reliably fire the
tool, but it has a side effect: Open WebUI uses the same model with
the same params for **title generation**, **tag generation**, and
**autocomplete**. With every response forced to be a tool call, those
text-only background tasks can't produce text, so chats stay named
"New Chat" forever and tag suggestions go silent.
Fix: point Open WebUI at a different model for those tasks.
**Admin Settings → Interface → Task Model** → pick any of the
non-Image-Studio models you have pulled. `mistral-nemo:12b`,
`llama3.1:8b`, `qwen3.6:latest`, or `dolphin3:8b` all work. The Task
Model only handles short background calls (titles, tags, autocomplete,
search-query rewriting) — it doesn't need to be vision-capable or
particularly large. Smaller is faster and cheaper.
Save. New Image Studio chats now get descriptive titles, tag
suggestions return, and autocomplete lights up.
## Vision capability
The shipped preset sets `meta.capabilities.vision: true` so Open WebUI
allows users to attach images to chats with this model. Two paths:
### Default — `huihui_ai/qwen3.5-abliterated:9b`
### Default — `huihui_ai/qwen3-vl-abliterated:8b`
The shipped preset uses Qwen 3.5 abliterated 9B as the base — vision-
capable, 256K context, no censorship hedging. Preseed via
`init-models.sh` (an `s3_pull` line is already in place; see
[Mirroring models to S3](../README.md#mirroring-models-to-s3) for the
mirror workflow).
The shipped preset uses huihui_ai's abliteration of Qwen 3 VL as
the base — 8B params, vision-capable, with working native tool
calling, and it won't refuse to dispatch the tool when the user's
edit prompt is NSFW. Preseed via `init-models.sh`.
**Important Qwen 3.x quirk:** thinking mode is on by default and
breaks native function calling — the model "thinks" about how to use
the tool instead of just calling it. The shipped system prompt starts
with `/no_think` to suppress this. If the model still plans instead
of firing the tool, also set `enable_thinking: false` in **Advanced
Params → Custom Parameters** (API-level enforcement).
**Why not the Qwen 3.5 abliterated 9B (huihui_ai/qwen3.5-abliterated:9b)?**
Same maintainer, but the abliteration on Qwen 3.5 mangles the
function-call template, causing the model to either refuse to call
tools or emit malformed `<function=...>` XML that Open WebUI's
parser can't recognise. The Qwen 3 VL fine-tune lineage is
different and doesn't take that damage from abliteration.
**Why not standard `qwen3.5:9b`?** The standard (non-abliterated)
Qwen 3.5 calls tools reliably but its safety training refuses on
many image edit prompts even though the LLM's only job is dispatch
(the actual image content is generated by the SDXL checkpoint, which
the LLM never sees). Abliterated VL gets us both reliable tool
calling AND a cooperative dispatcher.
**Qwen 3.x quirk:** thinking mode is on by default and abliterated
builds ignore the system-prompt `/no_think` directive — the model
emits its tool call inside a thinking block that the parser treats
as final response text instead of a real tool invocation. The
shipped preset sets `enable_thinking: false` in `custom_params`,
which Ollama enforces server-side and the model can't ignore. Don't
remove it.
### Alternatives
@@ -173,9 +234,14 @@ the background to a sunset") that doesn't matter.
- **The system prompt is unambiguous.** No room for the model to
decide "I'll just describe it in text instead."
- **Only one tool is attached.** No competing tools to choose between.
- **Native function calling is mandatory.** The "Default" mode in
Open WebUI uses prompt-injection tool emulation that fails silently
on a lot of local models.
- **Function Calling: Default** is the safer choice for Qwen 3.x
abliterated. Native mode relies on the parser recognising the
model's structured tool-call format, and on the published Open
WebUI / Ollama versions Qwen 3.5's `<function=...><parameter=...>`
XML currently leaks to chat as plain text. Default mode uses Open
WebUI's own prompt-injection wrapper, which round-trips reliably.
Try Native only after swapping the base model to one known to work
end-to-end (mistral-nemo, qwen2.5vl).
- **Lower temperature.** Tool calling is more reliable with less
sampling randomness.

View File

@@ -1,7 +1,7 @@
"""
title: Smart Image Generator & Editor (ComfyUI)
author: ai-stack
version: 0.7.2
version: 0.7.10
description: Generate or edit images via ComfyUI with automatic SDXL
checkpoint routing. Two methods — generate_image (txt2img) and
edit_image (img2img on the user's most recently attached image). The
@@ -20,6 +20,7 @@ import asyncio
import base64
import inspect
import io
import json
import re
import time
import uuid
@@ -137,15 +138,15 @@ STYLES = {
"clip_skip": 2,
"prefix": (
"masterpiece, best quality, high quality, good quality, "
"detailed eyes, highres, absurdres, furry, "
"detailed eyes, highres, absurdres, incredibly absurdres, "
),
"negative": (
"human, realistic, photorealistic, 3d, cgi, "
"worst quality, bad_quality, normal quality, lowres, "
"anatomical nonsense, bad anatomy, interlocked fingers, extra fingers, "
"bad_feet, bad_hands, deformed anatomy, bad proportions, "
"censored, simple background, transparent, face backlighting, "
"watermark, signature, text, logo, username, jpeg artifacts"
"worst quality, bad_quality, normal quality, lowres, anatomical nonsense, "
"bad anatomy, anatomical nonsense, interlocked fingers, extra fingers, "
"watermark, simple background, transparent, bad_feet, bad_hands, "
"logo, text, bad_anatomy, signature, face backlighting, "
"(worst quality, bad quality:1.2), jpeg artifacts, censored, "
"extra digit, ugly, deformed anatomy, bad proportions, "
),
},
"furry-noob": {
@@ -222,10 +223,52 @@ def _route_style(prompt: str) -> str:
return DEFAULT_STYLE
def _inherited_style(messages: Optional[list]) -> Optional[str]:
"""
Return the `style` arg from the most recent generate_image /
edit_image tool call in the conversation. Used so edit_image can
auto-inherit the style of the image being edited when the LLM
didn't pass one explicitly — without this, an edit on a furry
image with a neutral edit prompt ("make the eyes glow") falls
through to the keyword router and picks a wrong style.
"""
if not messages:
return None
for msg in reversed(messages):
if not isinstance(msg, dict):
continue
for tc in (msg.get("tool_calls") or []):
if not isinstance(tc, dict):
continue
fn = tc.get("function") or {}
if fn.get("name") not in ("generate_image", "edit_image"):
continue
raw_args = fn.get("arguments")
if isinstance(raw_args, str):
try:
args = json.loads(raw_args)
except (TypeError, ValueError):
args = {}
elif isinstance(raw_args, dict):
args = raw_args
else:
args = {}
style = args.get("style")
if isinstance(style, str) and style in STYLES:
return style
return None
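# Illustrative only (not part of this diff): the message shape scanned above is the
# OpenAI-style tool_calls list, where `arguments` may arrive as a JSON string or a dict:
#   {"role": "assistant", "tool_calls": [{"function": {
#       "name": "generate_image",
#       "arguments": "{\"prompt\": \"...\", \"style\": \"furry-il\"}"}}]}
#   -> _inherited_style(messages) == "furry-il"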
def _seed_value(seed: int) -> int:
return seed if seed > 0 else int(time.time() * 1000) % (2**31)
def _job_prefix(kind: str) -> str:
"""Per-submission filename_prefix so SaveImage outputs from concurrent
jobs can never share an auto-numbered counter and cross over."""
return f"{kind}_{uuid.uuid4().hex[:10]}"
def _build_txt2img(positive: str, negative: str, settings: dict,
width: int, height: int, seed: int) -> dict:
"""
@@ -249,7 +292,7 @@ def _build_txt2img(positive: str, negative: str, settings: dict,
"7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
"8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smartgen", "images": ["8", 0]}},
"inputs": {"filename_prefix": _job_prefix("smartgen"), "images": ["8", 0]}},
"10": {"class_type": "CLIPSetLastLayer",
"inputs": {"stop_at_clip_layer": -settings["clip_skip"],
"clip": ["4", 1]}},
@@ -266,6 +309,13 @@ def _build_inpaint(positive: str, negative: str, settings: dict,
"the dog's collar"), then SetLatentNoiseMask + KSampler repaint
only that region. Everything outside the mask stays pixel-perfect.
The raw SAM mask is run through GrowMask with tapered_corners
before it reaches the sampler. Without that, the mask edge is
pixel-binary and KSampler repaints right up to a hard boundary —
SDXL has no surrounding-pixel context inside the mask to blend
with, so the inpainted region looks pasted-on with visible seams.
expand=12px + taper gives a soft transition that blends naturally.
First inpaint downloads ~3 GB of SAM/GroundingDINO weights into
/opt/comfyui/models/{sams,grounding-dino}/ — subsequent runs reuse
them.
@@ -285,14 +335,14 @@ def _build_inpaint(positive: str, negative: str, settings: dict,
"7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
"8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smartinpaint", "images": ["8", 0]}},
"inputs": {"filename_prefix": _job_prefix("smartinpaint"), "images": ["8", 0]}},
"10": {"class_type": "CLIPSetLastLayer",
"inputs": {"stop_at_clip_layer": -settings["clip_skip"],
"clip": ["4", 1]}},
"11": {"class_type": "VAEEncode", "inputs": {"pixels": ["12", 0], "vae": ["4", 2]}},
"12": {"class_type": "LoadImage", "inputs": {"image": image_filename}},
"13": {"class_type": "SetLatentNoiseMask",
"inputs": {"samples": ["11", 0], "mask": ["16", 1]}},
"inputs": {"samples": ["11", 0], "mask": ["17", 0]}},
"14": {"class_type": "SAMModelLoader (segment anything)",
"inputs": {"model_name": "sam_hq_vit_h (2.57GB)"}},
"15": {"class_type": "GroundingDinoModelLoader (segment anything)",
@@ -305,6 +355,12 @@ def _build_inpaint(positive: str, negative: str, settings: dict,
"prompt": mask_text,
"threshold": 0.3,
}},
"17": {"class_type": "GrowMask",
"inputs": {
"mask": ["16", 1],
"expand": 12,
"tapered_corners": True,
}},
}
@@ -331,7 +387,7 @@ def _build_img2img(positive: str, negative: str, settings: dict,
"7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
"8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smartedit", "images": ["8", 0]}},
"inputs": {"filename_prefix": _job_prefix("smartedit"), "images": ["8", 0]}},
"10": {"class_type": "CLIPSetLastLayer",
"inputs": {"stop_at_clip_layer": -settings["clip_skip"],
"clip": ["4", 1]}},
@@ -349,7 +405,7 @@ def _file_dict_is_image(f: dict) -> bool:
_FILE_URL_ID_RE = re.compile(r"/(?:api/v1/)?files/([0-9a-fA-F-]{8,})(?:/content)?")
def _read_file_dict(f: dict) -> Optional[bytes]:
async def _read_file_dict(f: dict) -> Optional[bytes]:
"""
Try to read raw bytes for one file dict. Tries in order:
1. Local filesystem path keys (covers user uploads with `path`).
@@ -359,6 +415,12 @@ def _read_file_dict(f: dict) -> Optional[bytes]:
assistant-emitted files where the message attachment is just
{"type":"image","url":"/api/v1/files/<uuid>/content"} —
no id field, no path field, but the URL has the id).
Async because Open WebUI 0.9.0 made every model-class accessor
a coroutine (Users / Chats / Files / etc.). Calling them without
await returns a coroutine object instead of the model and silently
breaks downstream attribute access. Same reason the callers in
_extract_attached_image and _push_image_to_chat must await.
"""
for path_key in ("path", "filepath", "file_path"):
path = f.get(path_key)
@@ -381,7 +443,7 @@ def _read_file_dict(f: dict) -> Optional[bytes]:
if _OPENWEBUI_RUNTIME:
for fid in candidate_ids:
try:
file_model = Files.get_file_by_id(fid)
file_model = await Files.get_file_by_id(fid)
if file_model is None:
continue
path = getattr(file_model, "path", None)
@@ -450,7 +512,7 @@ async def _extract_attached_image(
for f in msg_files:
if not isinstance(f, dict) or not _file_dict_is_image(f):
continue
data = _read_file_dict(f)
data = await _read_file_dict(f)
if data is not None:
return data
@@ -458,7 +520,7 @@ async def _extract_attached_image(
for f in files or []:
if not isinstance(f, dict) or not _file_dict_is_image(f):
continue
data = _read_file_dict(f)
data = await _read_file_dict(f)
if data is not None:
return data
@@ -470,7 +532,7 @@ async def _extract_attached_image(
chat_id = metadata.get("chat_id")
if chat_id:
try:
chat = Chats.get_chat_by_id(chat_id)
chat = await Chats.get_chat_by_id(chat_id)
chat_data = getattr(chat, "chat", None) if chat else None
chat_messages = (chat_data or {}).get("messages", []) if isinstance(chat_data, dict) else []
for msg in reversed(chat_messages):
@@ -480,7 +542,7 @@ async def _extract_attached_image(
for f in msg_files:
if not isinstance(f, dict) or not _file_dict_is_image(f):
continue
data = _read_file_dict(f)
data = await _read_file_dict(f)
if data is not None:
return data
except Exception:
@@ -543,7 +605,7 @@ async def _push_image_to_chat(
return False
try:
user = Users.get_user_by_id(user_dict.get("id"))
user = await Users.get_user_by_id(user_dict.get("id"))
if not user:
return False
@@ -611,6 +673,14 @@ async def _submit_and_fetch(
f"CFG {settings['cfg']}, {settings['steps']} steps", False
)
# The SaveImage node in every workflow this tool builds is id "9".
# We prefer it explicitly because intermediate nodes (e.g. the
# GroundingDinoSAMSegment IMAGE output in the inpaint workflow) can
# land in the outputs dict too, and dict iteration order is not
# stable across runs — without preferring "9" we sometimes returned
# an overlay or masked-only image that rendered mostly black.
SAVE_NODE_ID = "9"
deadline = time.time() + timeout_seconds
output_images: list = []
while time.time() < deadline:
@@ -620,8 +690,16 @@ async def _submit_and_fetch(
continue
history = await resp.json()
if prompt_id in history:
for node_out in history[prompt_id].get("outputs", {}).values():
output_images.extend(node_out.get("images", []))
outputs = history[prompt_id].get("outputs", {}) or {}
# Prefer the canonical SaveImage output …
save_imgs = (outputs.get(SAVE_NODE_ID) or {}).get("images", [])
if save_imgs:
output_images.extend(save_imgs)
# … only fall back to other nodes if SaveImage didn't fire
# (workflow drift, manual override, etc.)
if not output_images:
for node_out in outputs.values():
output_images.extend(node_out.get("images", []))
if output_images:
break
@@ -647,8 +725,15 @@ class Tools:
description="ComfyUI server URL reachable from the open-webui container.",
)
TIMEOUT_SECONDS: int = Field(
default=240,
description="Maximum wait for a single generation to complete.",
default=600,
description=(
"Maximum wait for a single generation to complete. "
"Default 10 minutes — long enough to absorb a first-time "
"inpaint where SAM-HQ + GroundingDINO + BERT auto-download "
"(~3 GB). Steady-state runs finish in well under a minute; "
"if your KSampler routinely takes longer than that, lower "
"the per-style steps in STYLES."
),
)
def __init__(self):
@@ -799,9 +884,21 @@ class Tools:
Pick `style` for the DESIRED OUTPUT, not the input image.
Style resolution order: inherited from the most recent prior
generate_image / edit_image call in this conversation (DOMINANT)
→ explicit `style` arg → keyword detection on `prompt`.
Inheritance dominates because vision LLMs misclassify subjects
in the rendered output (e.g. picking 'juggernaut' on a
'furry-il' source). For follow-up edits on an image you
generated earlier, omit `style` entirely — the tool reuses the
established style automatically. The user can start a new chat
if they want a different style.
:param prompt: What the changed area should look like.
Tool auto-prepends quality tags — don't include those.
:param style: One of the StyleName values. Omit to auto-detect.
:param style: One of the StyleName values. Omit to auto-inherit
from the previous tool call (recommended for edits on
images you generated earlier in this chat).
:param mask_text: Noun phrase describing the region to edit. Set
for LOCAL changes; omit for GLOBAL.
:param denoise: 0.0 = no change, 1.0 = ignore source. Defaults to
@@ -810,7 +907,13 @@ class Tools:
:param seed: 0 to randomize, otherwise specific.
:return: Markdown image of the result, or an error if no image is attached.
"""
chosen = style or _route_style(prompt)
# Resolve style — inheritance DOMINATES for edits. Vision LLMs
# misclassify subject types (observed in the wild: juggernaut
# picked for a furry-il source because the model thought the
# rendered character looked "photoreal-ish"). When there's a
# prior tool call in this chat, use the same style; the user's
# workaround for genuine style changes is a fresh chat.
chosen = _inherited_style(__messages__) or style or _route_style(prompt)
settings = STYLES.get(chosen)
if not settings:
return f"Unknown style '{chosen}'. Available: {', '.join(STYLES.keys())}"
@@ -852,7 +955,14 @@ class Tools:
"(paperclip / drag-drop), or call generate_image instead."
)
await emit("Uploading source to ComfyUI…")
# Diagnostic emit so a misrouted source ("wrong image
# returned") shows up in the status track instead of being
# invisible. SHA-1 is fast and the first 8 hex chars are
# plenty to compare against the prior generation's hash if
# cross-talk is suspected.
import hashlib # local import — keeps the module import surface clean
src_hash = hashlib.sha1(raw_in).hexdigest()[:8]
await emit(f"Uploading source to ComfyUI… (sha1={src_hash}, {len(raw_in)} bytes)")
uploaded_name = await _upload_to_comfyui(session, base, raw_in)
if not uploaded_name:
return "Failed to upload source image to ComfyUI."

View File

@@ -0,0 +1,612 @@
"""
title: Smart Image Studio (Pipe)
author: ai-stack
version: 0.1.2
description: Deterministic image-gen / edit / inpaint pipe — no LLM in the
loop for the routing decision. Registers as a model in the chat-model
dropdown ('Image Studio (Pipe)'). Reads the user's message + attached
image (if any), routes via regex, calls ComfyUI directly, returns the
image. Use when LLM-with-Tool tool-calling is leaking the call as text
(the abliterated Qwen 3.5 / Open WebUI parser interop bug).
required_open_webui_version: 0.5.0
"""
import asyncio
import base64
import inspect
import io
import json
import re
import time
import uuid
from typing import Awaitable, Callable, Literal, Optional
import aiohttp
from pydantic import BaseModel, Field
# Open WebUI runtime imports — same defensive guard as the sibling Tool.
try:
from fastapi import UploadFile
from open_webui.models.chats import Chats
from open_webui.models.files import Files
from open_webui.models.users import Users
from open_webui.routers.files import upload_file_handler
_OPENWEBUI_RUNTIME = True
except ImportError:
_OPENWEBUI_RUNTIME = False
# ─────────────────────────────────────────────────────────────────────────────
# Per-style settings — kept in sync with smart_image_gen.py. If you change
# checkpoint filenames in comfyui-init-models.sh, update both files.
# ─────────────────────────────────────────────────────────────────────────────
STYLES = {
"photo": {
"ckpt": "CyberRealisticXLPlay_V8.0_FP16.safetensors",
"sampler": "dpmpp_2m_sde",
"scheduler": "karras",
"cfg": 4.0, "steps": 28, "clip_skip": 1,
"prefix": "",
"negative": (
"cartoon, drawing, illustration, anime, manga, painting, sketch, "
"render, 3d, cgi, plastic skin, oversaturated, "
"lowres, blurry, jpeg artifacts, low quality, worst quality, "
"bad anatomy, deformed, extra fingers, missing fingers, "
"watermark, signature, text, logo"
),
},
"juggernaut": {
"ckpt": "Juggernaut-XL_v9_RunDiffusionPhoto_v2.safetensors",
"sampler": "dpmpp_2m_sde",
"scheduler": "karras",
"cfg": 4.5, "steps": 35, "clip_skip": 1,
"prefix": "",
"negative": (
"cartoon, drawing, illustration, anime, painting, sketch, render, "
"3d, cgi, plastic skin, washed out, "
"lowres, blurry, jpeg artifacts, low quality, worst quality, "
"bad anatomy, deformed, extra fingers, missing fingers, "
"watermark, signature, text, logo"
),
},
"pony": {
"ckpt": "ponyDiffusionV6XL_v6StartWithThisOne.safetensors",
"sampler": "euler_ancestral",
"scheduler": "normal",
"cfg": 7.5, "steps": 25, "clip_skip": 2,
"prefix": "score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up, ",
"negative": (
"score_6, score_5, score_4, "
"worst quality, low quality, lowres, blurry, jpeg artifacts, "
"bad anatomy, bad hands, extra digit, fewer digits, "
"deformed, ugly, censored, monochrome, "
"watermark, signature, text, artist name"
),
},
"general": {
"ckpt": "talmendoxlSDXL_v11Beta.safetensors",
"sampler": "dpmpp_2m",
"scheduler": "karras",
"cfg": 8.0, "steps": 30, "clip_skip": 2,
"prefix": "",
"negative": (
"lowres, blurry, jpeg artifacts, low quality, worst quality, "
"bad anatomy, deformed, ugly, watermark, signature, text"
),
},
"furry-nai": {
"ckpt": "reedFURRYMixSDXL_v23nai.safetensors",
"sampler": "euler_ancestral",
"scheduler": "normal",
"cfg": 5.0, "steps": 30, "clip_skip": 2,
"prefix": (
"masterpiece, best quality, high quality, detailed eyes, "
"highres, absurdres, furry, "
),
"negative": (
"human, realistic, photorealistic, 3d, cgi, "
"worst quality, low quality, lowres, blurry, jpeg artifacts, "
"bad anatomy, extra digit, fewer digits, deformed, ugly, "
"watermark, signature, text"
),
},
"furry-noob": {
"ckpt": "indigoVoidFurryFusedXL_noobaiV32.safetensors",
"sampler": "euler_ancestral",
"scheduler": "normal",
"cfg": 4.5, "steps": 20, "clip_skip": 2,
"prefix": (
"masterpiece, best quality, perfect quality, absurdres, newest, "
"very aesthetic, vibrant colors, "
),
"negative": (
"human, realistic, photorealistic, 3d, cgi, shiny skin, "
"worst quality, low quality, lowres, blurry, jpeg artifacts, "
"bad anatomy, bad hands, mutated hands, "
"watermark, signature, text"
),
},
"furry-il": {
"ckpt": "novaFurryXL_ilV170.safetensors",
"sampler": "euler_ancestral",
"scheduler": "normal",
"cfg": 4.0, "steps": 30, "clip_skip": 2,
"prefix": (
"masterpiece, best quality, amazing quality, very aesthetic, "
"ultra-detailed, absurdres, newest, furry, anthro, "
),
"negative": (
"human, multiple tails, modern, recent, old, oldest, graphic, "
"cartoon, painting, deformed, mutated, ugly, lowres, "
"bad anatomy, bad hands, missing fingers, extra digits, "
"worst quality, bad quality, sketch, jpeg artifacts, "
"signature, watermark, text, simple background"
),
},
}
DEFAULT_STYLE = "furry-il"
ROUTING_RULES = [
(re.compile(r"\bscore_\d", re.I), "pony"),
(re.compile(r"\bpony\b", re.I), "pony"),
(re.compile(r"\b(noobai|noob)\b", re.I), "furry-noob"),
(re.compile(r"\b(illustrious|ilxl)\b", re.I), "furry-il"),
(re.compile(r"\b(furry|anthro|feral|kemono|fursona|species)\b", re.I), "furry-il"),
(re.compile(r"\b(juggernaut)\b", re.I), "juggernaut"),
(re.compile(r"\b(photo|photograph|realistic|portrait|selfie|cinematic)\b", re.I), "photo"),
(re.compile(r"\b(anime|manga|2d|illustration)\b", re.I), "pony"),
]
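# First matching rule wins, scanned top to bottom by _route_style(); e.g.
# "pony anthro mare" routes to 'pony' (matched before the furry rule), and a
# prompt with no keyword falls through to DEFAULT_STYLE.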
# Phrases that imply local-only editing → trigger inpaint mode and pull out
# a noun phrase to use as the mask text (worked examples follow the list).
INPAINT_PATTERNS = [
re.compile(r"\b(?:change|recolor|edit|modify|replace|remove|delete|add)\s+(?:the|that|her|his|its)\s+([\w\s'-]{2,30}?)(?:\s+(?:to|into|with|so|that|and|,|\.)|$)", re.I),
re.compile(r"\b(?:make|turn)\s+(?:the|that|her|his|its)\s+([\w\s'-]{2,30}?)\s+(?:bigger|smaller|larger|wider|taller|shorter|longer|brighter|darker|red|blue|green|yellow|orange|purple|pink|black|white|gold)", re.I),
re.compile(r"\b(?:only|just)\s+(?:the|change the|edit the)\s+([\w\s'-]{2,30}?)(?:\s+|$)", re.I),
]
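# Illustrative matches: "change the background to a beach" -> mask 'the background';
# "make the eyes bigger" -> mask 'the eyes'; "remove the watermark" -> mask
# 'the watermark'. A plain "redo this in watercolor" matches none of the patterns
# and falls back to whole-image img2img.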
def _route_style(prompt: str) -> str:
for pattern, style in ROUTING_RULES:
if pattern.search(prompt):
return style
return DEFAULT_STYLE
def _detect_mask_text(prompt: str) -> Optional[str]:
"""Pull a noun phrase out of edit-style instructions for inpaint."""
for pattern in INPAINT_PATTERNS:
m = pattern.search(prompt)
if m:
obj = m.group(1).strip().rstrip(",.").strip()
if obj:
return f"the {obj}"
return None
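# e.g. a prior assistant reply containing "style: pony" makes the next request
# in the same chat reuse the 'pony' preset, unless FORCE_STYLE is set (see the
# precedence order in Pipe.pipe()).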
def _inherited_style(messages) -> Optional[str]:
"""Best-effort: read prior assistant message metadata for a style hint."""
if not messages:
return None
for msg in reversed(messages):
if not isinstance(msg, dict):
continue
# Look for a "style: X" comment in the assistant's previous text
if msg.get("role") == "assistant":
content = msg.get("content")
if isinstance(content, str):
m = re.search(r"\bstyle[:=]\s*([\w\-]+)", content)
if m and m.group(1) in STYLES:
return m.group(1)
return None
def _seed_value(seed: int) -> int:
return seed if seed > 0 else int(time.time() * 1000) % (2**31)
def _build_txt2img(positive: str, negative: str, settings: dict,
width: int, height: int, seed: int) -> dict:
return {
"3": {"class_type": "KSampler", "inputs": {
"seed": _seed_value(seed),
"steps": settings["steps"], "cfg": settings["cfg"],
"sampler_name": settings["sampler"], "scheduler": settings["scheduler"],
"denoise": 1.0,
"model": ["4", 0], "positive": ["6", 0],
"negative": ["7", 0], "latent_image": ["5", 0],
}},
"4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": settings["ckpt"]}},
"5": {"class_type": "EmptyLatentImage",
"inputs": {"width": width, "height": height, "batch_size": 1}},
"6": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["10", 0]}},
"7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
"8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smartpipe", "images": ["8", 0]}},
"10": {"class_type": "CLIPSetLastLayer",
"inputs": {"stop_at_clip_layer": -settings["clip_skip"], "clip": ["4", 1]}},
}
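# _build_img2img and _build_inpaint below reuse the same sampler -> VAEDecode ->
# SaveImage core; img2img swaps the EmptyLatentImage for VAEEncode(LoadImage),
# and inpaint additionally constrains the latent with a text-segmented mask
# (GroundingDINO + SAM via SetLatentNoiseMask/GrowMask).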
def _build_img2img(positive: str, negative: str, settings: dict,
image_filename: str, denoise: float, seed: int) -> dict:
return {
"3": {"class_type": "KSampler", "inputs": {
"seed": _seed_value(seed),
"steps": settings["steps"], "cfg": settings["cfg"],
"sampler_name": settings["sampler"], "scheduler": settings["scheduler"],
"denoise": denoise,
"model": ["4", 0], "positive": ["6", 0],
"negative": ["7", 0], "latent_image": ["11", 0],
}},
"4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": settings["ckpt"]}},
"6": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["10", 0]}},
"7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
"8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smartpipe", "images": ["8", 0]}},
"10": {"class_type": "CLIPSetLastLayer",
"inputs": {"stop_at_clip_layer": -settings["clip_skip"], "clip": ["4", 1]}},
"11": {"class_type": "VAEEncode", "inputs": {"pixels": ["12", 0], "vae": ["4", 2]}},
"12": {"class_type": "LoadImage", "inputs": {"image": image_filename}},
}
def _build_inpaint(positive: str, negative: str, settings: dict,
image_filename: str, mask_text: str,
denoise: float, seed: int) -> dict:
return {
"3": {"class_type": "KSampler", "inputs": {
"seed": _seed_value(seed),
"steps": settings["steps"], "cfg": settings["cfg"],
"sampler_name": settings["sampler"], "scheduler": settings["scheduler"],
"denoise": denoise,
"model": ["4", 0], "positive": ["6", 0],
"negative": ["7", 0], "latent_image": ["13", 0],
}},
"4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": settings["ckpt"]}},
"6": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["10", 0]}},
"7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
"8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smartpipe", "images": ["8", 0]}},
"10": {"class_type": "CLIPSetLastLayer",
"inputs": {"stop_at_clip_layer": -settings["clip_skip"], "clip": ["4", 1]}},
"11": {"class_type": "VAEEncode", "inputs": {"pixels": ["12", 0], "vae": ["4", 2]}},
"12": {"class_type": "LoadImage", "inputs": {"image": image_filename}},
"13": {"class_type": "SetLatentNoiseMask",
"inputs": {"samples": ["11", 0], "mask": ["17", 0]}},
"14": {"class_type": "SAMModelLoader (segment anything)",
"inputs": {"model_name": "sam_hq_vit_h (2.57GB)"}},
"15": {"class_type": "GroundingDinoModelLoader (segment anything)",
"inputs": {"model_name": "GroundingDINO_SwinT_OGC (694MB)"}},
"16": {"class_type": "GroundingDinoSAMSegment (segment anything)",
"inputs": {
"sam_model": ["14", 0], "grounding_dino_model": ["15", 0],
"image": ["12", 0], "prompt": mask_text, "threshold": 0.3,
}},
"17": {"class_type": "GrowMask",
"inputs": {"mask": ["16", 1], "expand": 12, "tapered_corners": True}},
}
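# Matches Open WebUI file-content URLs such as "/api/v1/files/<id>/content" or
# a bare "/files/<id>" and captures the file id for a DB lookup.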
_FILE_URL_ID_RE = re.compile(r"/(?:api/v1/)?files/([0-9a-fA-F-]{8,})(?:/content)?")
def _file_dict_is_image(f: dict) -> bool:
ftype = (f.get("type") or "").lower()
fname = (f.get("name") or f.get("filename") or "").lower()
return "image" in ftype or fname.endswith((".png", ".jpg", ".jpeg", ".webp"))
async def _read_file_dict(f: dict) -> Optional[bytes]:
for path_key in ("path", "filepath", "file_path"):
path = f.get(path_key)
if path:
try:
with open(path, "rb") as fh:
return fh.read()
except OSError:
pass
candidate_ids = []
if f.get("id"):
candidate_ids.append(f["id"])
url = f.get("url")
if url:
m = _FILE_URL_ID_RE.search(url)
if m:
candidate_ids.append(m.group(1))
if _OPENWEBUI_RUNTIME:
for fid in candidate_ids:
try:
file_model = await Files.get_file_by_id(fid)
if file_model is None:
continue
path = getattr(file_model, "path", None)
if not path:
meta = getattr(file_model, "meta", None) or {}
path = meta.get("path") if isinstance(meta, dict) else getattr(meta, "path", None)
if path:
try:
with open(path, "rb") as fh:
return fh.read()
except OSError:
pass
except Exception:
pass
return None
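# Search order for a usable source image, newest message first: inline data URIs
# in multimodal content blocks, per-message file attachments, the request-level
# __files__ list, and finally the persisted chat record in the DB (where
# assistant-emitted files from a previous turn usually live).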
async def _extract_attached_image(files, messages, metadata, session) -> Optional[bytes]:
# 1. Inline data URIs
for msg in reversed(messages or []):
content = msg.get("content") if isinstance(msg, dict) else None
if isinstance(content, list):
for block in content:
if not isinstance(block, dict) or block.get("type") != "image_url":
continue
url = (block.get("image_url") or {}).get("url", "")
if url.startswith("data:image"):
try:
return base64.b64decode(url.split(",", 1)[1])
except Exception:
pass
# 2. messages[].files
for msg in reversed(messages or []):
if not isinstance(msg, dict):
continue
for f in (msg.get("files") or []):
if isinstance(f, dict) and _file_dict_is_image(f):
data = await _read_file_dict(f)
if data is not None:
return data
# 3. __files__
for f in files or []:
if isinstance(f, dict) and _file_dict_is_image(f):
data = await _read_file_dict(f)
if data is not None:
return data
# 4. DB lookup (assistant-emitted files often only land here)
if _OPENWEBUI_RUNTIME and metadata:
chat_id = metadata.get("chat_id")
if chat_id:
try:
chat = await Chats.get_chat_by_id(chat_id)
chat_data = getattr(chat, "chat", None) if chat else None
chat_messages = (chat_data or {}).get("messages", []) if isinstance(chat_data, dict) else []
for msg in reversed(chat_messages):
for f in (msg.get("files") or []) if isinstance(msg, dict) else []:
if isinstance(f, dict) and _file_dict_is_image(f):
data = await _read_file_dict(f)
if data is not None:
return data
except Exception:
pass
return None
async def _upload_to_comfyui(session, base, raw) -> Optional[str]:
name = f"smartpipe_{uuid.uuid4().hex[:12]}.png"
form = aiohttp.FormData()
form.add_field("image", raw, filename=name, content_type="image/png")
form.add_field("overwrite", "true")
async with session.post(f"{base}/upload/image", data=form) as resp:
if resp.status != 200:
return None
return (await resp.json()).get("name", name)
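# Upload the rendered PNG into Open WebUI's file store and surface it on the
# assistant message via a "files" event. Returns False on any failure; the
# pipe's text reply is still emitted either way.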
async def _push_image_to_chat(raw, prefix, request, user_dict, metadata, event_emitter) -> bool:
if not (_OPENWEBUI_RUNTIME and request and user_dict and event_emitter):
return False
try:
user = await Users.get_user_by_id(user_dict.get("id"))
if not user:
return False
upload = UploadFile(
file=io.BytesIO(raw),
filename=f"{prefix}_{uuid.uuid4().hex[:8]}.png",
headers={"content-type": "image/png"},
)
result = upload_file_handler(
request=request, file=upload,
metadata={"chat_id": (metadata or {}).get("chat_id"),
"message_id": (metadata or {}).get("message_id")},
process=False, user=user,
)
file_item = await result if inspect.iscoroutine(result) else result
url = request.app.url_path_for("get_file_content_by_id", id=file_item.id)
await event_emitter({
"type": "files",
"data": {"files": [{"type": "image", "url": url}]},
})
return True
except Exception:
return False
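# Submit the workflow to POST /prompt, then poll GET /history/<prompt_id> every
# 1.5 s until the SaveImage node (id "9") reports an image (falling back to any
# node's images), and download the first result via GET /view.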
async def _submit_and_fetch(session, base, workflow, timeout_seconds, emit, settings):
SAVE_NODE_ID = "9"
client_id = str(uuid.uuid4())
async with session.post(
f"{base}/prompt", json={"prompt": workflow, "client_id": client_id}
) as resp:
if resp.status != 200:
return None, f"ComfyUI rejected the prompt: {resp.status} {await resp.text()}"
prompt_id = (await resp.json()).get("prompt_id")
if not prompt_id:
return None, "ComfyUI didn't return a prompt_id."
await emit(
f"Sampling — {settings['sampler']}/{settings['scheduler']}, "
f"CFG {settings['cfg']}, {settings['steps']} steps"
)
deadline = time.time() + timeout_seconds
output_images: list = []
while time.time() < deadline:
await asyncio.sleep(1.5)
async with session.get(f"{base}/history/{prompt_id}") as resp:
if resp.status != 200:
continue
history = await resp.json()
if prompt_id in history:
outputs = history[prompt_id].get("outputs", {}) or {}
save_imgs = (outputs.get(SAVE_NODE_ID) or {}).get("images", [])
if save_imgs:
output_images.extend(save_imgs)
if not output_images:
for node_out in outputs.values():
output_images.extend(node_out.get("images", []))
if output_images:
break
if not output_images:
return None, f"Timed out after {timeout_seconds}s waiting for image."
img = output_images[0]
params = {
"filename": img["filename"],
"subfolder": img.get("subfolder", ""),
"type": img.get("type", "output"),
}
async with session.get(f"{base}/view", params=params) as resp:
if resp.status != 200:
return None, f"Failed to fetch image: {resp.status}"
return await resp.read(), None
def _extract_user_text(body: dict) -> str:
"""Pull the latest user message's text content."""
messages = body.get("messages", [])
for msg in reversed(messages):
if not isinstance(msg, dict) or msg.get("role") != "user":
continue
content = msg.get("content")
if isinstance(content, str):
return content.strip()
if isinstance(content, list):
parts = []
for block in content:
if isinstance(block, dict) and block.get("type") == "text":
parts.append(block.get("text", ""))
return " ".join(parts).strip()
return ""
class Pipe:
class Valves(BaseModel):
COMFYUI_BASE_URL: str = Field(
default="http://comfyui:8188",
description="ComfyUI server URL reachable from the open-webui container.",
)
TIMEOUT_SECONDS: int = Field(default=600)
DEFAULT_WIDTH: int = Field(default=1024)
DEFAULT_HEIGHT: int = Field(default=1024)
DEFAULT_DENOISE_IMG2IMG: float = Field(default=0.7)
DEFAULT_DENOISE_INPAINT: float = Field(default=1.0)
FORCE_STYLE: str = Field(
default="",
description="Override style routing. Empty = auto-route. Set to "
"one of: photo, juggernaut, pony, general, "
"furry-nai, furry-noob, furry-il.",
)
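# e.g. setting FORCE_STYLE="photo" pins every request to the CyberRealistic
# checkpoint and skips keyword routing; leave it empty to let _route_style()
# and the inherited-style hint decide.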
def __init__(self):
self.valves = self.Valves()
self.id = "image-studio-pipe"
self.name = "Image Studio (Pipe)"
async def pipe(
self,
body: dict,
__user__: Optional[dict] = None,
__request__=None,
__metadata__: Optional[dict] = None,
__event_emitter__: Optional[Callable[[dict], Awaitable[None]]] = None,
) -> str:
user_text = _extract_user_text(body)
if not user_text:
return "Type a message describing the image you want."
async def emit(msg: str, done: bool = False):
if __event_emitter__:
await __event_emitter__({
"type": "status",
"data": {"description": msg, "done": done},
})
# Style: explicit valve override > inherited from prior assistant
# message > keyword detection on user text > default.
chosen = (
self.valves.FORCE_STYLE.strip()
or _inherited_style(body.get("messages"))
or _route_style(user_text)
)
if chosen not in STYLES:
chosen = DEFAULT_STYLE
settings = STYLES[chosen]
base = self.valves.COMFYUI_BASE_URL.rstrip("/")
positive = f"{settings['prefix']}{user_text}"
negative = settings["negative"]
async with aiohttp.ClientSession() as session:
await emit("Looking for attached image…")
source_bytes = await _extract_attached_image(
None, body.get("messages"), __metadata__, session,
)
if source_bytes is None:
# No image → txt2img
await emit(f"Generating ({chosen})")
workflow = _build_txt2img(
positive, negative, settings,
self.valves.DEFAULT_WIDTH, self.valves.DEFAULT_HEIGHT, 0,
)
tag = "gen"
else:
# Image present → upload, then inpaint or img2img
uploaded = await _upload_to_comfyui(session, base, source_bytes)
if not uploaded:
return "Failed to upload source image to ComfyUI."
mask_text = _detect_mask_text(user_text)
if mask_text:
await emit(
f"Inpainting ({chosen}, mask='{mask_text}', "
f"denoise={self.valves.DEFAULT_DENOISE_INPAINT})"
)
workflow = _build_inpaint(
positive, negative, settings, uploaded, mask_text,
self.valves.DEFAULT_DENOISE_INPAINT, 0,
)
tag = f"edit (inpaint: {mask_text})"
else:
await emit(
f"Editing ({chosen}, "
f"denoise={self.valves.DEFAULT_DENOISE_IMG2IMG})"
)
workflow = _build_img2img(
positive, negative, settings, uploaded,
self.valves.DEFAULT_DENOISE_IMG2IMG, 0,
)
tag = "edit (img2img)"
raw, err = await _submit_and_fetch(
session, base, workflow, self.valves.TIMEOUT_SECONDS, emit, settings,
)
if err:
return err
await _push_image_to_chat(
raw, "smartpipe", __request__, __user__, __metadata__, __event_emitter__,
)
await emit(f"Done — {chosen}", done=True)
# Single-line plain-English follow-up. Embed the style as "style: <name>"
# so _inherited_style() can pick it up on the next turn.
return f"Done — style: {chosen}, {tag}."