5 Commits

Author SHA1 Message Date
2cecf77981 Pin transformers <5 — comfyui_segment_anything's GroundingDINO needs it
transformers 5.0 removed BertModel.get_head_mask (it was part of the
legacy 4.x API). comfyui_segment_anything's GroundingDINO bertwarper.py
still calls bert_model.get_head_mask in __init__, so the first inpaint
crashes with an AttributeError. Pinned transformers>=4.40,<5 in two
places:

  - Dockerfile: applied AFTER the custom node's requirements.txt
    install so it wins on a fresh image build.
  - install-custom-node-deps.sh entrypoint: re-applied at every
    container start so any future custom-node install (via
    ComfyUI-Manager or volume clone) that pulls a newer transformers
    transitively gets pinned back into the working range.
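
A hedged sketch of the failure mode, not part of the commit: a guard
one could drop in before loading GroundingDINO (assumes the `packaging`
package is available):

  # Illustrative only — mirrors the AttributeError the pin prevents.
  from packaging.version import Version
  import transformers

  if Version(transformers.__version__).major >= 5:
      raise RuntimeError(
          "comfyui_segment_anything needs transformers>=4.40,<5: "
          "BertModel.get_head_mask is gone in 5.x"
      )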

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:21:21 -05:00
f26dfbee02 smart_image_gen v0.7.2: chat-DB fallback + diagnostic 'no image' msg
If __messages__ doesn't include the assistant's prior file attachments
(as the screenshot shows), the new fallback queries the chat by id
via Chats.get_chat_by_id and walks every persisted
message for files. Open WebUI's socket handler always upserts files
onto the assistant message via {'files': files} so this path is
authoritative.

The 'No image found' return now includes diagnostic counts —
__files__, __messages__, messages_with_files, chat_id_present,
openwebui_runtime — so subsequent failures actually show what the
tool saw instead of being opaque.
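
A minimal sketch of the fallback, assuming Open WebUI's Chats model is
importable (the real code adds try/except and image-type checks):

  from open_webui.models.chats import Chats

  def persisted_files(chat_id: str) -> list[dict]:
      """All file dicts persisted on the chat, newest message first."""
      chat = Chats.get_chat_by_id(chat_id)
      data = getattr(chat, "chat", None) or {}
      return [
          f
          for msg in reversed(data.get("messages", []))
          for f in (msg.get("files") or [])
      ]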

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:07:55 -05:00
06433d3815 smart_image_gen v0.7.1: rename edit_image arg + parse file id from URL
Two bugs in one screenshot:

1. The LLM called edit_image(prompt=..., ...) but the signature was
   edit_image(edit_instruction=..., ...), so the call crashed with a
   missing-argument error. Renamed the first param to `prompt` so both
   tools have a matching, predictable name. The system prompt was
   updated with an explicit 'do not invent edit_instruction' line for
   stubborn models.

2. After fix #1, edit_image still couldn't find the prior generated
   image because Open WebUI assistant-message file attachments only
   carry {type, url} (no id, no path). _read_file_dict now also
   greps the file id out of /api/v1/files/<uuid>/content URLs and
   feeds it to Files.get_file_by_id. Verified that the pattern matches
   absolute URLs (https://llm-1.srvno.de/api/v1/files/.../content).

System prompt also now says 'including images you previously
generated in this chat' to nudge the LLM to pick up assistant
outputs as edit candidates.
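
The 0.7.1 id-parsing regex, exercised against both URL shapes (the
UUID below is made up for the demo):

  import re

  _FILE_URL_ID_RE = re.compile(r"/(?:api/v1/)?files/([0-9a-fA-F-]{8,})(?:/content)?")

  for url in (
      "/api/v1/files/0b9e2f4c-7a1d-4e58-9f30-6c2d8a1b5e77/content",
      "https://llm-1.srvno.de/api/v1/files/0b9e2f4c-7a1d-4e58-9f30-6c2d8a1b5e77/content",
  ):
      m = _FILE_URL_ID_RE.search(url)
      print(m and m.group(1))  # prints the file id for both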

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:58:40 -05:00
780ce42711 Image Studio: move tool_choice into params.custom_params (correct field)
Previous commit put tool_choice at the top level of params. Open WebUI
drops that silently — apply_model_params_to_body has a whitelist of
mapped param names (temperature, top_p, etc.) and tool_choice isn't
on it. The Custom Parameters UI section also only iterates
params.custom_params, which is why the value didn't appear there
after importing the preset.

Correct location is the custom_params sub-dict, where values go
through json.loads before being merged into the outgoing chat
completion body. 'required' stays a string after the failed
json.loads and ends up exactly where the OpenAI / Ollama tools spec
expects it.

Source: src/lib/components/chat/Settings/Advanced/AdvancedParams.svelte
(UI binding) and backend/open_webui/utils/payload.py (serialization).
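
A simplified sketch of why the string survives, assuming json.loads-based
coercion (the real logic lives in payload.py; names here are illustrative):

  import json

  def coerce(value):
      try:
          return json.loads(value)   # "0.5" -> 0.5, '{"a": 1}' -> dict
      except (json.JSONDecodeError, TypeError):
          return value               # "required" fails to parse, stays a str

  body = {k: coerce(v) for k, v in {"tool_choice": "required"}.items()}
  assert body["tool_choice"] == "required"  # lands verbatim in the request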

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:50:34 -05:00
f6f5690fcd smart_image_gen v0.7: edit_image finds previously-emitted images
Bug: after generate_image surfaced an image via the files event, the
next edit_image call returned 'No image found in the chat'. The image
was attached to the assistant's message, but _extract_attached_image
only scanned the user's __files__ param and image_url content blocks
on user messages — it never looked at messages.files for any role.

Fix: rewrite extraction to scan messages[].files in reverse for ALL
roles, so an assistant-emitted image from a prior tool call is found
the same way as a user-attached upload. Use Open WebUI's internal
Files.get_file_by_id when the file dict has an id, so we get raw
bytes from disk without going through the auth-protected
/api/v1/files/{id}/content endpoint. Old path-key and URL-fetch
paths kept as fallbacks.

Refactored shared helpers _file_dict_is_image and _read_file_dict
out of the loop to keep the search logic readable.
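
A condensed sketch of the new lookup, assuming Open WebUI's Files model
exposes a filesystem path (the tool's _read_file_dict also falls back
to meta.path, URL-parsed ids, and HTTP fetches):

  from open_webui.models.files import Files

  def read_newest_image(messages: list) -> bytes | None:
      for msg in reversed(messages or []):      # newest first, any role
          for f in msg.get("files") or []:
              if "image" not in (f.get("type") or "").lower():
                  continue
              model = Files.get_file_by_id(f["id"]) if f.get("id") else None
              path = getattr(model, "path", None) if model else f.get("path")
              if path:
                  with open(path, "rb") as fh:  # raw bytes, no auth endpoint
                      return fh.read()
      return None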

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:46:10 -05:00
5 changed files with 178 additions and 38 deletions

View File

@@ -57,9 +57,15 @@ RUN git clone --depth 1 https://github.com/ltdrdata/ComfyUI-Manager.git \
# mask_text parameter). Model weights auto-download on first use into
# /opt/comfyui/models/{sams,grounding-dino}/ — first inpaint takes ~3 GB of
# downloads, subsequent runs are instant.
+#
+# Transformers must stay <5: GroundingDINO inside this node calls
+# BertModel.get_head_mask, which transformers 5.0 silently removed. The pin
+# is applied AFTER the requirements install so it overrides anything the
+# upstream requirements.txt would have pulled.
RUN git clone --depth 1 https://github.com/storyicon/comfyui_segment_anything.git \
${COMFYUI_HOME}/custom_nodes/comfyui_segment_anything && \
-    pip install -q -r ${COMFYUI_HOME}/custom_nodes/comfyui_segment_anything/requirements.txt
+    pip install -q -r ${COMFYUI_HOME}/custom_nodes/comfyui_segment_anything/requirements.txt && \
+    pip install -q "transformers>=4.40,<5"
# Entrypoint wrapper — auto-installs requirements.txt for any custom_node
# present at startup (covers Manager-installed nodes and nodes cloned

View File

@@ -4,11 +4,13 @@
"base_model_id": "huihui_ai/qwen3.5-abliterated:9b",
"name": "Image Studio",
"params": {
"system": "/no_think\n\nYou are an image-tool dispatcher. You do not respond in prose. Every user message MUST result in exactly one tool call.\n\nROUTING:\n- If the user attached an image → call edit_image\n- Otherwise → call generate_image\n\nFire the tool on the FIRST message, with no preamble. Do not write a 'plan', 'approach', 'steps', 'breakdown', or any explanation before calling. Do not ask clarifying questions. Do not say what you are about to do. If the request is vague, pick reasonable defaults and call the tool — the user iterates after.\n\nSTYLES (pick one):\n photo photorealistic photo / portrait / cinematic\n juggernaut alternate photoreal — sharper, more saturated\n pony anime, cartoon, manga, stylised illustration\n general catch-all when nothing else fits\n furry-nai anthropomorphic, NAI-trained mix\n furry-noob anthropomorphic, NoobAI base\n furry-il anthropomorphic, Illustrious base (default for any furry/anthro request)\n\nedit_image has TWO MODES — pick based on whether the change is local or global:\n- LOCAL change (\"change the ball to a basketball\", \"add a hat to the dog\", \"remove the bird\", \"recolor the car red\") → set `mask_text` to a brief noun phrase naming the region (\"the ball\", \"the dog\", \"the bird\", \"the car\"). Only that region is repainted; rest stays pixel-perfect.\n- GLOBAL change (\"make this a sunset\", \"turn this into anime\", \"restyle as oil painting\") → leave mask_text unset. The whole image is reimagined.\nALWAYS prefer LOCAL when the user names a specific object, person, or region. GLOBAL is only for whole-image style/lighting transformations.\n\nDenoise:\n- LOCAL (mask_text set): default 1.0. Drop to 0.60.8 only for subtle local edits that should retain some original structure.\n- GLOBAL (no mask_text): default 0.7. Use 0.30.5 for subtle restyle, 0.851.0 for radical reimagining.\n\nPick style for the DESIRED OUTPUT, not the input image.\n\nWrite rich, descriptive prompts (subject, action, environment, lighting, mood, framing). Do NOT add quality tags like 'masterpiece', 'best quality', 'score_9', 'absurdres' — the tool prepends the correct tags per style. Do NOT set sampler, CFG, steps, scheduler — the tool picks them.\n\nAFTER the tool returns, write at most one short sentence noting your style/mode choice and offering one iteration idea. The image is already shown to the user; do not describe it.",
"system": "/no_think\n\nYou are an image-tool dispatcher. You do not respond in prose. Every user message MUST result in exactly one tool call.\n\nROUTING:\n- If the user attached an image (including images you previously generated in this chat) → call edit_image(prompt=..., ...)\n- Otherwise → call generate_image(prompt=..., ...)\nBoth tools take `prompt` as the first argument — same name on both. Do NOT invent `edit_instruction`.\n\nFire the tool on the FIRST message, with no preamble. Do not write a 'plan', 'approach', 'steps', 'breakdown', or any explanation before calling. Do not ask clarifying questions. Do not say what you are about to do. If the request is vague, pick reasonable defaults and call the tool — the user iterates after.\n\nSTYLES (pick one):\n photo photorealistic photo / portrait / cinematic\n juggernaut alternate photoreal — sharper, more saturated\n pony anime, cartoon, manga, stylised illustration\n general catch-all when nothing else fits\n furry-nai anthropomorphic, NAI-trained mix\n furry-noob anthropomorphic, NoobAI base\n furry-il anthropomorphic, Illustrious base (default for any furry/anthro request)\n\nedit_image has TWO MODES — pick based on whether the change is local or global:\n- LOCAL change (\"change the ball to a basketball\", \"add a hat to the dog\", \"remove the bird\", \"recolor the car red\") → set `mask_text` to a brief noun phrase naming the region (\"the ball\", \"the dog\", \"the bird\", \"the car\"). Only that region is repainted; rest stays pixel-perfect.\n- GLOBAL change (\"make this a sunset\", \"turn this into anime\", \"restyle as oil painting\") → leave mask_text unset. The whole image is reimagined.\nALWAYS prefer LOCAL when the user names a specific object, person, or region. GLOBAL is only for whole-image style/lighting transformations.\n\nDenoise:\n- LOCAL (mask_text set): default 1.0. Drop to 0.60.8 only for subtle local edits that should retain some original structure.\n- GLOBAL (no mask_text): default 0.7. Use 0.30.5 for subtle restyle, 0.851.0 for radical reimagining.\n\nPick style for the DESIRED OUTPUT, not the input image.\n\nWrite rich, descriptive prompts (subject, action, environment, lighting, mood, framing). Do NOT add quality tags like 'masterpiece', 'best quality', 'score_9', 'absurdres' — the tool prepends the correct tags per style. Do NOT set sampler, CFG, steps, scheduler — the tool picks them.\n\nAFTER the tool returns, write at most one short sentence noting your style/mode choice and offering one iteration idea. The image is already shown to the user; do not describe it.",
"temperature": 0.5,
"top_p": 0.9,
"function_calling": "native",
"tool_choice": "required"
"custom_params": {
"tool_choice": "required"
}
},
"meta": {
"profile_image_url": "/static/favicon.png",

View File

@@ -65,8 +65,11 @@ You are an image-tool dispatcher. You do not respond in prose. Every
user message MUST result in exactly one tool call.
ROUTING:
-- If the user attached an image → call edit_image
-- Otherwise → call generate_image
+- If the user attached an image (including images you previously
+  generated in this chat) → call edit_image(prompt=..., ...)
+- Otherwise → call generate_image(prompt=..., ...)
+Both tools take `prompt` as the first argument — same name on both.
+Do NOT invent `edit_instruction`.
Fire the tool on the FIRST message, with no preamble. Do not write a
'plan', 'approach', 'steps', 'breakdown', or any explanation before

View File

@@ -1,7 +1,7 @@
"""
title: Smart Image Generator & Editor (ComfyUI)
author: ai-stack
-version: 0.6.0
+version: 0.7.2
description: Generate or edit images via ComfyUI with automatic SDXL
checkpoint routing. Two methods — generate_image (txt2img) and
edit_image (img2img on the user's most recently attached image). The
@@ -34,6 +34,8 @@ from pydantic import BaseModel, Field
# falls back to emitting a markdown data-URI message.
try:
from fastapi import UploadFile
+    from open_webui.models.chats import Chats
+    from open_webui.models.files import Files
from open_webui.models.users import Users
from open_webui.routers.files import upload_file_handler
@@ -338,20 +340,93 @@ def _build_img2img(positive: str, negative: str, settings: dict,
}
+def _file_dict_is_image(f: dict) -> bool:
+    ftype = (f.get("type") or "").lower()
+    fname = (f.get("name") or f.get("filename") or "").lower()
+    return "image" in ftype or fname.endswith((".png", ".jpg", ".jpeg", ".webp"))
+
+
+_FILE_URL_ID_RE = re.compile(r"/(?:api/v1/)?files/([0-9a-fA-F-]{8,})(?:/content)?")
+
+
+def _read_file_dict(f: dict) -> Optional[bytes]:
+    """
+    Try to read raw bytes for one file dict. Tries in order:
+      1. Local filesystem path keys (covers user uploads with `path`).
+      2. Open WebUI's Files.get_file_by_id with f["id"] (covers files
+         the user uploaded via the file API).
+      3. Same lookup with the id parsed out of f["url"] (covers
+         assistant-emitted files where the message attachment is just
+         {"type":"image","url":"/api/v1/files/<uuid>/content"} —
+         no id field, no path field, but the URL has the id).
+    """
+    for path_key in ("path", "filepath", "file_path"):
+        path = f.get(path_key)
+        if path:
+            try:
+                with open(path, "rb") as fh:
+                    return fh.read()
+            except OSError:
+                pass
+    candidate_ids = []
+    if f.get("id"):
+        candidate_ids.append(f["id"])
+    url = f.get("url")
+    if url:
+        m = _FILE_URL_ID_RE.search(url)
+        if m:
+            candidate_ids.append(m.group(1))
+    if _OPENWEBUI_RUNTIME:
+        for fid in candidate_ids:
+            try:
+                file_model = Files.get_file_by_id(fid)
+                if file_model is None:
+                    continue
+                path = getattr(file_model, "path", None)
+                if not path:
+                    meta = getattr(file_model, "meta", None) or {}
+                    if isinstance(meta, dict):
+                        path = meta.get("path")
+                    else:
+                        path = getattr(meta, "path", None)
+                if path:
+                    try:
+                        with open(path, "rb") as fh:
+                            return fh.read()
+                    except OSError:
+                        pass
+            except Exception:
+                pass
+    return None
async def _extract_attached_image(
files: Optional[list],
messages: Optional[list],
+    metadata: Optional[dict],
session: aiohttp.ClientSession,
) -> Optional[bytes]:
"""
-    Find the most recent image the user attached to the chat. Tries three
-    sources in order: (1) base64 data URIs in `image_url` content blocks
-    of the recent messages (works for vision-capable models), (2) a local
-    filesystem path on the file dict (open-webui stores uploads under
-    /app/backend/data/uploads/), (3) the file's url field, fetched over
-    HTTP. Returns raw image bytes, or None if nothing matched.
+    Find the most recent image in the chat — including images previously
+    emitted by this tool itself. Search order (most recent first):
+      1. Inline base64 data URIs in `image_url` content blocks of recent
+         messages (vision-model uploads, paste-from-clipboard).
+      2. Files attached to messages in the conversation, scanned in
+         REVERSE so the newest image wins. This covers two cases:
+           a. Files the user just attached (current user message).
+           b. Files the assistant emitted via prior `generate_image` /
+              `edit_image` calls (attached to assistant messages by the
+              `files` event in _push_image_to_chat).
+      3. The __files__ tool param as a final fallback (some Open WebUI
+         versions pass user uploads here instead of on the message).
+      4. Best-effort URL fetch on any leftover file dict (likely fails
+         on auth-protected endpoints — last resort).
"""
-    # Messages: standard OpenAI image_url content blocks.
+    # 1. Inline data URIs on recent messages.
for msg in reversed(messages or []):
content = msg.get("content") if isinstance(msg, dict) else None
if isinstance(content, list):
@@ -365,27 +440,62 @@ async def _extract_attached_image(
except Exception:
pass
-    # Files: try local path, then URL.
+    # 2. Files on messages, newest first.
+    for msg in reversed(messages or []):
+        if not isinstance(msg, dict):
+            continue
+        msg_files = msg.get("files")
+        if not isinstance(msg_files, list):
+            continue
+        for f in msg_files:
+            if not isinstance(f, dict) or not _file_dict_is_image(f):
+                continue
+            data = _read_file_dict(f)
+            if data is not None:
+                return data
+
+    # 3. __files__ param (current user upload, sometimes only here).
for f in files or []:
-        if not isinstance(f, dict):
-            continue
-        ftype = (f.get("type") or "").lower()
-        fname = (f.get("name") or f.get("filename") or "").lower()
-        is_image = "image" in ftype or fname.endswith((".png", ".jpg", ".jpeg", ".webp"))
-        if not is_image:
+        if not isinstance(f, dict) or not _file_dict_is_image(f):
            continue
+        data = _read_file_dict(f)
+        if data is not None:
+            return data
-        for path_key in ("path", "filepath", "file_path"):
-            path = f.get(path_key)
-            if path:
-                try:
-                    with open(path, "rb") as fh:
-                        return fh.read()
-                except OSError:
-                    pass
+
+    # 4. Pull the chat from the database directly. Open WebUI persists
+    # `files` on every message via the upsert in socket/main.py — so even
+    # if __messages__ doesn't hydrate the assistant-emitted attachments,
+    # the chat record does. This is the strongest fallback.
+    if _OPENWEBUI_RUNTIME and metadata:
+        chat_id = metadata.get("chat_id")
+        if chat_id:
+            try:
+                chat = Chats.get_chat_by_id(chat_id)
+                chat_data = getattr(chat, "chat", None) if chat else None
+                chat_messages = (chat_data or {}).get("messages", []) if isinstance(chat_data, dict) else []
+                for msg in reversed(chat_messages):
+                    if not isinstance(msg, dict):
+                        continue
+                    msg_files = msg.get("files") or []
+                    for f in msg_files:
+                        if not isinstance(f, dict) or not _file_dict_is_image(f):
+                            continue
+                        data = _read_file_dict(f)
+                        if data is not None:
+                            return data
+            except Exception:
+                pass
url = f.get("url")
if url:
# 5. Last-resort URL fetch (no auth — only works for public endpoints).
for source in [files or []] + [
(msg.get("files") or []) for msg in reversed(messages or []) if isinstance(msg, dict)
]:
for f in source:
if not isinstance(f, dict) or not _file_dict_is_image(f):
continue
url = f.get("url")
if not url:
continue
full = url if url.startswith("http") else f"http://localhost:8080{url}"
try:
async with session.get(full) as resp:
@@ -645,7 +755,7 @@ class Tools:
async def edit_image(
self,
-        edit_instruction: str,
+        prompt: str,
style: Optional[StyleName] = None,
mask_text: Optional[str] = None,
denoise: Optional[float] = None,
@@ -689,7 +799,7 @@ class Tools:
Pick `style` for the DESIRED OUTPUT, not the input image.
-        :param edit_instruction: What the changed area should look like.
+        :param prompt: What the changed area should look like.
Tool auto-prepends quality tags — don't include those.
:param style: One of the StyleName values. Omit to auto-detect.
:param mask_text: Noun phrase describing the region to edit. Set
@@ -700,7 +810,7 @@ class Tools:
:param seed: 0 to randomize, otherwise specific.
:return: Markdown image of the result, or an error if no image is attached.
"""
-        chosen = style or _route_style(edit_instruction)
+        chosen = style or _route_style(prompt)
settings = STYLES.get(chosen)
if not settings:
return f"Unknown style '{chosen}'. Available: {', '.join(STYLES.keys())}"
@@ -722,12 +832,24 @@ class Tools:
async with aiohttp.ClientSession() as session:
await emit("Looking for attached image…")
-            raw_in = await _extract_attached_image(__files__, __messages__, session)
+            raw_in = await _extract_attached_image(
+                __files__, __messages__, __metadata__, session,
+            )
if raw_in is None:
+                msgs_with_files = sum(
+                    1 for m in (__messages__ or [])
+                    if isinstance(m, dict) and m.get("files")
+                )
+                chat_id_present = bool((__metadata__ or {}).get("chat_id"))
return (
"No image found in the chat. Ask the user to attach the "
"image they want edited (paperclip / drag-drop), or call "
"generate_image instead if they want a new image."
"No image found in the chat. Diagnostics: "
f"__files__={len(__files__ or [])}, "
f"__messages__={len(__messages__ or [])} "
f"(of which {msgs_with_files} had a files field), "
f"chat_id_present={chat_id_present}, "
f"openwebui_runtime={_OPENWEBUI_RUNTIME}. "
"Ask the user to attach the image they want edited "
"(paperclip / drag-drop), or call generate_image instead."
)
await emit("Uploading source to ComfyUI…")
@@ -741,7 +863,7 @@ class Tools:
+ (f", mask='{mask_text}'" if mask_text else "")
)
-            positive = f"{settings['prefix']}{edit_instruction}"
+            positive = f"{settings['prefix']}{prompt}"
negative = settings["negative"]
if negative_prompt:
negative = f"{negative}, {negative_prompt}"

View File

@@ -18,4 +18,11 @@ if [ -d /opt/comfyui/custom_nodes ]; then
done
fi
+# Force-pin known-incompatible packages back into a working range. Some
+# custom nodes bring transformers >=5 transitively, which removes
+# BertModel.get_head_mask and breaks comfyui_segment_anything's
+# GroundingDINO. Run last so it wins over anything the loop above
+# installed.
+pip install -q "transformers>=4.40,<5" || echo "[entrypoint] transformers pin failed — continuing"
exec "$@"