Open WebUI 0.9.0 made every model-class accessor (Users.get_user_by_id,
Chats.get_chat_by_id, Files.get_file_by_id, …) a coroutine. Both tools
were still calling them synchronously, so the calls returned coroutines
instead of model objects; the first downstream attribute access threw,
the bare `except Exception: return False` swallowed it, and uploads
silently fell through to the data-URI fallback. The data-URI markdown
rendered during streaming but didn't survive post-stream commit, which
looked like "image flashes in, then disappears."
Add await to the six call sites; promote `_read_file_dict` to async
since it now contains an await; restore `_push_image_to_chat` to the
canonical `files` event so the file-attachment chrome (thumbnail +
download) comes back.
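The shape of the fix at one of the six call sites (surrounding body
elided; simplified sketch):

    # before (pre-0.9.0) the accessor was synchronous:
    #     file_item = Files.get_file_by_id(file_id)
    # after 0.9.0 it is a coroutine, so the call site awaits it and the
    # enclosing helper goes async (hence _read_file_dict's promotion):
    async def _read_file_dict(self, file_dict: dict):
        file_item = await Files.get_file_by_id(file_dict["id"])
        ...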
This supersedes commit d034700, which mis-diagnosed the symptom as a
virtualization regression and switched to a `message`-event markdown
workaround. The workaround didn't help (same flash-and-vanish) because
the upload pre-check still failed for the same async-migration reason
and the data-URI fallback path still ran.
smart_image_gen.py 0.7.9 -> 0.7.10
smart_image_pipe.py 0.1.1 -> 0.1.2
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI 0.9.0 introduced chat-history virtualization that
unmounts off-screen assistant messages and reconstructs them from
persisted shape; `files` attached mid-stream by a tool don't
survive the round-trip — the image flashes in during streaming
and disappears the moment the message commits.
Both image tools now upload via Open WebUI's file store as before
but surface the result as a markdown image injected into the
assistant message via a `message` event, which is part of the
persisted shape and renders reliably across remounts. Trade-off:
loses the file-attachment chrome (thumbnail + download button).
Each tool has a TODO marking the swap site with the original
`files` payload inlined for one-line revert once upstream fixes
the regression.
smart_image_gen.py 0.7.8 -> 0.7.9
smart_image_pipe.py 0.1.0 -> 0.1.1
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported a wrong image being returned. Two hardenings:
1. _job_prefix() generates a per-submission filename_prefix
('smartedit_<10hex>', 'smartinpaint_<10hex>', 'smartgen_<10hex>')
so SaveImage outputs from concurrent jobs sit in their own
namespace and ComfyUI's auto-incrementing counter can never
produce filenames that overlap across jobs. With a shared prefix,
if a queued job's history-fetch ever raced past its own SaveImage
record there was a theoretical (if unlikely) path to picking up
another job's _00001_.png. A per-job prefix kills that vector
(sketch below).
2. edit_image now emits the source image's SHA-1 and byte count in a
status event before uploading to ComfyUI. If a future 'wrong
image' report comes in, that hash should match the prior
generation's output — if it doesn't, we know
_extract_attached_image picked up the wrong source rather than
ComfyUI returning the wrong file. Hashlib import is local so the
module's import surface stays clean.
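Minimal sketch of the item-1 prefix helper (the randomness source
here is an assumption; any collision-safe hex works):

    import os

    def _job_prefix(kind: str) -> str:
        # kind is 'smartgen' / 'smartedit' / 'smartinpaint'; 10 hex
        # chars give each submission its own SaveImage namespace
        return f"{kind}_{os.urandom(5).hex()}"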
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vision-capable LLMs misclassify rendered subjects when picking a
style — observed: model called juggernaut for an edit on a furry-il
generation because the rendered character looked 'photoreal-ish' to
its vision encoder. Each visual judgment is independent so styles
flip mid-chat.
Flipped resolution order in edit_image so inheritance from the prior
generate_image / edit_image call DOMINATES the LLM's explicit style
arg. The LLM's choice only wins when there's nothing to inherit
(first edit in a chat, fresh user upload). The workaround for a
legitimate style change is to start a new chat.
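The flipped order, roughly (helper names as in the inheritance
commit further down this log):

    # inheritance dominates; the LLM's explicit arg only wins when
    # there is nothing to inherit, then the keyword fallback catches
    # the rest
    style = (_inherited_style(__messages__)
             or explicit_style
             or _route_style(prompt))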
System prompt updated to match: tells the LLM that style inheritance
is enforced, that passing style on follow-up calls is ignored, and
that user requests for style change require a new chat.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Registers as 'Image Studio (Pipe)' in Open WebUI's chat-model dropdown.
No LLM in the loop — the Pipe parses the user's message with regex,
finds attached images via the same multi-source extractor as the
Tool, routes deterministically to txt2img / img2img / inpaint, calls
ComfyUI, returns the image. Bypasses the abliterated Qwen 3.5 ×
Open WebUI tool-call format interop bug entirely.
Style resolution: explicit FORCE_STYLE valve > inherited from prior
assistant message ('style: X' marker) > keyword regex on user text >
default (furry-il).
Inpaint trigger: regex picks up phrasings like 'change the X',
'remove the Y', 'make the Z bigger/red/etc' and pulls the noun
phrase as mask_text. No match → full img2img.
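Illustrative shape of the trigger (the Pipe's real patterns cover
more phrasings and trim trailing modifiers):

    import re

    _INPAINT_RE = re.compile(
        r"\b(?:change|remove|replace|make)\s+the\s+([\w ]+)",
        re.IGNORECASE)

    def _mask_text(user_text: str):
        m = _INPAINT_RE.search(user_text)
        return m.group(1).strip() if m else None  # None -> full img2img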
Reuses the same per-style settings, prefix dialects, negatives,
GrowMask feathering, file extraction (with chat-DB fallback) and
files-event push as smart_image_gen.py — code is duplicated rather
than shared because Open WebUI loads each plugin file standalone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The raw SAM mask is a hard binary edge — KSampler repaints right up
to it, and SDXL has no surrounding-pixel context inside the mask to
blend with. Result: the inpainted region looks pasted-on with visible
seams (the artifact the user reported on the werewolf-groin edit).
Inserted a stock GrowMask node (id 17) between
GroundingDinoSAMSegment and SetLatentNoiseMask:
- expand=12 grows the mask outward by 12 px so the new content
overlaps a strip of original pixels for blending
- tapered_corners=True softens the edge so the noise transition
isn't a step function
GrowMask is built into stock ComfyUI; no extra custom node install.
KSampler still uses the caller-supplied denoise (default 1.0 in
inpaint mode).
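As it lands in the workflow dict the tool builds (the SAM node id
'16' is assumed here; GroundingDinoSAMSegment's MASK is its second
output):

    workflow["17"] = {
        "class_type": "GrowMask",
        "inputs": {
            "mask": ["16", 1],        # MASK from GroundingDinoSAMSegment
            "expand": 12,             # 12 px blend strip
            "tapered_corners": True,  # no step-function edge
        },
    }
    # SetLatentNoiseMask now consumes ["17", 0] instead of the raw mask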
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_submit_and_fetch was iterating history[prompt_id]['outputs'].values()
and grabbing the first image it saw. The inpaint workflow includes
nodes other than SaveImage that emit IMAGE outputs (the
GroundingDinoSAMSegment node returns an overlay/mask-applied image in
addition to the mask), and the iteration order over those outputs is
effectively arbitrary; sometimes we'd return the overlay (which can
render mostly black) instead of the actual SaveImage result.
Fix: prefer outputs from the SaveImage node id ('9' in every workflow
the tool builds) explicitly. Fall back to scanning all outputs only
if SaveImage didn't appear (workflow drift, manual edit, etc).
User reported seeing the correct inpaint in ComfyUI's native UI but
black in chat — this is the gap.
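Sketch of the selection order (SAVE_NODE_ID per the text above;
names simplified):

    SAVE_NODE_ID = "9"

    def _pick_images(outputs: dict) -> list:
        # prefer the SaveImage node explicitly
        node = outputs.get(SAVE_NODE_ID)
        if node and node.get("images"):
            return node["images"]
        # fall back to scanning only on workflow drift / manual edits
        for node in outputs.values():
            if node.get("images"):
                return node["images"]
        return []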
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes to address the timeout-and-retry loop the user hit on the
first edit_image call:
1. comfyui-init-models.sh now fetches the three weights inpaint
needs into /models/sams and /models/grounding-dino:
- sam_hq_vit_h.pth (~2.5 GB)
- groundingdino_swint_ogc.pth (~700 MB)
- GroundingDINO_SwinT_OGC.cfg.py (~1 KB)
Without preseeding, these auto-download on the first inpaint, which
takes minutes and times out the tool call. The mkdir line now
creates the new subdirs as well.
2. Tool TIMEOUT_SECONDS valve default bumped 240s → 600s as
defense-in-depth — even with weights preseeded, BERT-base
auto-downloads via transformers on first GroundingDINO load
(~30s) and a slow KSampler on a contended GPU can push past
4 minutes occasionally. Steady-state runs still finish in under
a minute; the valve only matters for first-call latency.
After comfyui-model-init re-runs (`docker compose up -d
comfyui-model-init`), first inpaint should be near-instant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported edit_image picking 'juggernaut' (photoreal) for an edit
on a furry image — the LLM didn't carry context, and the tool's
fallback _route_style only sees the edit instruction text, which for
neutral edits ('bigger', 'glowing eyes') has no furry keywords.
Fix in two places:
1. Tool: _inherited_style scans __messages__ in reverse for prior
generate_image / edit_image tool calls and returns the style arg
they used (sketch below). edit_image now resolves: explicit style
→ inherited →
keyword fallback. Deterministic, no LLM cooperation needed for
follow-up edits on previously-generated images.
2. System prompt: explicit three-step style resolution for
edit_image. Generated by you → omit style and auto-inherit.
Uploaded by user → INSPECT visually and pick a matching style
(the LLM is the only thing with vision; the tool can't see
pixels). Then keep that style for subsequent edits.
Both paths matter — the tool fix handles the common case
deterministically, the prompt fix handles the upload case where
there's nothing to inherit from.
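Sketch of the scan (the tool-call message shape is assumed; the real
helper guards more cases):

    import json

    def _inherited_style(messages: list):
        for msg in reversed(messages or []):  # newest first
            for call in msg.get("tool_calls") or []:
                fn = call.get("function", {})
                if fn.get("name") in ("generate_image", "edit_image"):
                    args = json.loads(fn.get("arguments") or "{}")
                    if args.get("style"):
                        return args["style"]
        return None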
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If __messages__ doesn't include the assistant's prior file attachments
(which is what the screenshot shows), the new fallback queries
the chat by id via Chats.get_chat_by_id and walks every persisted
message for files. Open WebUI's socket handler always upserts files
onto the assistant message via {'files': files} so this path is
authoritative.
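Roughly (the accessor is sync as of this commit; the 0.9.0 migration
at the top of this log adds the await; the persisted message shape
and import path are assumptions):

    from open_webui.models.chats import Chats  # path varies by version

    def _files_from_chat_db(chat_id: str) -> list:
        chat = Chats.get_chat_by_id(chat_id)
        if not chat:
            return []
        messages = (chat.chat or {}).get("messages", [])
        # newest first: the latest attachment is the edit candidate
        return [f for m in reversed(messages)
                for f in m.get("files") or []]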
The 'No image found' return now includes diagnostic counts —
__files__, __messages__, messages_with_files, chat_id_present,
openwebui_runtime — so subsequent failures actually show what the
tool saw instead of being opaque.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs in one screenshot:
1. LLM called edit_image(prompt=..., ...) but the signature was
edit_image(edit_instruction=..., ...) — mismatch, missing-arg
crash. Renamed the first param to `prompt` so both tools have a
matching, predictable name. System prompt updated with an explicit
'do not invent edit_instruction' line for stubborn models.
2. After fix#1, edit_image still couldn't find the prior generated
image because Open WebUI assistant-message file attachments only
carry {type, url} (no id, no path). _read_file_dict now also
greps the file id out of /api/v1/files/<uuid>/content URLs and
feeds it to Files.get_file_by_id. Verified that the pattern matches
absolute URLs (https://llm-1.srvno.de/api/v1/files/.../content).
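The extraction from item 2, roughly:

    import re

    _FILE_ID_RE = re.compile(r"/api/v1/files/([0-9a-fA-F-]{36})/content")

    def _id_from_url(url: str):
        m = _FILE_ID_RE.search(url)  # relative and absolute URLs
        return m.group(1) if m else None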
System prompt also now says 'including images you previously
generated in this chat' to nudge the LLM to pick up assistant
outputs as edit candidates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug: after generate_image surfaced an image via the files event, the
next edit_image call returned 'No image found in the chat'. The image
was attached to the assistant's message, but _extract_attached_image
only scanned the user's __files__ param and image_url content blocks
on user messages — it never looked at messages.files for any role.
Fix: rewrite extraction to scan messages[].files in reverse for ALL
roles, so an assistant-emitted image from a prior tool call is found
the same way as a user-attached upload. Use Open WebUI's internal
Files.get_file_by_id when the file dict has an id, so we get raw
bytes from disk without going through the auth-protected
/api/v1/files/{id}/content endpoint. Old path-key and URL-fetch
paths kept as fallbacks.
Refactored shared helpers _file_dict_is_image and _read_file_dict
out of the loop to keep the search logic readable.
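The scan, stripped down (helpers per the refactor above; dict shapes
assumed):

    def _extract_attached_image(messages, files):
        # 1) messages[].files newest-first, ANY role -- an assistant-
        #    emitted image from a prior tool call is found the same
        #    way as a user upload
        for msg in reversed(messages or []):
            for fd in msg.get("files") or []:
                if _file_dict_is_image(fd):
                    return _read_file_dict(fd)
        # 2) fall back to the caller-supplied __files__ param
        for fd in files or []:
            if _file_dict_is_image(fd):
                return _read_file_dict(fd)
        return None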
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five pieces:
1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO
+ SAM-HQ in one bundle) into custom_nodes and pip-installs its
requirements at build time. Model weights auto-download to the
comfyui-models volume on first inpaint (~3 GB one-time cost).
2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs
requirements.txt for any custom_node present at startup. Lets users
add custom nodes via ComfyUI-Manager (or by git-cloning into the
volume) and have the deps picked up on the next restart, without
editing the Dockerfile.
3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When
set, builds an inpainting workflow (LoadImage → GroundingDinoSAM
Segment → SetLatentNoiseMask → KSampler) so only the named region
is repainted. When unset, falls through to the existing img2img
path. Denoise default switches: 1.0 with mask_text (full repaint
within mask), 0.7 without (sketch below).
4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL
distinction — set mask_text whenever the user names a specific
object/region ('the ball', 'the dog', 'the sky'); leave it unset
only for whole-image style/lighting transformations.
5. Deployment README documents the new mode + the first-inpaint
weight-download caveat.
Image rebuild required — bump tag to pick up the Dockerfile change.
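The piece-3 branch, roughly (builder names are hypothetical;
signature trimmed to the params this change touches):

    from typing import Optional

    def edit_image(mask_text: str = "", denoise: Optional[float] = None,
                   **kw):
        if mask_text:
            # LoadImage -> GroundingDinoSAMSegment ->
            # SetLatentNoiseMask -> KSampler
            denoise = 1.0 if denoise is None else denoise  # full repaint
            wf = _build_inpaint_workflow(mask_text, denoise, **kw)
        else:
            denoise = 0.7 if denoise is None else denoise
            wf = _build_img2img_workflow(denoise, **kw)
        return _submit_and_fetch(wf)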
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The data-URI message-event approach didn't render — Open WebUI's chat
frontend ignores data URIs from tool-emitted message events because
the markdown-base64 rewriter (utils/files.py,
convert_markdown_base64_images) only runs on assistant streaming
content, not on tool emits.
Switched to the path Open WebUI's own image-generation flow uses
(backend/open_webui/utils/middleware.py ~1325):
1. Upload image bytes via open_webui.routers.files.upload_file_handler
(gets back a file_item with id)
2. Resolve the served URL via request.app.url_path_for(
"get_file_content_by_id", id=file_item.id) → /api/v1/files/{id}/content
3. Emit a `files` event:
{"type": "files", "data": {"files": [{"type": "image", "url": ...}]}}
Tools now take __request__, __user__, __metadata__ params for the
upload (Open WebUI auto-injects these). Falls back to data-URI
message event if the runtime imports aren't available (e.g. running
the file standalone for tests). The internal upload bypasses
get_verified_user via the user= kwarg, so no token plumbing.
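Condensed from the three steps above (the upload_file_handler
signature is an assumption; treat this as a sketch, not the exact
call):

    import io
    from fastapi import UploadFile
    from open_webui.routers.files import upload_file_handler

    async def _push_image_to_chat(request, user, image_bytes,
                                  event_emitter):
        # 1. internal upload; user= kwarg bypasses get_verified_user
        upload = UploadFile(file=io.BytesIO(image_bytes),
                            filename="generated.png")
        file_item = upload_file_handler(request, file=upload, user=user)
        # 2. served URL resolved from the route name
        url = request.app.url_path_for("get_file_content_by_id",
                                       id=file_item.id)
        # 3. the files event
        await event_emitter({"type": "files", "data": {"files": [
            {"type": "image", "url": url}]}})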
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The data URI returned from the tool was being given to the LLM as the
tool result — the LLM then either echoed the base64 to the user as
plain text (screenshot 1) or hallucinated a description of what it
thought the image looked like (screenshot 2 — "an image of a cat
sitting on a windowsill" for a fox-warrior prompt).
Fix: push the markdown image into the chat directly via
__event_emitter__ as a "message" event, and return a short text
confirmation as the function value. The confirmation is worded to
prevent the LLM from describing the image or repeating the markdown
(both common failure modes for tool-using LLMs).
Both generate_image and edit_image fixed.
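The emit-then-confirm shape (confirmation wording illustrative):

    async def generate_image(self, prompt: str,
                             __event_emitter__=None) -> str:
        ...  # generation elided; yields data_uri
        await __event_emitter__({
            "type": "message",  # injected into the assistant message
            "data": {"content": f"![generated image]({data_uri})"},
        })
        # short, directive confirmation -- the only thing the LLM sees
        return ("The image was generated and is already shown to the "
                "user. Do not describe it and do not repeat markdown.")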
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:
generate_image — txt2img (existing, unchanged behavior)
edit_image — img2img on the most recently attached image
edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.
Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.
Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes to make the LLM more likely to call the tool:
1. Lead the docstring with an unambiguous directive — "Create an image
and show it to the user. Use this whenever the user asks you to
draw, generate, ..." plus a hard "do not say you cannot generate
images" line. Open WebUI feeds the docstring straight to the LLM as
the tool description; first line carries the most weight.
2. `style: Optional[StyleName]` where StyleName is a Literal enum of
the seven values. Native function-calling models read the type
annotation and present the seven valid values to the LLM as a
strict choice instead of a free-text param.
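The shape of item 2 (seven values per the style list elsewhere in
this series; method body elided):

    from typing import Literal, Optional

    StyleName = Literal["photo", "juggernaut", "pony", "general",
                        "furry-nai", "furry-noob", "furry-il"]

    # native function-calling models surface this as a strict enum
    def generate_image(self, prompt: str,
                       style: Optional[StyleName] = None) -> str:
        ...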
If the LLM still doesn't fire the tool, the install is probably wrong:
Workspace → Models → the model → Advanced Params → Function Calling
must be set to Native (not Default).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each style now gets a proper baseline covering quality, anatomy, and
watermark/signature suppression — plus the appropriate style-leak guards
(no-cartoon for photo, no-human for furry, score_4–6 suppression for
pony). Quality terms only; no NSFW filtering by default since several
checkpoints in this set are commonly used for adult work and would
fight a baked-in content filter. If SFW-by-default is wanted, add an
explicit safe-mode flag rather than expanding NEGATIVES.
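Illustrative entries only (the committed strings are longer):

    NEGATIVES = {
        # baseline + per-family style-leak guards
        "photo":    "lowres, bad anatomy, watermark, signature, cartoon",
        "furry-il": "lowres, bad anatomy, watermark, signature, human",
        "pony":     "score_4, score_5, score_6, watermark, signature",
    }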
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI Tool the LLM invokes instead of the built-in image action.
Auto-routes among the seven SDXL checkpoints (photo / juggernaut /
pony / general / furry-{nai,noob,il}) based on either an explicit
`style` arg or first-match-wins regex over the prompt. Constructs the
ComfyUI workflow inline, submits via /prompt, polls /history, returns
the result as a base64 data-URI markdown image so no extra hosting is
needed. Per-style default negatives. ComfyUI URL / steps / CFG /
timeout are admin-tunable Valves.
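The submit-and-poll loop in miniature (endpoints per the text; the
naive first-output grab here is what the SaveImage-preferred fix
further up this log later replaces; the caller base64-encodes the
returned bytes into the data-URI markdown):

    import time
    import requests

    def _submit_and_fetch(base: str, workflow: dict,
                          timeout_s: int) -> bytes:
        pid = requests.post(f"{base}/prompt",
                            json={"prompt": workflow}).json()["prompt_id"]
        deadline = time.time() + timeout_s
        while time.time() < deadline:
            hist = requests.get(f"{base}/history/{pid}").json()
            if pid in hist:
                img = next(iter(
                    hist[pid]["outputs"].values()))["images"][0]
                return requests.get(f"{base}/view", params={
                    "filename": img["filename"],
                    "subfolder": img.get("subfolder", ""),
                    "type": img.get("type", "output")}).content
            time.sleep(1)
        raise TimeoutError("ComfyUI job not finished before timeout")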
Filters can't see image-gen requests in Open WebUI (the routers skip
the filter chain), so the LLM-driven Tool is the only path that
gives intent-aware routing without changing the chat UX.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>