21 Commits

c07e962cae Image tools: migrate to OWUI 0.9.0 async model accessors
Open WebUI 0.9.0 made every model-class accessor (Users.get_user_by_id,
Chats.get_chat_by_id, Files.get_file_by_id, …) a coroutine. Both tools
were still calling them synchronously, so the calls returned coroutines
instead of model objects; the first downstream attribute access threw,
the bare `except Exception: return False` swallowed it, and uploads
silently fell through to the data-URI fallback. The data-URI markdown
rendered during streaming but didn't survive post-stream commit, which
looked like "image flashes in, then disappears."

Add await to the six call sites; promote `_read_file_dict` to async
since it now contains an await; restore `_push_image_to_chat` to the
canonical `files` event so the file-attachment chrome (thumbnail +
download) comes back.
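
The failure mechanism is easy to reproduce in isolation. A sketch (FakeFiles is a stand-in for the real Open WebUI model class, not its API):

```python
import asyncio

class FakeFiles:
    """Stand-in for an OWUI 0.9.0 model class whose accessor became async."""
    @staticmethod
    async def get_file_by_id(file_id):
        return type("FileItem", (), {"path": f"/data/{file_id}"})()

def broken_read(file_id):
    try:
        item = FakeFiles.get_file_by_id(file_id)  # missing await: a coroutine
        return item.path                          # AttributeError here...
    except Exception:
        return False                              # ...silently swallowed

async def fixed_read(file_id):
    item = await FakeFiles.get_file_by_id(file_id)  # the one-word fix
    return item.path

print(broken_read("abc"))              # False -> data-URI fallback ran
print(asyncio.run(fixed_read("abc")))  # /data/abc
```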

This supersedes commit d034700, which mis-diagnosed the symptom as a
virtualization regression and switched to a `message`-event markdown
workaround. The workaround didn't help (same flash-and-vanish) because
the upload pre-check still failed for the same async-migration reason
and the data-URI fallback path still ran.

smart_image_gen.py 0.7.9 -> 0.7.10
smart_image_pipe.py 0.1.1 -> 0.1.2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:16:02 -05:00
d034700af9 Image tools: work around OWUI 0.9.x files-event regression
Open WebUI 0.9.0 introduced chat-history virtualization that
unmounts off-screen assistant messages and reconstructs them from
persisted shape; `files` attached mid-stream by a tool don't
survive the round-trip — the image flashes in during streaming
and disappears the moment the message commits.

Both image tools now upload via Open WebUI's file store as before
but surface the result as a markdown image injected into the
assistant message via a `message` event, which is part of the
persisted shape and renders reliably across remounts. Trade-off:
loses the file-attachment chrome (thumbnail + download button).
Each tool has a TODO marking the swap site with the original
`files` payload inlined for one-line revert once upstream fixes
the regression.

smart_image_gen.py 0.7.8 -> 0.7.9
smart_image_pipe.py 0.1.0 -> 0.1.1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:05:12 -05:00
28f370a80b Update deployments/ai-stack/openwebui-tools/smart_image_gen.py 2026-04-20 17:08:32 +00:00
a1af88a632 smart_image_gen v0.7.8: per-job filename_prefix + source-image diagnostic
User reported a wrong image being returned. Two hardenings:

1. _job_prefix() generates a per-submission filename_prefix
   ('smartedit_<10hex>', 'smartinpaint_<10hex>', 'smartgen_<10hex>')
   so SaveImage outputs from concurrent jobs sit in their own
   namespace and ComfyUI's auto-incrementing counter can never
   produce filenames that overlap across jobs. With a shared prefix,
   if a queued job's history-fetch ever raced past its own SaveImage
   record, there was a theoretical (if unlikely) path to picking up
   another job's _00001_.png. A per-job prefix kills that vector.

2. edit_image now emits the source image's SHA-1 and byte count in a
   status event before uploading to ComfyUI. If a future 'wrong
   image' report comes in, that hash should match the prior
   generation's output — if it doesn't, we know
   _extract_attached_image picked up the wrong source rather than
   ComfyUI returning the wrong file. Hashlib import is local so the
   module's import surface stays clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:22:45 -05:00
ec6108888a smart_image_gen v0.7.7: enforce style inheritance for edit_image
Vision-capable LLMs misclassify rendered subjects when picking a
style — observed: the model called juggernaut for an edit on a
furry-il generation because the rendered character looked
'photoreal-ish' to its vision encoder. Each visual judgment is
independent, so styles flip mid-chat.

Flipped the resolution order in edit_image so inheritance from the
prior generate_image / edit_image call DOMINATES the LLM's explicit
style arg. The LLM's choice only wins when there's nothing to inherit
(first edit in a chat, fresh user upload). The workaround for
legitimate style changes is to start a new chat.

System prompt updated to match: tells the LLM that style inheritance
is enforced, that passing style on follow-up calls is ignored, and
that user requests for style change require a new chat.
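
The flipped resolution order boils down to (sketch):

```python
def resolve_edit_style(explicit, inherited, keyword_fallback):
    # v0.7.7 order: inherited DOMINATES the LLM's explicit arg; the
    # explicit choice only wins when there's nothing to inherit.
    return inherited or explicit or keyword_fallback

print(resolve_edit_style("juggernaut", "furry-il", "furry-il"))  # furry-il
print(resolve_edit_style("pony", None, "furry-il"))              # pony
```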

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:53:32 -05:00
1ae451ad5f Add smart_image_pipe.py — deterministic Pipe for image gen / edit / inpaint
Registers as 'Image Studio (Pipe)' in Open WebUI's chat-model dropdown.
No LLM in the loop — the Pipe parses the user's message with regex,
finds attached images via the same multi-source extractor as the
Tool, routes deterministically to txt2img / img2img / inpaint, calls
ComfyUI, returns the image. Bypasses the abliterated Qwen 3.5 ×
Open WebUI tool-call format interop bug entirely.

Style resolution: explicit FORCE_STYLE valve > inherited from prior
assistant message ('style: X' marker) > keyword regex on user text >
default (furry-il).

Inpaint trigger: regex picks up phrasings like 'change the X',
'remove the Y', 'make the Z bigger/red/etc' and pulls the noun
phrase as mask_text. No match → full img2img.
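
A rough approximation of the trigger (hypothetical patterns; the real pipe's regexes may differ):

```python
import re

# Hypothetical approximation of the trigger phrasings described above.
_INPAINT_RE = re.compile(
    r"\b(?:change|remove|replace)\s+the\s+(\w+(?:\s\w+)?)"
    r"|\bmake\s+the\s+(\w+(?:\s\w+)?)\s+\w+",
    re.IGNORECASE,
)

def mask_text_from(message: str):
    m = _INPAINT_RE.search(message)
    if not m:
        return None  # no match -> full img2img
    return next(g for g in m.groups() if g)

print(mask_text_from("please remove the hat"))   # hat
print(mask_text_from("make the eyes bigger"))    # eyes
print(mask_text_from("more painterly overall"))  # None
```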

Reuses the same per-style settings, prefix dialects, negatives,
GrowMask feathering, file extraction (with chat-DB fallback) and
files-event push as smart_image_gen.py — code is duplicated rather
than shared because Open WebUI loads each plugin file standalone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:58:01 -05:00
63917709c1 smart_image_gen v0.7.6: feather inpaint mask edges via GrowMask
The raw SAM mask is a hard binary edge — KSampler repaints right up
to it, and SDXL has no surrounding-pixel context inside the mask to
blend with. Result: the inpainted region looks pasted-on with visible
seams (the artifact the user reported on the werewolf-groin edit).

Inserted a stock GrowMask node (id 17) between
GroundingDinoSAMSegment and SetLatentNoiseMask:
  - expand=12 grows the mask outward by 12 px so the new content
    overlaps a strip of original pixels for blending
  - tapered_corners=True softens the edge so the noise transition
    isn't a step function

GrowMask is built into stock ComfyUI; no extra custom node install.
KSampler still uses the caller-supplied denoise (default 1.0 in
inpaint mode).
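
In the API-format workflow dict the insertion looks roughly like this (the source node id and output index are assumptions; the input names are stock ComfyUI GrowMask's):

```python
workflow = {}  # the rest of the inpaint graph elided

workflow["17"] = {
    "class_type": "GrowMask",
    "inputs": {
        "mask": ["16", 1],        # assumed: MASK output of GroundingDinoSAMSegment
        "expand": 12,             # grow outward 12 px of blending overlap
        "tapered_corners": True,  # soften the edge so the noise transition isn't a step
    },
}
# SetLatentNoiseMask's mask input then points at ["17", 0] instead of the raw SAM mask.
```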

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:40:39 -05:00
18a205d69d smart_image_gen v0.7.5: fix black-image bug — fetch from SaveImage explicitly
_submit_and_fetch was iterating history[prompt_id]['outputs'].values()
and grabbing the first image it saw. The inpaint workflow includes
nodes other than SaveImage that emit IMAGE outputs (the
GroundingDinoSAMSegment node returns an overlay/mask-applied image in
addition to the mask), and the outputs dict's iteration order doesn't
guarantee SaveImage comes first — sometimes we'd return the overlay
(which can render mostly black) instead of the actual SaveImage result.

Fix: prefer outputs from the SaveImage node id ('9' in every workflow
the tool builds) explicitly. Fall back to scanning all outputs only
if SaveImage didn't appear (workflow drift, manual edit, etc).

User reported seeing the correct inpaint in ComfyUI's native UI but
black in chat — this is the gap.
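
A sketch of the selection logic (history shape simplified from ComfyUI's /history response):

```python
SAVE_NODE_ID = "9"  # the SaveImage id in every workflow the tool builds

def pick_images(outputs: dict):
    # Prefer the SaveImage node's outputs explicitly; scan everything else
    # only if it's absent (workflow drift, manual edit, etc).
    node = outputs.get(SAVE_NODE_ID)
    if node and node.get("images"):
        return node["images"]
    for node in outputs.values():
        if node.get("images"):
            return node["images"]
    return []
```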

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:30:04 -05:00
0fa8040251 Eliminate first-inpaint timeout: preseed SAM/GroundingDINO + 600s default
Two changes to address the timeout-and-retry loop the user hit on the
first edit_image call:

  1. comfyui-init-models.sh now fetches the three weights inpaint
     needs into /models/sams and /models/grounding-dino:
       - sam_hq_vit_h.pth                (~2.5 GB)
       - groundingdino_swint_ogc.pth     (~700 MB)
       - GroundingDINO_SwinT_OGC.cfg.py  (~1 KB)
     Without preseeding, these auto-download on first inpaint, which
     takes minutes and times out the tool call. The new subdirs are
     added to the mkdir line as well.

  2. Tool TIMEOUT_SECONDS valve default bumped 240s → 600s as
     defense-in-depth — even with weights preseeded, BERT-base
     auto-downloads via transformers on first GroundingDINO load
     (~30s), and a slow KSampler on a contended GPU can occasionally
     push past 4 minutes. Steady-state runs still finish in under
     a minute; the valve only matters for first-call latency.

After comfyui-model-init re-runs (`docker compose up -d
comfyui-model-init`), first inpaint should be near-instant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:39:23 -05:00
6700f6ce33 smart_image_gen v0.7.3: edit_image inherits style from prior tool call
User reported edit_image picking 'juggernaut' (photoreal) for an edit
on a furry image — the LLM didn't carry context, and the tool's
fallback _route_style only sees the edit instruction text, which for
neutral edits ('bigger', 'glowing eyes') has no furry keywords.

Fix in two places:

  1. Tool: _inherited_style scans __messages__ in reverse for prior
     generate_image / edit_image tool calls and returns the style arg
     they used. edit_image now resolves: explicit style → inherited →
     keyword fallback. Deterministic, no LLM cooperation needed for
     follow-up edits on previously-generated images.

  2. System prompt: explicit three-step style resolution for
     edit_image. Generated by you → omit style and auto-inherit.
     Uploaded by user → INSPECT visually and pick a matching style
     (the LLM is the only thing with vision; the tool can't see
     pixels). Then keep that style for subsequent edits.

Both paths matter — the tool fix handles the common case
deterministically, the prompt fix handles the upload case where
there's nothing to inherit from.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:31:57 -05:00
f26dfbee02 smart_image_gen v0.7.2: chat-DB fallback + diagnostic 'no image' msg
If __messages__ doesn't include the assistant's prior file attachments
(which is what the screenshot shows), the new fallback queries
the chat by id via Chats.get_chat_by_id and walks every persisted
message for files. Open WebUI's socket handler always upserts files
onto the assistant message via {'files': files} so this path is
authoritative.

The 'No image found' return now includes diagnostic counts —
__files__, __messages__, messages_with_files, chat_id_present,
openwebui_runtime — so subsequent failures actually show what the
tool saw instead of being opaque.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:07:55 -05:00
06433d3815 smart_image_gen v0.7.1: rename edit_image arg + parse file id from URL
Two bugs in one screenshot:

1. LLM called edit_image(prompt=..., ...) but the signature was
   edit_image(edit_instruction=..., ...) — mismatch, missing-arg
   crash. Renamed the first param to `prompt` so both tools have a
   matching, predictable name. System prompt updated with an explicit
   'do not invent edit_instruction' line for stubborn models.

2. After fix #1, edit_image still couldn't find the prior generated
   image because Open WebUI assistant-message file attachments only
   carry {type, url} (no id, no path). _read_file_dict now also
   greps the file id out of /api/v1/files/<uuid>/content URLs and
   feeds it to Files.get_file_by_id. Verified pattern matches
   absolute URLs (https://llm-1.srvno.de/api/v1/files/.../content).

System prompt also now says 'including images you previously
generated in this chat' to nudge the LLM to pick up assistant
outputs as edit candidates.
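
The id-from-URL grep could look like this (the pattern is an assumption matching the commit's description):

```python
import re

# Pull the file id out of /api/v1/files/<uuid>/content URLs,
# relative or absolute.
_FILE_ID_RE = re.compile(r"/api/v1/files/([0-9a-f-]{36})/content")

def file_id_from_url(url: str):
    m = _FILE_ID_RE.search(url)
    return m.group(1) if m else None

print(file_id_from_url(
    "https://llm-1.srvno.de/api/v1/files/123e4567-e89b-12d3-a456-426614174000/content"
))
```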

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:58:40 -05:00
f6f5690fcd smart_image_gen v0.7: edit_image finds previously-emitted images
Bug: after generate_image surfaced an image via the files event, the
next edit_image call returned 'No image found in the chat'. The image
was attached to the assistant's message, but _extract_attached_image
only scanned the user's __files__ param and image_url content blocks
on user messages — it never looked at messages.files for any role.

Fix: rewrite extraction to scan messages[].files in reverse for ALL
roles, so an assistant-emitted image from a prior tool call is found
the same way as a user-attached upload. Use Open WebUI's internal
Files.get_file_by_id when the file dict has an id, so we get raw
bytes from disk without going through the auth-protected
/api/v1/files/{id}/content endpoint. Old path-key and URL-fetch
paths kept as fallbacks.

Refactored shared helpers _file_dict_is_image and _read_file_dict
out of the loop to keep the search logic readable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:46:10 -05:00
d935e24624 Add text-targeted inpainting via GroundingDINO+SAM (mask_text param)
Five pieces:

1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO
   + SAM-HQ in one bundle) into custom_nodes and pip-installs its
   requirements at build time. Model weights auto-download to the
   comfyui-models volume on first inpaint (~3 GB one-time cost).

2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs
   requirements.txt for any custom_node present at startup. Lets users
   add custom nodes via ComfyUI-Manager (or by git-cloning into the
   volume) and have the deps picked up on the next restart, without
   editing the Dockerfile.

3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When
   set, builds an inpainting workflow (LoadImage →
   GroundingDinoSAMSegment → SetLatentNoiseMask → KSampler) so only
   the named region is repainted. When unset, falls through to the
   existing img2img
   path. Denoise default switches: 1.0 with mask_text (full repaint
   within mask), 0.7 without.

4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL
   distinction — set mask_text whenever the user names a specific
   object/region ('the ball', 'the dog', 'the sky'); leave it unset
   only for whole-image style/lighting transformations.

5. Deployment README documents the new mode + the first-inpaint
   weight-download caveat.

Image rebuild required — bump tag to pick up the Dockerfile change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:43:52 -05:00
b604e3f509 smart_image_gen v0.5: surface images via files event (canonical path)
The data-URI message-event approach didn't render — Open WebUI's chat
frontend ignores data URIs from tool-emitted message events because
the markdown-base64 rewriter (utils/files.py
convert_markdown_base64_images) only runs on assistant streaming
content, not on tool emits.

Switched to the path Open WebUI's own image-generation flow uses
(backend/open_webui/utils/middleware.py ~1325):

  1. Upload image bytes via open_webui.routers.files.upload_file_handler
     (gets back a file_item with id)
  2. Resolve the served URL via request.app.url_path_for(
     "get_file_content_by_id", id=file_item.id) → /api/v1/files/{id}/content
  3. Emit a `files` event:
        {"type": "files", "data": {"files": [{"type": "image", "url": ...}]}}

Tools now take __request__, __user__, __metadata__ params for the
upload (Open WebUI auto-injects these). Falls back to data-URI
message event if the runtime imports aren't available (e.g. running
the file standalone for tests). The internal upload bypasses
get_verified_user via the user= kwarg, so no token plumbing.
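
Steps 1–3 condensed (the upload and URL resolution are OWUI internals, shown as comments; only the event shape is spelled out):

```python
async def emit_files_event(event_emitter, url: str):
    # 1. file_item = upload_file_handler(request, file=..., user=user)   (OWUI internal)
    # 2. url = request.app.url_path_for("get_file_content_by_id", id=file_item.id)
    #    -> "/api/v1/files/{id}/content"
    # 3. canonical `files` event, same shape OWUI's own image flow emits:
    await event_emitter({
        "type": "files",
        "data": {"files": [{"type": "image", "url": url}]},
    })
```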

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:21:48 -05:00
4d996e1205 smart_image_gen v0.4: emit image to chat, return only confirmation
The data URI returned from the tool was being given to the LLM as the
tool result — the LLM then either echoed the base64 to the user as
plain text (screenshot 1) or hallucinated a description of what it
thought the image looked like (screenshot 2 — "an image of a cat
sitting on a windowsill" for a fox-warrior prompt).

Fix: push the markdown image into the chat directly via
__event_emitter__ as a "message" event, and return a short text
confirmation as the function value. The confirmation is worded to
prevent the LLM from describing the image or repeating the markdown
(both common failure modes for tool-using LLMs).

Both generate_image and edit_image fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:14:59 -05:00
d4e2058859 smart_image_gen v0.3: add edit_image (img2img) method
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:

  generate_image — txt2img (existing, unchanged behavior)
  edit_image     — img2img on the most recently attached image

edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.

Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.

Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:59:13 -05:00
9e22de0328 smart_image_gen: tighten docstring + Literal style enum
Two changes to make the LLM more likely to call the tool:

1. Lead the docstring with an unambiguous directive — "Create an image
   and show it to the user. Use this whenever the user asks you to
   draw, generate, ..." plus a hard "do not say you cannot generate
   images" line. Open WebUI feeds the docstring straight to the LLM as
   the tool description; first line carries the most weight.

2. `style: Optional[StyleName]` where StyleName is a Literal enum of
   the seven values. Native function-calling models read the type
   annotation and present the seven valid values to the LLM as a
   strict choice instead of a free-text param.

If the LLM still doesn't fire the tool, the install is probably wrong:
Workspace → Models → the model → Advanced Params → Function Calling
must be set to Native (not Default).
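
A sketch of the signature change (docstring abbreviated; the seven value spellings follow the routing names used elsewhere in this log):

```python
from typing import Literal, Optional, get_args

StyleName = Literal[
    "photo", "juggernaut", "pony", "general",
    "furry-nai", "furry-noob", "furry-il",
]

def generate_image(prompt: str, style: Optional[StyleName] = None) -> str:
    """Create an image and show it to the user. Use this whenever the user
    asks you to draw, generate, or otherwise produce an image. Do not say
    you cannot generate images."""
    raise NotImplementedError  # body elided

print(get_args(StyleName))  # the seven values a native-FC model sees as a strict choice
```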

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:52:26 -05:00
45d5541be0 smart_image_gen v0.2: per-style sampler/CFG/steps/CLIP-skip + prompt prefixes
Researched each of the seven SDXL checkpoints on Civitai and encoded the
creator-recommended generation defaults per style instead of one global
set. Material differences:

  - photo (CyberRealistic): dpmpp_2m_sde / karras / CFG 4 / 28 steps / CLIP 1
  - juggernaut: dpmpp_2m_sde / karras / CFG 4.5 / 35 steps / CLIP 1
  - pony: euler_a / normal / CFG 7.5 / 25 steps / CLIP 2
  - general (Talmendo): dpmpp_2m / karras / CFG 8 / 30 steps / CLIP 2
  - furry-nai (Reed): euler_a / normal / CFG 5 / 30 steps / CLIP 2
  - furry-noob (IndigoVoid): euler_a-only / normal / CFG 4.5 / 20 steps / CLIP 2
  - furry-il (NovaFurry): euler_a / normal / CFG 4 / 30 steps / CLIP 2

Three prompt-prefix dialects auto-prepended (NEVER cross-contaminated):
photoreal models get nothing, Pony gets the full
score_9..score_4_up chain (mandatory), and the NoobAI/Illustrious
furry models get their booru quality + year-tag prefixes
(masterpiece/best quality/absurdres/newest/etc). Workflow now includes
a CLIPSetLastLayer node so per-style CLIP skip works.

Routing default for generic "furry" flipped from Reed (NAI) to NovaFurry
(Illustrious) — current sweet-spot consensus. Removed global
DEFAULT_STEPS/DEFAULT_CFG valves; per-style values are canonical.
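
Encoded as a settings table this might look like (values transcribed from the list above; ComfyUI spells euler_a as "euler_ancestral"):

```python
# (sampler, scheduler, cfg, steps, clip_skip) per style, per the list above.
STYLE_SETTINGS = {
    "photo":      ("dpmpp_2m_sde",    "karras", 4.0, 28, 1),
    "juggernaut": ("dpmpp_2m_sde",    "karras", 4.5, 35, 1),
    "pony":       ("euler_ancestral", "normal", 7.5, 25, 2),
    "general":    ("dpmpp_2m",        "karras", 8.0, 30, 2),
    "furry-nai":  ("euler_ancestral", "normal", 5.0, 30, 2),
    "furry-noob": ("euler_ancestral", "normal", 4.5, 20, 2),
    "furry-il":   ("euler_ancestral", "normal", 4.0, 30, 2),
}
```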

Sources: each model's Civitai page (CyberRealisticXL, Juggernaut,
Pony V6 XL, TalmendoXL, Reed FurryMix, IndigoVoid FurryFused,
NovaFurryXL) and Pony/Illustrious prompting guides.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:45:34 -05:00
cd0034cd99 Flesh out per-style negatives in smart_image_gen Tool
Each style now gets a proper baseline covering quality, anatomy, and
watermark/signature suppression — plus the appropriate style-leak guards
(no-cartoon for photo, no-human for furry, score_4–6 suppression for
pony). Quality terms only; no NSFW filtering by default since several
checkpoints in this set are commonly used for adult work and would
fight a baked-in content filter. If SFW-by-default is wanted, add an
explicit safe-mode flag rather than expanding NEGATIVES.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:39:24 -05:00
392b26167f Add smart_image_gen Tool for per-prompt checkpoint routing
Open WebUI Tool the LLM invokes instead of the built-in image action.
Auto-routes among the seven SDXL checkpoints (photo / juggernaut /
pony / general / furry-{nai,noob,il}) based on either an explicit
`style` arg or first-match-wins regex over the prompt. Constructs the
ComfyUI workflow inline, submits via /prompt, polls /history, returns
the result as a base64 data-URI markdown image so no extra hosting is
needed. Per-style default negatives. ComfyUI URL / steps / CFG /
timeout are admin-tunable Valves.
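
The first-match-wins routing reduces to something like this (patterns and default here are hypothetical; the real table lives in the tool):

```python
import re

# Hypothetical first-match-wins routing table.
_ROUTES = [
    (re.compile(r"\b(photo|photograph|realistic)\b", re.I), "photo"),
    (re.compile(r"\b(anthro|furry|fursona)\b", re.I), "furry-il"),
    (re.compile(r"\b(anime|pony)\b", re.I), "pony"),
]

def route_style(prompt: str, explicit=None, default="general"):
    if explicit:
        return explicit  # explicit style arg always wins
    for pattern, style in _ROUTES:
        if pattern.search(prompt):
            return style  # first match wins
    return default
```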

Filters can't see image-gen requests in Open WebUI (the routers skip
the filter chain), so the LLM-driven Tool is the only path that
gives intent-aware routing without changing the chat UX.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:17:02 -05:00