Open WebUI 0.9.0 made every model-class accessor (Users.get_user_by_id,
Chats.get_chat_by_id, Files.get_file_by_id, …) a coroutine. Both tools
were still calling them synchronously, so the calls returned coroutines
instead of model objects; the first downstream attribute access threw,
the bare `except Exception: return False` swallowed it, and uploads
silently fell through to the data-URI fallback. The data-URI markdown
rendered during streaming but didn't survive post-stream commit, which
looked like "image flashes in, then disappears."
Add await to the six call sites; promote `_read_file_dict` to async
since it now contains an await; restore `_push_image_to_chat` to the
canonical `files` event so the file-attachment chrome (thumbnail +
download) comes back.
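The shape of the fix at one of the six call sites (surrounding body
elided; simplified sketch):

    # before (pre-0.9.0) the accessor was synchronous:
    #     file_item = Files.get_file_by_id(file_id)
    # after 0.9.0 it is a coroutine, so the call site awaits it and the
    # enclosing helper goes async (hence _read_file_dict's promotion):
    async def _read_file_dict(self, file_dict: dict):
        file_item = await Files.get_file_by_id(file_dict["id"])
        ...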
This supersedes commit d034700, which mis-diagnosed the symptom as a
virtualization regression and switched to a `message`-event markdown
workaround. The workaround didn't help (same flash-and-vanish) because
the upload pre-check still failed for the same async-migration reason
and the data-URI fallback path still ran.
smart_image_gen.py 0.7.9 -> 0.7.10
smart_image_pipe.py 0.1.1 -> 0.1.2
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI 0.9.0 introduced chat-history virtualization that
unmounts off-screen assistant messages and reconstructs them from
persisted shape; `files` attached mid-stream by a tool don't
survive the round-trip — the image flashes in during streaming
and disappears the moment the message commits.
Both image tools now upload via Open WebUI's file store as before
but surface the result as a markdown image injected into the
assistant message via a `message` event, which is part of the
persisted shape and renders reliably across remounts. Trade-off:
loses the file-attachment chrome (thumbnail + download button).
Each tool has a TODO marking the swap site with the original
`files` payload inlined for one-line revert once upstream fixes
the regression.
smart_image_gen.py 0.7.8 -> 0.7.9
smart_image_pipe.py 0.1.0 -> 0.1.1
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported a wrong image being returned. Two hardenings:
1. _job_prefix() generates a per-submission filename_prefix
('smartedit_<10hex>', 'smartinpaint_<10hex>', 'smartgen_<10hex>')
so SaveImage outputs from concurrent jobs sit in their own
namespace and ComfyUI's auto-incrementing counter can never
produce filenames that overlap across jobs. With a shared prefix,
if a queued job's history-fetch ever raced past its own SaveImage
record there was a theoretical (if unlikely) path to picking up
another job's _00001_.png. A per-job prefix kills that vector
(sketch below).
2. edit_image now emits the source image's SHA-1 and byte count in a
status event before uploading to ComfyUI. If a future 'wrong
image' report comes in, that hash should match the prior
generation's output — if it doesn't, we know
_extract_attached_image picked up the wrong source rather than
ComfyUI returning the wrong file. Hashlib import is local so the
module's import surface stays clean.
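Minimal sketch of the item-1 prefix helper (the randomness source
here is an assumption; any collision-safe hex works):

    import os

    def _job_prefix(kind: str) -> str:
        # kind is 'smartgen' / 'smartedit' / 'smartinpaint'; 10 hex
        # chars give each submission its own SaveImage namespace
        return f"{kind}_{os.urandom(5).hex()}"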
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vision-capable LLMs misclassify rendered subjects when picking a
style — observed: model called juggernaut for an edit on a furry-il
generation because the rendered character looked 'photoreal-ish' to
its vision encoder. Each visual judgment is independent so styles
flip mid-chat.
Flipped resolution order in edit_image so inheritance from the prior
generate_image / edit_image call DOMINATES the LLM's explicit style
arg. The LLM's choice only wins when there's nothing to inherit
(first edit in a chat, fresh user upload). The workaround for a
legitimate style change is to start a new chat.
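The flipped order, roughly (helper names as in the inheritance
commit further down this log):

    # inheritance dominates; the LLM's explicit arg only wins when
    # there is nothing to inherit, then the keyword fallback catches
    # the rest
    style = (_inherited_style(__messages__)
             or explicit_style
             or _route_style(prompt))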
System prompt updated to match: tells the LLM that style inheritance
is enforced, that passing style on follow-up calls is ignored, and
that user requests for style change require a new chat.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Registers as 'Image Studio (Pipe)' in Open WebUI's chat-model dropdown.
No LLM in the loop — the Pipe parses the user's message with regex,
finds attached images via the same multi-source extractor as the
Tool, routes deterministically to txt2img / img2img / inpaint, calls
ComfyUI, returns the image. Bypasses the abliterated Qwen 3.5 ×
Open WebUI tool-call format interop bug entirely.
Style resolution: explicit FORCE_STYLE valve > inherited from prior
assistant message ('style: X' marker) > keyword regex on user text >
default (furry-il).
Inpaint trigger: regex picks up phrasings like 'change the X',
'remove the Y', 'make the Z bigger/red/etc' and pulls the noun
phrase as mask_text. No match → full img2img.
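Illustrative shape of the trigger (the Pipe's real patterns cover
more phrasings and trim trailing modifiers):

    import re

    _INPAINT_RE = re.compile(
        r"\b(?:change|remove|replace|make)\s+the\s+([\w ]+)",
        re.IGNORECASE)

    def _mask_text(user_text: str):
        m = _INPAINT_RE.search(user_text)
        return m.group(1).strip() if m else None  # None -> full img2img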
Reuses the same per-style settings, prefix dialects, negatives,
GrowMask feathering, file extraction (with chat-DB fallback) and
files-event push as smart_image_gen.py — code is duplicated rather
than shared because Open WebUI loads each plugin file standalone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The raw SAM mask is a hard binary edge — KSampler repaints right up
to it, and SDXL has no surrounding-pixel context inside the mask to
blend with. Result: the inpainted region looks pasted-on with visible
seams (the artifact the user reported on the werewolf-groin edit).
Inserted a stock GrowMask node (id 17) between
GroundingDinoSAMSegment and SetLatentNoiseMask:
- expand=12 grows the mask outward by 12 px so the new content
overlaps a strip of original pixels for blending
- tapered_corners=True softens the edge so the noise transition
isn't a step function
GrowMask is built into stock ComfyUI; no extra custom node install.
KSampler still uses the caller-supplied denoise (default 1.0 in
inpaint mode).
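As it lands in the workflow dict the tool builds (the SAM node id
'16' is assumed here; GroundingDinoSAMSegment's MASK is its second
output):

    workflow["17"] = {
        "class_type": "GrowMask",
        "inputs": {
            "mask": ["16", 1],        # MASK from GroundingDinoSAMSegment
            "expand": 12,             # 12 px blend strip
            "tapered_corners": True,  # no step-function edge
        },
    }
    # SetLatentNoiseMask now consumes ["17", 0] instead of the raw mask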
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_submit_and_fetch was iterating history[prompt_id]['outputs'].values()
and grabbing the first image it saw. The inpaint workflow includes
nodes other than SaveImage that emit IMAGE outputs (the
GroundingDinoSAMSegment node returns an overlay/mask-applied image in
addition to the mask), and the iteration order over those outputs is
effectively arbitrary; sometimes we'd return the overlay (which can
render mostly black) instead of the actual SaveImage result.
Fix: prefer outputs from the SaveImage node id ('9' in every workflow
the tool builds) explicitly. Fall back to scanning all outputs only
if SaveImage didn't appear (workflow drift, manual edit, etc).
User reported seeing the correct inpaint in ComfyUI's native UI but
black in chat — this is the gap.
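Sketch of the selection order (SAVE_NODE_ID per the text above;
names simplified):

    SAVE_NODE_ID = "9"

    def _pick_images(outputs: dict) -> list:
        # prefer the SaveImage node explicitly
        node = outputs.get(SAVE_NODE_ID)
        if node and node.get("images"):
            return node["images"]
        # fall back to scanning only on workflow drift / manual edits
        for node in outputs.values():
            if node.get("images"):
                return node["images"]
        return []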
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes to address the timeout-and-retry loop the user hit on the
first edit_image call:
1. comfyui-init-models.sh now fetches the three weights inpaint
needs into /models/sams and /models/grounding-dino:
- sam_hq_vit_h.pth (~2.5 GB)
- groundingdino_swint_ogc.pth (~700 MB)
- GroundingDINO_SwinT_OGC.cfg.py (~1 KB)
Without preseeding, these auto-download on the first inpaint, which
takes minutes and times out the tool call. The mkdir line now
creates the new subdirs as well.
2. Tool TIMEOUT_SECONDS valve default bumped 240s → 600s as
defense-in-depth — even with weights preseeded, BERT-base
auto-downloads via transformers on first GroundingDINO load
(~30s) and a slow KSampler on a contended GPU can push past
4 minutes occasionally. Steady-state runs still finish in under
a minute; the valve only matters for first-call latency.
After comfyui-model-init re-runs (`docker compose up -d
comfyui-model-init`), first inpaint should be near-instant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported edit_image picking 'juggernaut' (photoreal) for an edit
on a furry image — the LLM didn't carry context, and the tool's
fallback _route_style only sees the edit instruction text, which for
neutral edits ('bigger', 'glowing eyes') has no furry keywords.
Fix in two places:
1. Tool: _inherited_style scans __messages__ in reverse for prior
generate_image / edit_image tool calls and returns the style arg
they used (sketch below). edit_image now resolves: explicit style
→ inherited →
keyword fallback. Deterministic, no LLM cooperation needed for
follow-up edits on previously-generated images.
2. System prompt: explicit three-step style resolution for
edit_image. Generated by you → omit style and auto-inherit.
Uploaded by user → INSPECT visually and pick a matching style
(the LLM is the only thing with vision; the tool can't see
pixels). Then keep that style for subsequent edits.
Both paths matter — the tool fix handles the common case
deterministically, the prompt fix handles the upload case where
there's nothing to inherit from.
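Sketch of the scan (the tool-call message shape is assumed; the real
helper guards more cases):

    import json

    def _inherited_style(messages: list):
        for msg in reversed(messages or []):  # newest first
            for call in msg.get("tool_calls") or []:
                fn = call.get("function", {})
                if fn.get("name") in ("generate_image", "edit_image"):
                    args = json.loads(fn.get("arguments") or "{}")
                    if args.get("style"):
                        return args["style"]
        return None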
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If __messages__ doesn't include the assistant's prior file attachments
(which is what the screenshot shows), the new fallback queries
the chat by id via Chats.get_chat_by_id and walks every persisted
message for files. Open WebUI's socket handler always upserts files
onto the assistant message via {'files': files} so this path is
authoritative.
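Roughly (the accessor is sync as of this commit; the 0.9.0 migration
at the top of this log adds the await; the persisted message shape
and import path are assumptions):

    from open_webui.models.chats import Chats  # path varies by version

    def _files_from_chat_db(chat_id: str) -> list:
        chat = Chats.get_chat_by_id(chat_id)
        if not chat:
            return []
        messages = (chat.chat or {}).get("messages", [])
        # newest first: the latest attachment is the edit candidate
        return [f for m in reversed(messages)
                for f in m.get("files") or []]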
The 'No image found' return now includes diagnostic counts —
__files__, __messages__, messages_with_files, chat_id_present,
openwebui_runtime — so subsequent failures actually show what the
tool saw instead of being opaque.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs in one screenshot:
1. LLM called edit_image(prompt=..., ...) but the signature was
edit_image(edit_instruction=..., ...) — mismatch, missing-arg
crash. Renamed the first param to `prompt` so both tools have a
matching, predictable name. System prompt updated with an explicit
'do not invent edit_instruction' line for stubborn models.
2. After fix#1, edit_image still couldn't find the prior generated
image because Open WebUI assistant-message file attachments only
carry {type, url} (no id, no path). _read_file_dict now also
greps the file id out of /api/v1/files/<uuid>/content URLs and
feeds it to Files.get_file_by_id. Verified that the pattern matches
absolute URLs (https://llm-1.srvno.de/api/v1/files/.../content).
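The extraction from item 2, roughly:

    import re

    _FILE_ID_RE = re.compile(r"/api/v1/files/([0-9a-fA-F-]{36})/content")

    def _id_from_url(url: str):
        m = _FILE_ID_RE.search(url)  # relative and absolute URLs
        return m.group(1) if m else None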
System prompt also now says 'including images you previously
generated in this chat' to nudge the LLM to pick up assistant
outputs as edit candidates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug: after generate_image surfaced an image via the files event, the
next edit_image call returned 'No image found in the chat'. The image
was attached to the assistant's message, but _extract_attached_image
only scanned the user's __files__ param and image_url content blocks
on user messages — it never looked at messages.files for any role.
Fix: rewrite extraction to scan messages[].files in reverse for ALL
roles, so an assistant-emitted image from a prior tool call is found
the same way as a user-attached upload. Use Open WebUI's internal
Files.get_file_by_id when the file dict has an id, so we get raw
bytes from disk without going through the auth-protected
/api/v1/files/{id}/content endpoint. Old path-key and URL-fetch
paths kept as fallbacks.
Refactored shared helpers _file_dict_is_image and _read_file_dict
out of the loop to keep the search logic readable.
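The scan, stripped down (helpers per the refactor above; dict shapes
assumed):

    def _extract_attached_image(messages, files):
        # 1) messages[].files newest-first, ANY role -- an assistant-
        #    emitted image from a prior tool call is found the same
        #    way as a user upload
        for msg in reversed(messages or []):
            for fd in msg.get("files") or []:
                if _file_dict_is_image(fd):
                    return _read_file_dict(fd)
        # 2) fall back to the caller-supplied __files__ param
        for fd in files or []:
            if _file_dict_is_image(fd):
                return _read_file_dict(fd)
        return None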
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five pieces:
1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO
+ SAM-HQ in one bundle) into custom_nodes and pip-installs its
requirements at build time. Model weights auto-download to the
comfyui-models volume on first inpaint (~3 GB one-time cost).
2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs
requirements.txt for any custom_node present at startup. Lets users
add custom nodes via ComfyUI-Manager (or by git-cloning into the
volume) and have the deps picked up on the next restart, without
editing the Dockerfile.
3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When
set, builds an inpainting workflow (LoadImage → GroundingDinoSAM
Segment → SetLatentNoiseMask → KSampler) so only the named region
is repainted. When unset, falls through to the existing img2img
path. Denoise default switches: 1.0 with mask_text (full repaint
within mask), 0.7 without (sketch below).
4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL
distinction — set mask_text whenever the user names a specific
object/region ('the ball', 'the dog', 'the sky'); leave it unset
only for whole-image style/lighting transformations.
5. Deployment README documents the new mode + the first-inpaint
weight-download caveat.
Image rebuild required — bump tag to pick up the Dockerfile change.
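The piece-3 branch, roughly (builder names are hypothetical;
signature trimmed to the params this change touches):

    from typing import Optional

    def edit_image(mask_text: str = "", denoise: Optional[float] = None,
                   **kw):
        if mask_text:
            # LoadImage -> GroundingDinoSAMSegment ->
            # SetLatentNoiseMask -> KSampler
            denoise = 1.0 if denoise is None else denoise  # full repaint
            wf = _build_inpaint_workflow(mask_text, denoise, **kw)
        else:
            denoise = 0.7 if denoise is None else denoise
            wf = _build_img2img_workflow(denoise, **kw)
        return _submit_and_fetch(wf)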
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The data-URI message-event approach didn't render — Open WebUI's chat
frontend ignores data URIs from tool-emitted message events because
the markdown-base64 rewriter (utils/files.py,
convert_markdown_base64_images) only runs on assistant streaming
content, not on tool emits.
Switched to the path Open WebUI's own image-generation flow uses
(backend/open_webui/utils/middleware.py ~1325):
1. Upload image bytes via open_webui.routers.files.upload_file_handler
(gets back a file_item with id)
2. Resolve the served URL via request.app.url_path_for(
"get_file_content_by_id", id=file_item.id) → /api/v1/files/{id}/content
3. Emit a `files` event:
{"type": "files", "data": {"files": [{"type": "image", "url": ...}]}}
Tools now take __request__, __user__, __metadata__ params for the
upload (Open WebUI auto-injects these). Falls back to data-URI
message event if the runtime imports aren't available (e.g. running
the file standalone for tests). The internal upload bypasses
get_verified_user via the user= kwarg, so no token plumbing.
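Condensed from the three steps above (the upload_file_handler
signature is an assumption; treat this as a sketch, not the exact
call):

    import io
    from fastapi import UploadFile
    from open_webui.routers.files import upload_file_handler

    async def _push_image_to_chat(request, user, image_bytes,
                                  event_emitter):
        # 1. internal upload; user= kwarg bypasses get_verified_user
        upload = UploadFile(file=io.BytesIO(image_bytes),
                            filename="generated.png")
        file_item = upload_file_handler(request, file=upload, user=user)
        # 2. served URL resolved from the route name
        url = request.app.url_path_for("get_file_content_by_id",
                                       id=file_item.id)
        # 3. the files event
        await event_emitter({"type": "files", "data": {"files": [
            {"type": "image", "url": url}]}})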
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The data URI returned from the tool was being given to the LLM as the
tool result — the LLM then either echoed the base64 to the user as
plain text (screenshot 1) or hallucinated a description of what it
thought the image looked like (screenshot 2 — "an image of a cat
sitting on a windowsill" for a fox-warrior prompt).
Fix: push the markdown image into the chat directly via
__event_emitter__ as a "message" event, and return a short text
confirmation as the function value. The confirmation is worded to
prevent the LLM from describing the image or repeating the markdown
(both common failure modes for tool-using LLMs).
Both generate_image and edit_image fixed.
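The emit-then-confirm shape (confirmation wording illustrative):

    async def generate_image(self, prompt: str,
                             __event_emitter__=None) -> str:
        ...  # generation elided; yields data_uri
        await __event_emitter__({
            "type": "message",  # injected into the assistant message
            "data": {"content": f"![generated image]({data_uri})"},
        })
        # short, directive confirmation -- the only thing the LLM sees
        return ("The image was generated and is already shown to the "
                "user. Do not describe it and do not repeat markdown.")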
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:
generate_image — txt2img (existing, unchanged behavior)
edit_image — img2img on the most recently attached image
edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.
Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.
Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes to make the LLM more likely to call the tool:
1. Lead the docstring with an unambiguous directive — "Create an image
and show it to the user. Use this whenever the user asks you to
draw, generate, ..." plus a hard "do not say you cannot generate
images" line. Open WebUI feeds the docstring straight to the LLM as
the tool description; first line carries the most weight.
2. `style: Optional[StyleName]` where StyleName is a Literal enum of
the seven values. Native function-calling models read the type
annotation and present the seven valid values to the LLM as a
strict choice instead of a free-text param.
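The shape of item 2 (seven values per the style list elsewhere in
this series; method body elided):

    from typing import Literal, Optional

    StyleName = Literal["photo", "juggernaut", "pony", "general",
                        "furry-nai", "furry-noob", "furry-il"]

    # native function-calling models surface this as a strict enum
    def generate_image(self, prompt: str,
                       style: Optional[StyleName] = None) -> str:
        ...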
If the LLM still doesn't fire the tool, the install is probably wrong:
Workspace → Models → the model → Advanced Params → Function Calling
must be set to Native (not Default).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each style now gets a proper baseline covering quality, anatomy, and
watermark/signature suppression — plus the appropriate style-leak guards
(no-cartoon for photo, no-human for furry, score_4–6 suppression for
pony). Quality terms only; no NSFW filtering by default since several
checkpoints in this set are commonly used for adult work and would
fight a baked-in content filter. If SFW-by-default is wanted, add an
explicit safe-mode flag rather than expanding NEGATIVES.
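Illustrative entries only (the committed strings are longer):

    NEGATIVES = {
        # baseline + per-family style-leak guards
        "photo":    "lowres, bad anatomy, watermark, signature, cartoon",
        "furry-il": "lowres, bad anatomy, watermark, signature, human",
        "pony":     "score_4, score_5, score_6, watermark, signature",
    }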
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI Tool the LLM invokes instead of the built-in image action.
Auto-routes among the seven SDXL checkpoints (photo / juggernaut /
pony / general / furry-{nai,noob,il}) based on either an explicit
`style` arg or first-match-wins regex over the prompt. Constructs the
ComfyUI workflow inline, submits via /prompt, polls /history, returns
the result as a base64 data-URI markdown image so no extra hosting is
needed. Per-style default negatives. ComfyUI URL / steps / CFG /
timeout are admin-tunable Valves.
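The submit-and-poll loop in miniature (endpoints per the text; the
naive first-output grab here is what the SaveImage-preferred fix
further up this log later replaces; the caller base64-encodes the
returned bytes into the data-URI markdown):

    import time
    import requests

    def _submit_and_fetch(base: str, workflow: dict,
                          timeout_s: int) -> bytes:
        pid = requests.post(f"{base}/prompt",
                            json={"prompt": workflow}).json()["prompt_id"]
        deadline = time.time() + timeout_s
        while time.time() < deadline:
            hist = requests.get(f"{base}/history/{pid}").json()
            if pid in hist:
                img = next(iter(
                    hist[pid]["outputs"].values()))["images"][0]
                return requests.get(f"{base}/view", params={
                    "filename": img["filename"],
                    "subfolder": img.get("subfolder", ""),
                    "type": img.get("type", "output")}).content
            time.sleep(1)
        raise TimeoutError("ComfyUI job not finished before timeout")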
Filters can't see image-gen requests in Open WebUI (the routers skip
the filter chain), so the LLM-driven Tool is the only path that
gives intent-aware routing without changing the chat UX.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>