44 Commits

c07e962cae Image tools: migrate to OWUI 0.9.0 async model accessors
Open WebUI 0.9.0 made every model-class accessor (Users.get_user_by_id,
Chats.get_chat_by_id, Files.get_file_by_id, …) a coroutine. Both tools
were still calling them synchronously, so the calls returned coroutines
instead of model objects; the first downstream attribute access threw,
the bare `except Exception: return False` swallowed it, and uploads
silently fell through to the data-URI fallback. The data-URI markdown
rendered during streaming but didn't survive post-stream commit, which
looked like "image flashes in, then disappears."

Add await to the six call sites; promote `_read_file_dict` to async
since it now contains an await; restore `_push_image_to_chat` to the
canonical `files` event so the file-attachment chrome (thumbnail +
download) comes back.
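
The shape of the fix, sketched on one of the six call sites (helper
body simplified; the real one returns the file's bytes):

    from open_webui.models.files import Files  # runtime-only import inside the tool

    # 0.8.x: Files.get_file_by_id returned the model directly; under 0.9.0 the
    # un-awaited call returns a coroutine and the attribute access raises.
    async def _read_file_dict(file_dict: dict):
        file = await Files.get_file_by_id(file_dict["id"])
        return None if file is None else file.path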

This supersedes commit d034700, which mis-diagnosed the symptom as a
virtualization regression and switched to a `message`-event markdown
workaround. The workaround didn't help (same flash-and-vanish) because
the upload pre-check still failed for the same async-migration reason
and the data-URI fallback path still ran.

smart_image_gen.py 0.7.9 -> 0.7.10
smart_image_pipe.py 0.1.1 -> 0.1.2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:16:02 -05:00
d034700af9 Image tools: work around OWUI 0.9.x files-event regression
Open WebUI 0.9.0 introduced chat-history virtualization that
unmounts off-screen assistant messages and reconstructs them from
persisted shape; `files` attached mid-stream by a tool don't
survive the round-trip — the image flashes in during streaming
and disappears the moment the message commits.

Both image tools now upload via Open WebUI's file store as before
but surface the result as a markdown image injected into the
assistant message via a `message` event, which is part of the
persisted shape and renders reliably across remounts. Trade-off:
loses the file-attachment chrome (thumbnail + download button).
Each tool has a TODO marking the swap site with the original
`files` payload inlined for one-line revert once upstream fixes
the regression.
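
The swap site looks roughly like this (event shapes as Open WebUI's
emitter protocol defines them; the URL variable is illustrative):

    # workaround: markdown image via a `message` event (part of persisted shape)
    await __event_emitter__({
        "type": "message",
        "data": {"content": f"\n![generated image]({file_url})\n"},
    })
    # TODO: original `files` payload, inlined for a one-line revert:
    # await __event_emitter__({
    #     "type": "files",
    #     "data": {"files": [{"type": "image", "url": file_url}]},
    # })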

smart_image_gen.py 0.7.8 -> 0.7.9
smart_image_pipe.py 0.1.0 -> 0.1.1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:05:12 -05:00
28f370a80b Update deployments/ai-stack/openwebui-tools/smart_image_gen.py 2026-04-20 17:08:32 +00:00
02a4bece5d Ollama: keep loaded models resident until evicted (KEEP_ALIVE=-1)
Was 30m, which evicts after 30 minutes of inactivity and forces a
reload penalty on the next request. Setting -1 holds models in VRAM
indefinitely; MAX_LOADED_MODELS=3 caps how many can stay resident
simultaneously (vs the previous 2). Raise MAX_LOADED_MODELS if you're
rotating between more than three models AND your GPU has the VRAM
for it — a comment in the compose explains the trade-off.

For the live srvno.de stack: OLLAMA_KEEP_ALIVE=-1 takes effect on
the next `docker compose up -d ollama`. The restart itself unloads
everything; currently loaded models just reload on their next request.
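
A per-request override exists too, if you just want to test the
behaviour: the REST parameter mirrors the env var (endpoint and model
name illustrative):

    import requests

    # hold this model in VRAM indefinitely, regardless of OLLAMA_KEEP_ALIVE
    requests.post("http://localhost:11434/api/generate", json={
        "model": "mistral-nemo:12b",
        "prompt": "warm-up",
        "keep_alive": -1,
    })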

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 19:25:42 -05:00
a1af88a632 smart_image_gen v0.7.8: per-job filename_prefix + source-image diagnostic
User reported a wrong image being returned. Two hardenings:

1. _job_prefix() generates a per-submission filename_prefix
   ('smartedit_<10hex>', 'smartinpaint_<10hex>', 'smartgen_<10hex>')
   so SaveImage outputs from concurrent jobs sit in their own
   namespace and ComfyUI's auto-incrementing counter can never
   produce filenames that overlap across jobs. With a shared prefix,
   if a queued job's history-fetch ever raced past its own SaveImage
   record there was a theoretical (if unlikely) path to picking up
   another job's _00001_.png. Per-job prefix kills that vector.

2. edit_image now emits the source image's SHA-1 and byte count in a
   status event before uploading to ComfyUI. If a future 'wrong
   image' report comes in, that hash should match the prior
   generation's output — if it doesn't, we know
   _extract_attached_image picked up the wrong source rather than
   ComfyUI returning the wrong file. Hashlib import is local so the
   module's import surface stays clean.
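
Both hardenings sketched (helper names from this commit, bodies
simplified):

    import hashlib
    import secrets

    def _job_prefix(kind: str) -> str:
        # e.g. 'smartgen_3f9c02ab1e': 10 hex chars, unique per submission
        return f"smart{kind}_{secrets.token_hex(5)}"

    def _source_fingerprint(data: bytes) -> str:
        # emitted in a status event before the ComfyUI upload
        return f"sha1={hashlib.sha1(data).hexdigest()} bytes={len(data)}"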

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:22:45 -05:00
d8c8421361 Image Studio: lock in working config — Native + enable_thinking=false
User confirmed end-to-end working stack:
  - base_model_id: huihui_ai/qwen3-vl-abliterated:8b
  - function_calling: native (gives 'View Result from edit_image'
    blocks, structured tool call traces)
  - custom_params:
      tool_choice: required        (forces tool call every turn)
      enable_thinking: false       (server-side disable; abliterated
                                    Qwen ignores the /no_think system
                                    prompt directive — when thinking
                                    is on the tool call leaks inside
                                    a thinking block as text)

Updated image_studio.json + the markdown setup table + the
'Qwen 3.x quirk' explainer to match. The /no_think line in the
system prompt stays in for non-abliterated Qwen variants but is now
documented as best-effort backup; enable_thinking=false is the
authoritative kill-switch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:12:24 -05:00
f5a222fe6f Image Studio: default base model → huihui_ai/qwen3-vl-abliterated:8b
User confirmed this model works end-to-end after the multi-base-model
search. Settled on it because Qwen 3 VL's fine-tune lineage isn't
damaged by abliteration the way Qwen 3.5's is, so it both calls tools
reliably AND won't refuse to dispatch on NSFW edit prompts.

Updated:
  - image_studio.json base_model_id → huihui_ai/qwen3-vl-abliterated:8b
  - init-models.sh: pulls the abliterated VL model in place of the
    non-working standard qwen3.5:9b
  - image_studio.md: setup table base-model row + vision-section
    'why this and not the alternatives' explanation

function_calling stays default and tool_choice required. Operator
can flip to native + drop tool_choice once they've verified the new
base behaves with structured tool calls (which would also remove the
need for a separate Task Model for title generation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:56:16 -05:00
ec6108888a smart_image_gen v0.7.7: enforce style inheritance for edit_image
Vision-capable LLMs misclassify rendered subjects when picking a
style — observed: model called juggernaut for an edit on a furry-il
generation because the rendered character looked 'photoreal-ish' to
its vision encoder. Each visual judgment is independent so styles
flip mid-chat.

Flipped resolution order in edit_image so inheritance from the prior
generate_image / edit_image call DOMINATES the LLM's explicit style
arg. The LLM's choice only wins when there's nothing to inherit
(first edit in a chat, fresh user upload). The workaround for a
legitimate style change is to start a new chat.
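
The new resolution order, roughly (helper names as used elsewhere in
the tool):

    def _resolve_edit_style(explicit_arg, messages, instruction_text):
        inherited = _inherited_style(messages)  # prior generate/edit call in this chat
        if inherited:
            return inherited                    # inheritance dominates the LLM's arg
        if explicit_arg:
            return explicit_arg                 # first edit / fresh upload only
        return _route_style(instruction_text)   # keyword fallback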

System prompt updated to match: tells the LLM that style inheritance
is enforced, that passing style on follow-up calls is ignored, and
that user requests for style change require a new chat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:53:32 -05:00
20d4bd5b72 Image Studio: switch base model to qwen3.5:9b (non-abliterated)
The abliterated 9B was the source of the tool-call format mangling
(both Native XML leaks and Default Python-syntax leaks). Standard
qwen3.5:9b is the same family, the same 9B size (6.6 GB),
vision-capable, and native tool calling actually works.

The image content uncensored-ness was always going to come from the
SDXL checkpoints in ComfyUI — the LLM is just a dispatcher. Picking
a well-behaved tool-caller for that role doesn't compromise output
content.

Updated:
  - image_studio.json base_model_id → qwen3.5:9b
  - init-models.sh: pulls qwen3.5:9b as a standard registry pull,
    in addition to the existing abliterated 9B (which stays for
    other chat models)
  - image_studio.md setup table + vision section explaining why
    we chose standard over abliterated for the dispatcher role

function_calling stays as 'default' and tool_choice as 'required'
for now — they don't hurt with a reliable tool-caller and operators
can flip back to native + drop tool_choice once they verify it
works for them (which also removes the need for a separate Task
Model for title generation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:09:42 -05:00
1ae451ad5f Add smart_image_pipe.py — deterministic Pipe for image gen / edit / inpaint
Registers as 'Image Studio (Pipe)' in Open WebUI's chat-model dropdown.
No LLM in the loop — the Pipe parses the user's message with regex,
finds attached images via the same multi-source extractor as the
Tool, routes deterministically to txt2img / img2img / inpaint, calls
ComfyUI, returns the image. Bypasses the abliterated Qwen 3.5 ×
Open WebUI tool-call format interop bug entirely.

Style resolution: explicit FORCE_STYLE valve > inherited from prior
assistant message ('style: X' marker) > keyword regex on user text >
default (furry-il).

Inpaint trigger: regex picks up phrasings like 'change the X',
'remove the Y', 'make the Z bigger/red/etc' and pulls the noun
phrase as mask_text. No match → full img2img.
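
The trigger might look like this (illustrative pattern, not the
shipped one):

    import re

    # 'change the collar', 'remove the hat', 'make the eyes bigger' → mask_text
    _INPAINT_RE = re.compile(
        r"\b(?:change|remove|replace|make)\s+(?:the\s+)?"
        r"([a-z][a-z ]{1,40}?)(?:\s+(?:bigger|smaller|red|glow\w*))?\s*$",
        re.IGNORECASE,
    )

    def _mask_text(user_text: str):
        m = _INPAINT_RE.search(user_text.strip())
        return m.group(1).strip() if m else None  # None → full img2img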

Reuses the same per-style settings, prefix dialects, negatives,
GrowMask feathering, file extraction (with chat-DB fallback) and
files-event push as smart_image_gen.py — code is duplicated rather
than shared because Open WebUI loads each plugin file standalone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:58:01 -05:00
011fade024 Image Studio prompt: forbid post-tool echo of the function call
User saw the LLM's chat response include a literal
'edit_image(prompt="...", mask_text="...", style="furry-il",
denoise=0.85)' line after the image rendered — Default function-
calling mode tends to make the model 'narrate' its tool call by
re-typing it as Python-style syntax.

Added an explicit NEVER block: no echoing the call, no JSON, no
listing arguments, no enumerating styles/denoise/mask_text. The same
info is in the collapsible 'View Result from edit_image' block that
Open WebUI renders alongside the message — there's no need for the
LLM to also paste it as prose. Follow-up text is for human
conversation, not bookkeeping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:41:51 -05:00
63917709c1 smart_image_gen v0.7.6: feather inpaint mask edges via GrowMask
The raw SAM mask is a hard binary edge — KSampler repaints right up
to it, and SDXL has no surrounding-pixel context inside the mask to
blend with. Result: the inpainted region looks pasted-on with visible
seams (the artifact the user reported on the werewolf-groin edit).

Inserted a stock GrowMask node (id 17) between
GroundingDinoSAMSegment and SetLatentNoiseMask:
  - expand=12 grows the mask outward by 12 px so the new content
    overlaps a strip of original pixels for blending
  - tapered_corners=True softens the edge so the noise transition
    isn't a step function

GrowMask is built into stock ComfyUI; no extra custom node install.
KSampler still uses the caller-supplied denoise (default 1.0 in
inpaint mode).
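
In the API-format workflow dict the insertion is one entry (the SAM
node id and output index are assumptions here):

    workflow["17"] = {
        "class_type": "GrowMask",
        "inputs": {
            "mask": ["15", 1],        # GroundingDinoSAMSegment's MASK output
            "expand": 12,             # overlap strip of original pixels
            "tapered_corners": True,  # soften the noise transition
        },
    }
    # SetLatentNoiseMask now reads its mask from ["17", 0] instead of the SAM node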

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:40:39 -05:00
1ed2e7293e Image Studio: ship function_calling=default — Native leaks Qwen 3.5 XML
Qwen 3.5 abliterated emits its native tool-call format
(<function=...><parameter=...>) wrapped in <tool_call> tags that the
current Open WebUI / Ollama parser does not reliably round-trip — the
XML leaks to chat as plain text instead of executing. Switching the
preset to Function Calling: Default, which uses Open WebUI's own
prompt-injection wrapper, fires the tool reliably.

Native is documented as the right choice only when the operator has
swapped the base model to one with proven OWUI-side parser support
(mistral-nemo:12b, qwen2.5vl:7b). For the shipped Qwen 3.5
abliterated default, Default is the working setting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:36:05 -05:00
18a205d69d smart_image_gen v0.7.5: fix black-image bug — fetch from SaveImage explicitly
_submit_and_fetch was iterating history[prompt_id]['outputs'].values()
and grabbing the first image it saw. The inpaint workflow includes
nodes other than SaveImage that emit IMAGE outputs (the
GroundingDinoSAMSegment node returns an overlay/mask-applied image
in addition to the mask), and dict iteration order isn't guaranteed —
sometimes we'd return the overlay (which can render mostly black)
instead of the actual SaveImage result.

Fix: prefer outputs from the SaveImage node id ('9' in every workflow
the tool builds) explicitly. Fall back to scanning all outputs only
if SaveImage didn't appear (workflow drift, manual edit, etc).
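
The fetch logic, roughly:

    outputs = history[prompt_id]["outputs"]
    node = outputs.get("9")  # SaveImage in every workflow the tool builds
    if not (node and node.get("images")):
        # workflow drift / manual edit: fall back to the old full scan
        node = next((o for o in outputs.values() if o.get("images")), None)
    images = node["images"] if node else []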

User reported seeing the correct inpaint in ComfyUI's native UI but
black in chat — this is the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:30:04 -05:00
0fa8040251 Eliminate first-inpaint timeout: preseed SAM/GroundingDINO + 600s default
Two changes to address the timeout-and-retry loop the user hit on the
first edit_image call:

  1. comfyui-init-models.sh now fetches the three weights inpaint
     needs into /models/sams and /models/grounding-dino:
       - sam_hq_vit_h.pth                (~2.5 GB)
       - groundingdino_swint_ogc.pth     (~700 MB)
       - GroundingDINO_SwinT_OGC.cfg.py  (~1 KB)
     Without preseeding, these auto-download on the first inpaint,
     which takes minutes and times out the tool call. The mkdir line
     now creates the new subdirs too.

  2. Tool TIMEOUT_SECONDS valve default bumped 240s → 600s as
     defense-in-depth — even with weights preseeded, BERT-base
     auto-downloads via transformers on first GroundingDINO load
     (~30s) and a slow KSampler on a contended GPU can push past
     4 minutes occasionally. Steady-state runs still finish in under
     a minute; the valve only matters for first-call latency.

After comfyui-model-init re-runs (`docker compose up -d
comfyui-model-init`), first inpaint should be near-instant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:39:23 -05:00
e77666ea0f Image Studio docs: require setting a separate Task Model after install
tool_choice: required (the thing that makes Image Studio reliably fire
its tools) also blocks Open WebUI's background text-only calls — title
generation, tag suggestions, autocomplete — because the model is
forced to produce a tool call instead of text. Result: chats stay
named 'New Chat' and tag suggestions go silent.

Documented the fix in two places:
  - image_studio.md: dedicated 'Set a separate Task Model (required
    after install)' section explaining the cause and the fix path.
  - deployment README §9: short follow-up note pointing at it so
    operators don't miss it during initial setup.

The fix is purely Open WebUI configuration — no code change. Pick any
non-Image-Studio model already pulled (mistral-nemo:12b is the
obvious default) for the Task Model slot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:37:21 -05:00
6700f6ce33 smart_image_gen v0.7.3: edit_image inherits style from prior tool call
User reported edit_image picking 'juggernaut' (photoreal) for an edit
on a furry image — the LLM didn't carry context, and the tool's
fallback _route_style only sees the edit instruction text, which for
neutral edits ('bigger', 'glowing eyes') has no furry keywords.

Fix in two places:

  1. Tool: _inherited_style scans __messages__ in reverse for prior
     generate_image / edit_image tool calls and returns the style arg
     they used. edit_image now resolves: explicit style → inherited →
     keyword fallback. Deterministic, no LLM cooperation needed for
     follow-up edits on previously-generated images.

  2. System prompt: explicit three-step style resolution for
     edit_image. Generated by you → omit style and auto-inherit.
     Uploaded by user → INSPECT visually and pick a matching style
     (the LLM is the only thing with vision; the tool can't see
     pixels). Then keep that style for subsequent edits.

Both paths matter — the tool fix handles the common case
deterministically, the prompt fix handles the upload case where
there's nothing to inherit from.
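
A sketch of the tool-side half (assumes OpenAI-style tool_calls on
the message dicts; the real scan may read a different field):

    import json

    def _inherited_style(messages):
        # newest-first: the most recent image tool call's style wins
        for msg in reversed(messages or []):
            for call in msg.get("tool_calls") or []:
                fn = call.get("function") or {}
                if fn.get("name") in ("generate_image", "edit_image"):
                    args = json.loads(fn.get("arguments") or "{}")
                    if args.get("style"):
                        return args["style"]
        return None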

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:31:57 -05:00
def27087c1 Make every image tag in compose pinnable via .env
Floating tags (`latest`, `main`) made deploys non-deterministic — a
container recreate could pull a newer Open WebUI, Ollama, or Anubis at
any time. Wrapped every `image:` value in a ${VAR:-default}
substitution, surfaced the full set in .env.example with a header
explaining where to find current versions, and bumped the
COMFYUI_IMAGE_TAG default to 0.2.1 (the just-tagged version with the
transformers pin).

Vars added: CADDY_TAG, OLLAMA_TAG, OPEN_WEBUI_TAG, ALPINE_TAG,
ANUBIS_TAG (COMFYUI_IMAGE_TAG already existed). For the images where
I'm not confident which specific version to pin (Ollama, Open WebUI,
Anubis), the defaults keep the previous floating-tag behaviour; the
operator should update those to verified versions for production
deploys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:27:08 -05:00
f26dfbee02 smart_image_gen v0.7.2: chat-DB fallback + diagnostic 'no image' msg
If __messages__ doesn't include the assistant's prior file attachments
(which is what the screenshot is showing), the new fallback queries
the chat by id via Chats.get_chat_by_id and walks every persisted
message for files. Open WebUI's socket handler always upserts files
onto the assistant message via {'files': files} so this path is
authoritative.
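
The fallback, roughly (the persisted message list lives under
chat.chat; structure simplified here):

    from open_webui.models.chats import Chats

    def _chat_db_fallback(chat_id):
        chat = Chats.get_chat_by_id(chat_id)  # sync here; awaited as of OWUI 0.9.0
        if chat is None:
            return None
        for msg in reversed((chat.chat or {}).get("messages", [])):
            for f in msg.get("files") or []:  # upserted by the socket handler
                if _file_dict_is_image(f):
                    return _read_file_dict(f)
        return None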

The 'No image found' return now includes diagnostic counts —
__files__, __messages__, messages_with_files, chat_id_present,
openwebui_runtime — so subsequent failures actually show what the
tool saw instead of being opaque.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:07:55 -05:00
06433d3815 smart_image_gen v0.7.1: rename edit_image arg + parse file id from URL
Two bugs in one screenshot:

1. LLM called edit_image(prompt=..., ...) but the signature was
   edit_image(edit_instruction=..., ...) — mismatch, missing-arg
   crash. Renamed the first param to `prompt` so both tools have a
   matching, predictable name. System prompt updated with an explicit
   'do not invent edit_instruction' line for stubborn models.

2. After fix #1, edit_image still couldn't find the prior generated
   image because Open WebUI assistant-message file attachments only
   carry {type, url} (no id, no path). _read_file_dict now also
   greps the file id out of /api/v1/files/<uuid>/content URLs and
   feeds it to Files.get_file_by_id. Verified pattern matches
   absolute URLs (https://llm-1.srvno.de/api/v1/files/.../content).

System prompt also now says 'including images you previously
generated in this chat' to nudge the LLM to pick up assistant
outputs as edit candidates.
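
The id extraction is a regex over the URL (pattern illustrative; the
real one may be looser):

    import re

    _FILE_ID_RE = re.compile(r"/api/v1/files/([0-9a-fA-F-]{36})/content")

    def _file_id_from_url(url: str):
        # matches relative and absolute forms, e.g.
        # https://llm-1.srvno.de/api/v1/files/<uuid>/content
        m = _FILE_ID_RE.search(url or "")
        return m.group(1) if m else None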

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:58:40 -05:00
780ce42711 Image Studio: move tool_choice into params.custom_params (correct field)
Previous commit put tool_choice at the top level of params. Open WebUI
drops that silently — apply_model_params_to_body has a whitelist of
mapped param names (temperature, top_p, etc.) and tool_choice isn't
on it. The Custom Parameters UI section also only iterates
params.custom_params, which is why the value didn't appear there
after importing the preset.

Correct location is the custom_params sub-dict, where values go
through json.loads before being merged into the outgoing chat
completion body. 'required' stays a string after the failed
json.loads and ends up exactly where the OpenAI / Ollama tools spec
expects it.
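
The difference in the preset, shown as Python dicts (keys per this
commit):

    # dropped: apply_model_params_to_body only maps whitelisted names
    params_wrong = {"tool_choice": "required"}

    # kept: custom_params values go through json.loads ('required' fails
    # the parse, stays a string) and merge into the completion body
    params_right = {"custom_params": {"tool_choice": "required"}}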

Source: src/lib/components/chat/Settings/Advanced/AdvancedParams.svelte
(UI binding) and backend/open_webui/utils/payload.py (serialization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:50:34 -05:00
f6f5690fcd smart_image_gen v0.7: edit_image finds previously-emitted images
Bug: after generate_image surfaced an image via the files event, the
next edit_image call returned 'No image found in the chat'. The image
was attached to the assistant's message, but _extract_attached_image
only scanned the user's __files__ param and image_url content blocks
on user messages — it never looked at messages.files for any role.

Fix: rewrite extraction to scan messages[].files in reverse for ALL
roles, so an assistant-emitted image from a prior tool call is found
the same way as a user-attached upload. Use Open WebUI's internal
Files.get_file_by_id when the file dict has an id, so we get raw
bytes from disk without going through the auth-protected
/api/v1/files/{id}/content endpoint. Old path-key and URL-fetch
paths kept as fallbacks.

Refactored shared helpers _file_dict_is_image and _read_file_dict
out of the loop to keep the search logic readable.
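
The rewritten search, in outline (helper names from this commit):

    def _extract_attached_image(messages, files):
        # any role, newest first; assistant-emitted files count too
        for msg in reversed(messages or []):
            for f in msg.get("files") or []:
                if _file_dict_is_image(f):
                    return _read_file_dict(f)  # id → Files.get_file_by_id
        # fallback: the user's __files__ param (path / URL paths kept)
        for f in reversed(files or []):
            if _file_dict_is_image(f):
                return _read_file_dict(f)
        return None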

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:46:10 -05:00
d935e24624 Add text-targeted inpainting via GroundingDINO+SAM (mask_text param)
Five pieces:

1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO
   + SAM-HQ in one bundle) into custom_nodes and pip-installs its
   requirements at build time. Model weights auto-download to the
   comfyui-models volume on first inpaint (~3 GB one-time cost).

2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs
   requirements.txt for any custom_node present at startup. Lets users
   add custom nodes via ComfyUI-Manager (or by git-cloning into the
   volume) and have the deps picked up on the next restart, without
   editing the Dockerfile.

3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When
   set, builds an inpainting workflow (LoadImage → GroundingDinoSAM
   Segment → SetLatentNoiseMask → KSampler) so only the named region
   is repainted. When unset, falls through to the existing img2img
   path. Denoise default switches: 1.0 with mask_text (full repaint
   within mask), 0.7 without.

4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL
   distinction — set mask_text whenever the user names a specific
   object/region ('the ball', 'the dog', 'the sky'); leave it unset
   only for whole-image style/lighting transformations.

5. Deployment README documents the new mode + the first-inpaint
   weight-download caveat.

Image rebuild required — bump tag to pick up the Dockerfile change.
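
The branch point in the workflow builder, sketched (node ids
hypothetical; wiring abbreviated):

    if denoise is None:
        denoise = 1.0 if mask_text else 0.7  # full repaint in mask vs plain img2img
    if mask_text:
        # LoadImage → GroundingDinoSAMSegment(prompt=mask_text) → SetLatentNoiseMask
        workflow["16"] = {
            "class_type": "SetLatentNoiseMask",
            "inputs": {
                "samples": ["12", 0],  # latent from VAEEncode of the source image
                "mask": ["15", 1],     # the SAM node's MASK output
            },
        }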

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:43:52 -05:00
7c7897818e Image Studio: bake tool_choice=required into the preset
Without it, abliterated/reasoning models like huihui_ai/qwen3.5-
abliterated:9b reliably choose to write a planning response instead
of calling the tool — even with /no_think and a terse imperative
system prompt. tool_choice=required is passed through to Ollama's
chat API and removes the model's option to respond in text at all,
forcing exactly one tool call per turn.

Confirmed working with the abliterated Qwen 3.5 9B base.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:30:49 -05:00
a1fca4d5d9 Image Studio: default base model → huihui_ai/qwen3.5-abliterated:9b
Re-imports of image_studio.json kept reverting the base model back to
mistral-nemo:12b because that was still hard-coded in the JSON.
Updated the JSON, the markdown setup table, and the vision-capability
section to lead with the Qwen 3.5 abliterated 9B preset.

Re-ordered the markdown's vision section: shipped default first
(Qwen 3.5 abliterated, with the /no_think + enable_thinking caveat
called out explicitly), alternatives (qwen2.5vl:7b, llama3.2-vision,
minicpm-v) second, non-vision fallback third.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:22:13 -05:00
0b1c2ee5b5 Image Studio: tighten system prompt, add /no_think for Qwen 3.x
User reported the model writing a multi-paragraph 'editing plan'
instead of calling edit_image, only firing the tool when explicitly
told to. Two underlying causes:

  1. The previous system prompt was conversational ('ALWAYS / NEVER'
     lists with discussion) — Qwen-style models read that as topics
     to think about rather than rules to obey. Replaced with terse,
     imperative dispatcher framing: 'You do not respond in prose.
     Every user message MUST result in exactly one tool call.'

  2. Qwen 3.x ships with thinking mode on by default. Reasoning
     models almost universally degrade native function calling — they
     plan how to use a tool instead of just calling it. Prepended
     /no_think (Qwen 3.x recognises this token and skips reasoning).
     No-op for non-Qwen-3 base models.

Removed the long after-action paragraph that encouraged elaborate
follow-ups; replaced with 'at most one short sentence'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:18:49 -05:00
5a34ced8f1 Add S3 mirror path for Ollama models + mirror-ollama-model.sh helper
Three pieces:

1. mirror-ollama-model.sh — run on any machine that has the model
   pulled. Parses the manifest at
   ~/.ollama/models/manifests/registry.ollama.ai/<ns>/<name>/<tag>,
   greps every sha256:* digest, tars manifest + referenced blobs into
   one .tgz. Output is portable — extract over any other Ollama
   data dir and the model is immediately visible.

2. init-models.sh gains an s3_pull function that curls a tarball from
   $S3_OLLAMA_BASE and extracts into /root/.ollama/models/. Falls back
   to ollama pull when S3_OLLAMA_BASE is unset, so s3_pull lines are
   safe to commit before the bucket is ready. huihui_ai/qwen3.5-
   abliterated:9b promoted to s3_pull as the example.

3. docker-compose.yml model-init service propagates S3_OLLAMA_BASE
   from .env. Curl auto-installs at script start because ollama/ollama
   doesn't always ship it.

README documents the mirror workflow under "Mirroring models to S3".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:43:26 -05:00
f77f5993fb Image Studio: enable vision capability + document upgrade path
Open WebUI was blocking image attachments to the Image Studio model
because mistral-nemo:12b isn't vision-capable. Two changes:

  - capabilities.vision flipped to true in the preset JSON. The Tool
    only needs the image to make it through __messages__ / __files__
    to call edit_image; the actual visual processing happens in
    ComfyUI's img2img, not in the LLM. Setting the flag unlocks the
    attach-image UI without lying about what mistral-nemo can do.

  - System prompt now tells the LLM explicitly: "you may not be able
    to visually inspect the attached image — that is fine. Trust the
    user's description and call edit_image." Prevents the LLM from
    refusing or hedging when it gets an image it can't see.

Documented the upgrade path in image_studio.md for users who want
real vision (qwen2.5vl:7b, llama3.2-vision:11b, minicpm-v:8b — pick
one, add to init-models.sh, swap base_model_id in the preset). The
vision LLM can then write smarter edit_image calls from the image
content rather than the user's description alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:31:17 -05:00
b604e3f509 smart_image_gen v0.5: surface images via files event (canonical path)
The data-URI message-event approach didn't render — Open WebUI's chat
frontend ignores data URIs from tool-emitted message events because
the markdown-base64 rewriter (utils/files.py
convert_markdown_base64_images) only runs on assistant streaming
content, not on tool emits.

Switched to the path Open WebUI's own image-generation flow uses
(backend/open_webui/utils/middleware.py ~1325):

  1. Upload image bytes via open_webui.routers.files.upload_file_handler
     (gets back a file_item with id)
  2. Resolve the served URL via request.app.url_path_for(
     "get_file_content_by_id", id=file_item.id) → /api/v1/files/{id}/content
  3. Emit a `files` event:
        {"type": "files", "data": {"files": [{"type": "image", "url": ...}]}}

Tools now take __request__, __user__, __metadata__ params for the
upload (Open WebUI auto-injects these). Falls back to data-URI
message event if the runtime imports aren't available (e.g. running
the file standalone for tests). The internal upload bypasses
get_verified_user via the user= kwarg, so no token plumbing.
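
Steps 2–3 in code (step 1's handler signature elided; it takes more
kwargs than a sketch this short can show):

    # resolve the served URL for the uploaded file
    url = __request__.app.url_path_for("get_file_content_by_id", id=file_item.id)
    # same event Open WebUI's own image-generation flow emits
    await __event_emitter__({
        "type": "files",
        "data": {"files": [{"type": "image", "url": str(url)}]},
    })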

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:21:48 -05:00
4d996e1205 smart_image_gen v0.4: emit image to chat, return only confirmation
The data URI returned from the tool was being given to the LLM as the
tool result — the LLM then either echoed the base64 to the user as
plain text (screenshot 1) or hallucinated a description of what it
thought the image looked like (screenshot 2 — "an image of a cat
sitting on a windowsill" for a fox-warrior prompt).

Fix: push the markdown image into the chat directly via
__event_emitter__ as a "message" event, and return a short text
confirmation as the function value. The confirmation is worded to
prevent the LLM from describing the image or repeating the markdown
(both common failure modes for tool-using LLMs).

Both generate_image and edit_image fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:14:59 -05:00
6adf133558 Ship Image Studio as importable JSON in addition to markdown walkthrough
Open WebUI accepts a JSON file at Workspace → Models → Import that
seeds a new model preset in one click instead of the manual table-
driven setup. The new image_studio.json mirrors the Open WebUI bulk-
export schema (array wrapper around the model object with id, name,
base_model_id, params, meta) and pre-fills system prompt, native
function calling, temperature 0.5, top_p 0.9, smart_image_gen tool
attachment, suggestion prompts.
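
The file's shape, abbreviated (values trimmed; the meta key for tool
attachment is an assumption):

    [
        {
            "id": "image-studio",
            "name": "Image Studio",
            "base_model_id": "mistral-nemo:12b",
            "params": {"system": "<system prompt>", "temperature": 0.5, "top_p": 0.9},
            "meta": {"toolIds": ["smart_image_gen"]}
        }
    ]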

The markdown walkthrough stays as the source of truth for the system
prompt content and as the fallback when import fails (e.g. tool ID
mismatch, unfamiliar field, schema drift across Open WebUI versions).
README points at both paths.

Caveat doc'd in the markdown: if the imported preset doesn't actually
have smart_image_gen attached, the tool ID in the JSON didn't match
what Open WebUI assigned — re-attach manually in the model edit
screen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:04:49 -05:00
d4e2058859 smart_image_gen v0.3: add edit_image (img2img) method
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:

  generate_image — txt2img (existing, unchanged behavior)
  edit_image     — img2img on the most recently attached image

edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.

Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.

Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:59:13 -05:00
41d571d8d1 Add Image Studio model preset — forces smart_image_gen tool use
A documented Open WebUI custom-model preset wrapping mistral-nemo:12b
with: aggressive system prompt that mandates calling generate_image,
only the smart_image_gen tool attached, native function calling,
lower temperature for tool-call reliability. Users pick "Image Studio"
from the chat-model dropdown when they want images.

Solves the common case where general-purpose chat models describe an
image in text instead of firing the tool — usually on conversational
phrasings like "can you draw me…". The preset removes the ambiguity
by giving the LLM exactly one job and one tool.

Setup walkthrough in openwebui-models/image_studio.md; deployment
README §9 points users at it as the recommended path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:54:13 -05:00
9e22de0328 smart_image_gen: tighten docstring + Literal style enum
Two changes to make the LLM more likely to call the tool:

1. Lead the docstring with an unambiguous directive — "Create an image
   and show it to the user. Use this whenever the user asks you to
   draw, generate, ..." plus a hard "do not say you cannot generate
   images" line. Open WebUI feeds the docstring straight to the LLM as
   the tool description; first line carries the most weight.

2. `style: Optional[StyleName]` where StyleName is a Literal enum of
   the seven values. Native function-calling models read the type
   annotation and present the seven valid values to the LLM as a
   strict choice instead of a free-text param.

If the LLM still doesn't fire the tool, the install is probably wrong:
Workspace → Models → the model → Advanced Params → Function Calling
must be set to Native (not Default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:52:26 -05:00
b815cd6a5f Tune static workflows to CyberRealisticXL recommended settings
The static workflow JSONs default to CyberRealisticXLPlay (set in an
earlier commit), but the KSampler still had euler/normal/CFG7/20 — the
generic settings I scaffolded with. Updated to the creator-published
defaults: dpmpp_2m_sde / karras / CFG 4 / 28 steps. CLIP skip 1
already correct (no node needed; default behavior).

Added a section to the deployment README spelling out the trade-off:
static workflows are locked to one checkpoint family at a time because
Open WebUI's nodes mapping doesn't expose sampler/CFG/scheduler/CLIP
skip/prefix. For multi-checkpoint use, the smart_image_gen Tool path is
the only one that gets these right per-prompt.

Re-paste workflows into Open WebUI Settings → Images to pick up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:47:46 -05:00
45d5541be0 smart_image_gen v0.2: per-style sampler/CFG/steps/CLIP-skip + prompt prefixes
Researched each of the seven SDXL checkpoints on Civitai and encoded the
creator-recommended generation defaults per style instead of one global
set. Material differences:

  - photo (CyberRealistic): dpmpp_2m_sde / karras / CFG 4 / 28 steps / CLIP 1
  - juggernaut: dpmpp_2m_sde / karras / CFG 4.5 / 35 steps / CLIP 1
  - pony: euler_a / normal / CFG 7.5 / 25 steps / CLIP 2
  - general (Talmendo): dpmpp_2m / karras / CFG 8 / 30 steps / CLIP 2
  - furry-nai (Reed): euler_a / normal / CFG 5 / 30 steps / CLIP 2
  - furry-noob (IndigoVoid): euler_a-only / normal / CFG 4.5 / 20 steps / CLIP 2
  - furry-il (NovaFurry): euler_a / normal / CFG 4 / 30 steps / CLIP 2

Three prompt-prefix dialects auto-prepended (NEVER cross-contaminated):
photoreal models get nothing, Pony gets the full
score_9..score_4_up chain (mandatory), and the NoobAI/Illustrious
furry models get their booru quality + year-tag prefixes
(masterpiece/best quality/absurdres/newest/etc). Workflow now includes
a CLIPSetLastLayer node so per-style CLIP skip works.

Routing default for generic "furry" flipped from Reed (NAI) to NovaFurry
(Illustrious) — current sweet-spot consensus. Removed global
DEFAULT_STEPS/DEFAULT_CFG valves; per-style values are canonical.
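
Encoded roughly as (hypothetical dict name; sampler strings in
ComfyUI's spelling):

    STYLE_SETTINGS = {
        # style        sampler             scheduler  cfg  steps  clip_skip
        "photo":      ("dpmpp_2m_sde",     "karras",  4.0, 28, 1),
        "juggernaut": ("dpmpp_2m_sde",     "karras",  4.5, 35, 1),
        "pony":       ("euler_ancestral",  "normal",  7.5, 25, 2),
        "general":    ("dpmpp_2m",         "karras",  8.0, 30, 2),
        "furry-nai":  ("euler_ancestral",  "normal",  5.0, 30, 2),
        "furry-noob": ("euler_ancestral",  "normal",  4.5, 20, 2),
        "furry-il":   ("euler_ancestral",  "normal",  4.0, 30, 2),
    }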

Sources: each model's Civitai page (CyberRealisticXL, Juggernaut,
Pony V6 XL, TalmendoXL, Reed FurryMix, IndigoVoid FurryFused,
NovaFurryXL) and Pony/Illustrious prompting guides.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:45:34 -05:00
cd0034cd99 Flesh out per-style negatives in smart_image_gen Tool
Each style now gets a proper baseline covering quality, anatomy, and
watermark/signature suppression — plus the appropriate style-leak guards
(no-cartoon for photo, no-human for furry, score_4–6 suppression for
pony). Quality terms only; no NSFW filtering by default since several
checkpoints in this set are commonly used for adult work and would
fight a baked-in content filter. If SFW-by-default is wanted, add an
explicit safe-mode flag rather than expanding NEGATIVES.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:39:24 -05:00
392b26167f Add smart_image_gen Tool for per-prompt checkpoint routing
Open WebUI Tool the LLM invokes instead of the built-in image action.
Auto-routes among the seven SDXL checkpoints (photo / juggernaut /
pony / general / furry-{nai,noob,il}) based on either an explicit
`style` arg or first-match-wins regex over the prompt. Constructs the
ComfyUI workflow inline, submits via /prompt, polls /history, returns
the result as a base64 data-URI markdown image so no extra hosting is
needed. Per-style default negatives. ComfyUI URL / steps / CFG /
timeout are admin-tunable Valves.
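
The routing table is first-match-wins (patterns illustrative; the
shipped regexes differ):

    import re

    _ROUTES = [
        (re.compile(r"\b(photo(real)?|photograph)\b", re.I), "photo"),
        (re.compile(r"\bpony\b", re.I), "pony"),
        (re.compile(r"\b(furry|anthro|fursona)\b", re.I), "furry-nai"),
    ]

    def _route_style(prompt: str) -> str:
        for pattern, style in _ROUTES:
            if pattern.search(prompt):
                return style
        return "general"  # assumed default when nothing matches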

Filters can't see image-gen requests in Open WebUI (the routers skip
the filter chain), so the LLM-driven Tool is the only path that
gives intent-aware routing without changing the chat UX.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:17:02 -05:00
704bcfdf13 Default workflows to SDXL CyberRealistic; ship empty model preseed
Drops the SD 1.5 placeholder. The shipped txt2img/img2img workflows now
reference CyberRealisticXLPlay_V8.0_FP16.safetensors (the checkpoint
figment used in production), and comfyui-init-models.sh ships with no
active fetches — operators uncomment examples or add their own URLs.

The script + workflow filenames have to line up; README explains.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:02:26 -05:00
0ad99b6199 Add comfyui-model-init sidecar for ComfyUI model preseeding
Mirrors the Ollama model-init pattern: a one-shot Alpine container that
mounts the comfyui-models volume and runs comfyui-init-models.sh, which
curls direct download URLs (HuggingFace by default) into the right
subdirectories. Idempotent — already-present files are skipped.

HF_TOKEN is plumbed through for gated repos (Flux-dev, SD3, etc.) and is
opt-in via .env. The default list ships SD 1.5 only, matching the
placeholder filename in workflows/*.json. Examples for SDXL, Flux, and
upscalers are commented in the script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:57:24 -05:00
21e976e275 Externalise WEBUI_URL / LLM_URL to .env
So changing the deployment's hostnames is a one-file edit (.env) instead
of touching docker-compose.yml. WEBUI_URL is the full URL with scheme
(Open WebUI uses it for auth redirects); LLM_URL is the bare hostname
(Anubis wants it for COOKIE_DOMAIN).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 10:49:29 -05:00
97547c783c Make ai-stack the only deployment shape
Drops the duplicate standalone compose / .env.example / SETUP.md at the
repo root. Bring-up content folded into deployments/ai-stack/README.md
so there's exactly one set of deployment instructions, sitting next to
the files it describes. Root README is now just the repo overview and a
pointer at the deployment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 10:45:23 -05:00
b1c9bff15f Match init-models.sh to the live preseed list
Five models from the production GPU host's current pull set. Picks up
the idempotency-checking loop pattern from the source script so re-runs
print "already present" instead of re-pulling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 10:41:29 -05:00
5b61caa197 Add deployments/ai-stack — combined production-shape example
Sanitized snapshot of the live srvno.de stack: Caddy + Ollama (with
preseed) + ComfyUI + Open WebUI + Anubis stub. Real hostnames,
secrets, and bcrypt hash replaced with placeholders so the dir is safe
to commit.

Caddyfile updated to point at comfyui:8188 (the source file pointed at
the now-removed forge service). Dropped FIGMENT_/FORGE_/SEGMENT_IMAGE_TAG
from the env example. Harmonised the init-models.sh mount path between
ollama and model-init services.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 10:40:41 -05:00