18 Commits

Author SHA1 Message Date
d8c8421361 Image Studio: lock in working config — Native + enable_thinking=false
User confirmed end-to-end working stack:
  - base_model_id: huihui_ai/qwen3-vl-abliterated:8b
  - function_calling: native (gives 'View Result from edit_image'
    blocks, structured tool call traces)
  - custom_params:
      tool_choice: required        (forces tool call every turn)
      enable_thinking: false       (server-side disable; abliterated
                                    Qwen ignores the /no_think system
                                    prompt directive — when thinking
                                    is on the tool call leaks inside
                                    a thinking block as text)

Updated image_studio.json + the markdown setup table + the
'Qwen 3.x quirk' explainer to match. The /no_think line in the
system prompt stays in for non-abliterated Qwen variants but is now
documented as best-effort backup; enable_thinking=false is the
authoritative kill-switch.
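A minimal sketch of the relevant slice of the preset after this change (field names as listed above; all other preset fields omitted, and the exact nesting may differ across Open WebUI versions):

```json
{
  "base_model_id": "huihui_ai/qwen3-vl-abliterated:8b",
  "params": {
    "function_calling": "native",
    "custom_params": {
      "tool_choice": "required",
      "enable_thinking": false
    }
  }
}
```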

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:12:24 -05:00
f5a222fe6f Image Studio: default base model → huihui_ai/qwen3-vl-abliterated:8b
User confirmed this model works end-to-end after the multi-base-model
search. Settled on it because Qwen 3 VL's fine-tune lineage isn't
damaged by abliteration the way Qwen 3.5's is, so it both calls tools
reliably AND won't refuse to dispatch on NSFW edit prompts.

Updated:
  - image_studio.json base_model_id → huihui_ai/qwen3-vl-abliterated:8b
  - init-models.sh: pulls the abliterated VL model in place of the
    non-working standard qwen3.5:9b
  - image_studio.md: setup table base-model row + vision-section
    'why this and not the alternatives' explanation

function_calling stays default and tool_choice required. Operator
can flip to native + drop tool_choice once they've verified the new
base behaves with structured tool calls (which would also remove the
need for a separate Task Model for title generation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:56:16 -05:00
ec6108888a smart_image_gen v0.7.7: enforce style inheritance for edit_image
Vision-capable LLMs misclassify rendered subjects when picking a
style — observed: model called juggernaut for an edit on a furry-il
generation because the rendered character looked 'photoreal-ish' to
its vision encoder. Each visual judgment is independent so styles
flip mid-chat.

Flipped resolution order in edit_image so inheritance from the prior
generate_image / edit_image call DOMINATES the LLM's explicit style
arg. The LLM's choice only wins when there's nothing to inherit
(first edit in a chat, fresh user upload). The workaround for
legitimate style changes is to start a new chat.

System prompt updated to match: tells the LLM that style inheritance
is enforced, that passing style on follow-up calls is ignored, and
that user requests for style change require a new chat.
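The flipped precedence can be sketched as (a simplified stand-in for the Tool's resolution code, not the actual implementation):

```python
def resolve_style(explicit_style, inherited_style, fallback_style):
    """v0.7.7 resolution order for edit_image: inheritance DOMINATES
    the LLM's explicit style arg; the explicit arg only wins when
    there is nothing to inherit."""
    if inherited_style:
        return inherited_style   # prior generate_image/edit_image call wins
    if explicit_style:
        return explicit_style    # first edit / fresh upload: LLM's pick
    return fallback_style        # keyword routing as last resort
```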

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:53:32 -05:00
20d4bd5b72 Image Studio: switch base model to qwen3.5:9b (non-abliterated)
The abliterated 9B was the source of the tool-call format mangling
(both Native XML leaks and Default Python-syntax leaks). Standard
qwen3.5:9b is the same family and the same 9B size (6.6 GB), it is
vision-capable, and its native tool calling actually works.

The uncensored image content was always going to come from the
SDXL checkpoints in ComfyUI — the LLM is just a dispatcher. Picking
a well-behaved tool-caller for that role doesn't compromise output
content.

Updated:
  - image_studio.json base_model_id → qwen3.5:9b
  - init-models.sh: pulls qwen3.5:9b as a standard registry pull,
    in addition to the existing abliterated 9B (which stays for
    other chat models)
  - image_studio.md setup table + vision section explaining why
    we chose standard over abliterated for the dispatcher role

function_calling stays as 'default' and tool_choice as 'required'
for now — they don't hurt with a reliable tool-caller and operators
can flip back to native + drop tool_choice once they verify it
works for them (which also removes the need for a separate Task
Model for title generation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:09:42 -05:00
011fade024 Image Studio prompt: forbid post-tool echo of the function call
User saw the LLM's chat response include a literal
'edit_image(prompt="...", mask_text="...", style="furry-il",
denoise=0.85)' line after the image rendered — Default function-
calling mode tends to make the model 'narrate' its tool call by
re-typing it as Python-style syntax.

Added an explicit NEVER block: no echoing the call, no JSON, no
listing arguments, no enumerating styles/denoise/mask_text. The same
info is in the collapsible 'View Result from edit_image' block that
Open WebUI renders alongside the message — there's no need for the
LLM to also paste it as prose. Follow-up text is for human
conversation, not bookkeeping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:41:51 -05:00
1ed2e7293e Image Studio: ship function_calling=default — Native leaks Qwen 3.5 XML
Qwen 3.5 abliterated emits its native tool-call format
(<function=...><parameter=...>) wrapped in <tool_call> tags that the
current Open WebUI / Ollama parser does not reliably round-trip — the
XML leaks to chat as plain text instead of executing. Switching the
preset to Function Calling: Default, which uses Open WebUI's own
prompt-injection wrapper, fires the tool reliably.

Native is documented as the right choice only when the operator has
swapped the base model to one with proven OWUI-side parser support
(mistral-nemo:12b, qwen2.5vl:7b). For the shipped Qwen 3.5
abliterated default, Default is the working setting.
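For reference, the leaked chat text looks roughly like this (shape reconstructed from the tag names above, not a captured transcript):

```
<tool_call>
<function=edit_image>
<parameter=prompt>glowing eyes</parameter>
</function>
</tool_call>
```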

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:36:05 -05:00
e77666ea0f Image Studio docs: require setting a separate Task Model after install
tool_choice: required (the thing that makes Image Studio reliably fire
its tools) also blocks Open WebUI's background text-only calls — title
generation, tag suggestions, autocomplete — because the model is
forced to produce a tool call instead of text. Result: chats stay
named 'New Chat' and tag suggestions go silent.

Documented the fix in two places:
  - image_studio.md: dedicated 'Set a separate Task Model (required
    after install)' section explaining the cause and the fix path.
  - deployment README §9: short follow-up note pointing at it so
    operators don't miss it during initial setup.

The fix is purely Open WebUI configuration — no code change. Pick any
non-Image-Studio model already pulled (mistral-nemo:12b is the
obvious default) for the Task Model slot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:37:21 -05:00
6700f6ce33 smart_image_gen v0.7.3: edit_image inherits style from prior tool call
User reported edit_image picking 'juggernaut' (photoreal) for an edit
on a furry image — the LLM didn't carry context, and the tool's
fallback _route_style only sees the edit instruction text, which for
neutral edits ('bigger', 'glowing eyes') has no furry keywords.

Fix in two places:

  1. Tool: _inherited_style scans __messages__ in reverse for prior
     generate_image / edit_image tool calls and returns the style arg
     they used. edit_image now resolves: explicit style → inherited →
     keyword fallback. Deterministic, no LLM cooperation needed for
     follow-up edits on previously-generated images.

  2. System prompt: explicit three-step style resolution for
     edit_image. Generated by you → omit style and auto-inherit.
     Uploaded by user → INSPECT visually and pick a matching style
     (the LLM is the only thing with vision; the tool can't see
     pixels). Then keep that style for subsequent edits.

Both paths matter — the tool fix handles the common case
deterministically, the prompt fix handles the upload case where
there's nothing to inherit from.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:31:57 -05:00
06433d3815 smart_image_gen v0.7.1: rename edit_image arg + parse file id from URL
Two bugs in one screenshot:

1. LLM called edit_image(prompt=..., ...) but the signature was
   edit_image(edit_instruction=..., ...) — mismatch, missing-arg
   crash. Renamed the first param to `prompt` so both tools have a
   matching, predictable name. System prompt updated with an explicit
   'do not invent edit_instruction' line for stubborn models.

2. After fix #1, edit_image still couldn't find the prior generated
   image because Open WebUI assistant-message file attachments only
   carry {type, url} (no id, no path). _read_file_dict now also
   greps the file id out of /api/v1/files/<uuid>/content URLs and
   feeds it to Files.get_file_by_id. Verified pattern matches
   absolute URLs (https://llm-1.srvno.de/api/v1/files/.../content).

System prompt also now says 'including images you previously
generated in this chat' to nudge the LLM to pick up assistant
outputs as edit candidates.
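The URL-to-file-id extraction in fix #2 amounts to something like this (a hypothetical stand-in for the pattern _read_file_dict uses, covering both relative and absolute URLs):

```python
import re

# Match the file UUID in an Open WebUI /api/v1/files/<uuid>/content URL.
FILE_URL_RE = re.compile(r"/api/v1/files/([0-9a-fA-F-]{36})/content")

def file_id_from_url(url: str):
    """Return the file id embedded in a files-API content URL, or None."""
    m = FILE_URL_RE.search(url)
    return m.group(1) if m else None
```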

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:58:40 -05:00
780ce42711 Image Studio: move tool_choice into params.custom_params (correct field)
Previous commit put tool_choice at the top level of params. Open WebUI
drops that silently — apply_model_params_to_body has a whitelist of
mapped param names (temperature, top_p, etc.) and tool_choice isn't
on it. The Custom Parameters UI section also only iterates
params.custom_params, which is why the value didn't appear there
after importing the preset.

Correct location is the custom_params sub-dict, where values go
through json.loads before being merged into the outgoing chat
completion body. 'required' stays a string after the failed
json.loads and ends up exactly where the OpenAI / Ollama tools spec
expects it.

Source: src/lib/components/chat/Settings/Advanced/AdvancedParams.svelte
(UI binding) and backend/open_webui/utils/payload.py (serialization).
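The serialization step described above behaves roughly like this (a simplified model, not the actual payload.py code):

```python
import json

def coerce_custom_param(value):
    """Each custom_params value is run through json.loads before being
    merged into the outgoing body; values that fail to parse (like the
    bare string 'required') are kept as-is as strings."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            return value  # 'required' stays a string
    return value
```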

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:50:34 -05:00
d935e24624 Add text-targeted inpainting via GroundingDINO+SAM (mask_text param)
Five pieces:

1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO
   + SAM-HQ in one bundle) into custom_nodes and pip-installs its
   requirements at build time. Model weights auto-download to the
   comfyui-models volume on first inpaint (~3 GB one-time cost).

2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs
   requirements.txt for any custom_node present at startup. Lets users
   add custom nodes via ComfyUI-Manager (or by git-cloning into the
   volume) and have the deps picked up on the next restart, without
   editing the Dockerfile.

3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When
   set, builds an inpainting workflow (LoadImage → GroundingDinoSAM
   Segment → SetLatentNoiseMask → KSampler) so only the named region
   is repainted. When unset, falls through to the existing img2img
   path. Denoise default switches: 1.0 with mask_text (full repaint
   within mask), 0.7 without.

4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL
   distinction — set mask_text whenever the user names a specific
   object/region ('the ball', 'the dog', 'the sky'); leave it unset
   only for whole-image style/lighting transformations.

5. Deployment README documents the new mode + the first-inpaint
   weight-download caveat.

Image rebuild required — bump tag to pick up the Dockerfile change.
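The denoise-default switch from piece 3, as a sketch of the rule (not the Tool's actual code):

```python
def default_denoise(mask_text, denoise=None):
    """Explicit caller value always wins; otherwise 1.0 when mask_text
    is set (full repaint inside the mask), 0.7 for whole-image
    img2img."""
    if denoise is not None:
        return denoise
    return 1.0 if mask_text else 0.7
```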

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:43:52 -05:00
7c7897818e Image Studio: bake tool_choice=required into the preset
Without it, abliterated/reasoning models like huihui_ai/qwen3.5-
abliterated:9b reliably choose to write a planning response instead
of calling the tool — even with /no_think and a terse imperative
system prompt. tool_choice=required is passed through to Ollama's
chat API and removes the model's option to respond in text at all,
forcing exactly one tool call per turn.

Confirmed working with the abliterated Qwen 3.5 9B base.
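Roughly where the flag lands in the outgoing chat request (assumed OpenAI-style shape; tool schema elided):

```json
{
  "model": "huihui_ai/qwen3.5-abliterated:9b",
  "messages": [{"role": "user", "content": "draw me a fox"}],
  "tools": [{"type": "function", "function": {"name": "generate_image"}}],
  "tool_choice": "required"
}
```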

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:30:49 -05:00
a1fca4d5d9 Image Studio: default base model → huihui_ai/qwen3.5-abliterated:9b
Re-imports of image_studio.json kept reverting the base model back to
mistral-nemo:12b because that was still hard-coded in the JSON.
Updated the JSON, the markdown setup table, and the vision-capability
section to lead with the Qwen 3.5 abliterated 9B preset.

Re-ordered the markdown's vision section: shipped default first
(Qwen 3.5 abliterated, with the /no_think + enable_thinking caveat
called out explicitly), alternatives (qwen2.5vl:7b, llama3.2-vision,
minicpm-v) second, non-vision fallback third.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:22:13 -05:00
0b1c2ee5b5 Image Studio: tighten system prompt, add /no_think for Qwen 3.x
User reported the model writing a multi-paragraph 'editing plan'
instead of calling edit_image, only firing the tool when explicitly
told to. Two underlying causes:

  1. The previous system prompt was conversational ('ALWAYS / NEVER'
     lists with discussion) — Qwen-style models read that as topics
     to think about rather than rules to obey. Replaced with terse,
     imperative dispatcher framing: 'You do not respond in prose.
     Every user message MUST result in exactly one tool call.'

  2. Qwen 3.x ships with thinking mode on by default. Reasoning
     models almost universally degrade native function calling — they
     plan how to use a tool instead of just calling it. Prepended
     /no_think (Qwen 3.x recognises this token and skips reasoning).
     No-op for non-Qwen-3 base models.

Removed the long after-action paragraph that encouraged elaborate
follow-ups; replaced with 'at most one short sentence'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:18:49 -05:00
f77f5993fb Image Studio: enable vision capability + document upgrade path
Open WebUI was blocking image attachments to the Image Studio model
because mistral-nemo:12b isn't vision-capable. Two changes:

  - capabilities.vision flipped to true in the preset JSON. The Tool
    only needs the image to make it through __messages__ / __files__
    to call edit_image; the actual visual processing happens in
    ComfyUI's img2img, not in the LLM. Setting the flag unlocks the
    attach-image UI without lying about what mistral-nemo can do.

  - System prompt now tells the LLM explicitly: "you may not be able
    to visually inspect the attached image — that is fine. Trust the
    user's description and call edit_image." Prevents the LLM from
    refusing or hedging when it gets an image it can't see.

Documented the upgrade path in image_studio.md for users who want
real vision (qwen2.5vl:7b, llama3.2-vision:11b, minicpm-v:8b — pick
one, add to init-models.sh, swap base_model_id in the preset). The
vision LLM can then write smarter edit_image calls from the image
content rather than the user's description alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:31:17 -05:00
6adf133558 Ship Image Studio as importable JSON in addition to markdown walkthrough
Open WebUI accepts a JSON file at Workspace → Models → Import that
seeds a new model preset in one click instead of the manual table-
driven setup. The new image_studio.json mirrors the Open WebUI bulk-
export schema (array wrapper around the model object with id, name,
base_model_id, params, meta) and pre-fills system prompt, native
function calling, temperature 0.5, top_p 0.9, smart_image_gen tool
attachment, suggestion prompts.
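A skeleton of the schema described above (field names beyond those listed — the system placeholder, tool-attachment key, suggestion-prompt key — are assumptions, and the schema drifts across Open WebUI versions):

```json
[
  {
    "id": "image-studio",
    "name": "Image Studio",
    "base_model_id": "mistral-nemo:12b",
    "params": {
      "system": "...",
      "temperature": 0.5,
      "top_p": 0.9,
      "function_calling": "native"
    },
    "meta": {
      "toolIds": ["smart_image_gen"],
      "suggestion_prompts": []
    }
  }
]
```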

The markdown walkthrough stays as the source of truth for the system
prompt content and as the fallback when import fails (e.g. tool ID
mismatch, unfamiliar field, schema drift across Open WebUI versions).
README points at both paths.

Caveat doc'd in the markdown: if the imported preset doesn't actually
have smart_image_gen attached, the tool ID in the JSON didn't match
what Open WebUI assigned — re-attach manually in the model edit
screen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:04:49 -05:00
d4e2058859 smart_image_gen v0.3: add edit_image (img2img) method
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:

  generate_image — txt2img (existing, unchanged behavior)
  edit_image     — img2img on the most recently attached image

edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.

Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.

Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.
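The __messages__ half of the extraction can be sketched as follows — a simplified stand-in for the Tool's code (the real version also falls back to __files__, path then URL):

```python
import base64
import re

DATA_URI_RE = re.compile(r"^data:image/(\w+);base64,(.*)$", re.S)

def image_from_messages(messages):
    """Return (format, raw bytes) of the most recently attached image
    found as a base64 data URI in an image_url content block, or None
    if no image is attached."""
    for msg in reversed(messages):
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain-text message, nothing to extract
        for block in content:
            if block.get("type") == "image_url":
                m = DATA_URI_RE.match(block["image_url"]["url"])
                if m:
                    return m.group(1), base64.b64decode(m.group(2))
    return None
```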

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:59:13 -05:00
41d571d8d1 Add Image Studio model preset — forces smart_image_gen tool use
A documented Open WebUI custom-model preset wrapping mistral-nemo:12b
with: aggressive system prompt that mandates calling generate_image,
only the smart_image_gen tool attached, native function calling,
lower temperature for tool-call reliability. Users pick "Image Studio"
from the chat-model dropdown when they want images.

Solves the common case where general-purpose chat models describe an
image in text instead of firing the tool — usually on conversational
phrasings like "can you draw me…". The preset removes the ambiguity
by giving the LLM exactly one job and one tool.

Setup walkthrough in openwebui-models/image_studio.md; deployment
README §9 points users at it as the recommended path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:54:13 -05:00