User confirmed end-to-end working stack:
- base_model_id: huihui_ai/qwen3-vl-abliterated:8b
- function_calling: native (gives 'View Result from edit_image' blocks,
  structured tool call traces)
- custom_params:
  - tool_choice: required (forces a tool call every turn)
  - enable_thinking: false (server-side disable; the abliterated Qwen
    ignores the /no_think system prompt directive, and with thinking on
    the tool call leaks inside a thinking block as plain text)
Updated image_studio.json + the markdown setup table + the
'Qwen 3.x quirk' explainer to match. The /no_think line in the
system prompt stays in for non-abliterated Qwen variants but is now
documented as a best-effort backup; enable_thinking=false is the
authoritative kill-switch.
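For reference, the shape of the relevant slice of image_studio.json is
roughly this (a sketch shown as a Python literal, not a verbatim dump of
the shipped preset; the params/custom_params split is the part that
matters):

    params = {
        "function_calling": "native",
        "custom_params": {
            # custom_params values are strings that Open WebUI runs
            # through json.loads before merging into the request body,
            # so "required" survives as a string and "false" becomes a
            # boolean
            "tool_choice": "required",
            "enable_thinking": "false",
        },
    }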
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User confirmed this model works end-to-end after the multi-base-model
search. Settled on it because Qwen 3 VL's fine-tune lineage isn't
damaged by abliteration the way Qwen 3.5's is, so it both calls tools
reliably AND won't refuse to dispatch on NSFW edit prompts.
Updated:
- image_studio.json base_model_id → huihui_ai/qwen3-vl-abliterated:8b
- init-models.sh: pulls the abliterated VL model in place of the
non-working standard qwen3.5:9b
- image_studio.md: setup table base-model row + vision-section
'why this and not the alternatives' explanation
function_calling stays 'default' and tool_choice stays 'required'. The
operator can flip to native and drop tool_choice once they've verified
the new base behaves with structured tool calls (which would also
remove the need for a separate Task Model for title generation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vision-capable LLMs misclassify rendered subjects when picking a
style — observed: model called juggernaut for an edit on a furry-il
generation because the rendered character looked 'photoreal-ish' to
its vision encoder. Each visual judgment is independent so styles
flip mid-chat.
Flipped resolution order in edit_image so inheritance from the prior
generate_image / edit_image call DOMINATES the LLM's explicit style
arg. The LLM's choice only wins when there's nothing to inherit
(first edit in a chat, fresh user upload). The workaround for a
legitimate style change is to start a new chat.
System prompt updated to match: tells the LLM that style inheritance
is enforced, that passing style on follow-up calls is ignored, and
that user requests for style change require a new chat.
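A minimal sketch of the flipped order (the _resolve helper name is
illustrative; _inherited_style and _route_style are the existing
helpers):

    def _resolve_style(self, explicit_style, messages, edit_instruction):
        # Inheritance from the prior generate_image / edit_image call
        # now dominates; the LLM's explicit arg only wins when there is
        # nothing to inherit (first edit in a chat, fresh upload).
        inherited = self._inherited_style(messages)
        if inherited:
            return inherited
        if explicit_style:
            return explicit_style
        # Last resort: keyword routing over the edit instruction text.
        return self._route_style(edit_instruction)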
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The abliterated 9B was the source of the tool-call format mangling
(both Native XML leaks and Default Python-syntax leaks). Standard
qwen3.5:9b is the same family and the same 9B size (6.6 GB), is
vision-capable, and its native tool calling actually works.
Uncensored image content was always going to come from the SDXL
checkpoints in ComfyUI; the LLM is just a dispatcher, so picking a
well-behaved tool-caller for that role doesn't compromise output
content.
Updated:
- image_studio.json base_model_id → qwen3.5:9b
- init-models.sh: pulls qwen3.5:9b as a standard registry pull,
in addition to the existing abliterated 9B (which stays for
other chat models)
- image_studio.md setup table + vision section explaining why
we chose standard over abliterated for the dispatcher role
function_calling stays as 'default' and tool_choice as 'required'
for now — they don't hurt with a reliable tool-caller and operators
can flip back to native + drop tool_choice once they verify it
works for them (which also removes the need for a separate Task
Model for title generation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User saw the LLM's chat response include a literal
'edit_image(prompt="...", mask_text="...", style="furry-il",
denoise=0.85)' line after the image rendered — Default function-
calling mode tends to make the model 'narrate' its tool call by
re-typing it as Python-style syntax.
Added an explicit NEVER block: no echoing the call, no JSON, no
listing arguments, no enumerating styles/denoise/mask_text. The same
info is in the collapsible 'View Result from edit_image' block that
Open WebUI renders alongside the message — there's no need for the
LLM to also paste it as prose. Follow-up text is for human
conversation, not bookkeeping.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Qwen 3.5 abliterated emits its native tool-call format
(<function=...><parameter=...>) wrapped in <tool_call> tags that the
current Open WebUI / Ollama parser does not reliably round-trip — the
XML leaks to chat as plain text instead of executing. Switching the
preset to Function Calling: Default, which uses Open WebUI's own
prompt-injection wrapper, fires the tool reliably.
Native is documented as the right choice only when the operator has
swapped the base model to one with proven OWUI-side parser support
(mistral-nemo:12b, qwen2.5vl:7b). For the shipped Qwen 3.5
abliterated default, Default is the working setting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tool_choice: required (the thing that makes Image Studio reliably fire
its tools) also blocks Open WebUI's background text-only calls — title
generation, tag suggestions, autocomplete — because the model is
forced to produce a tool call instead of text. Result: chats stay
named 'New Chat' and tag suggestions go silent.
Documented the fix in two places:
- image_studio.md: dedicated 'Set a separate Task Model (required
after install)' section explaining the cause and the fix path.
- deployment README §9: short follow-up note pointing at it so
operators don't miss it during initial setup.
The fix is purely Open WebUI configuration — no code change. Pick any
non-Image-Studio model already pulled (mistral-nemo:12b is the
obvious default) for the Task Model slot.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported edit_image picking 'juggernaut' (photoreal) for an edit
on a furry image — the LLM didn't carry context, and the tool's
fallback _route_style only sees the edit instruction text, which for
neutral edits ('bigger', 'glowing eyes') has no furry keywords.
Fix in two places:
1. Tool: _inherited_style scans __messages__ in reverse for prior
generate_image / edit_image tool calls and returns the style arg
they used. edit_image now resolves: explicit style → inherited →
keyword fallback. Deterministic, no LLM cooperation needed for
follow-up edits on previously-generated images.
2. System prompt: explicit three-step style resolution for
edit_image. Generated by you → omit style and auto-inherit.
Uploaded by user → INSPECT visually and pick a matching style
(the LLM is the only thing with vision; the tool can't see
pixels). Then keep that style for subsequent edits.
Both paths matter — the tool fix handles the common case
deterministically, the prompt fix handles the upload case where
there's nothing to inherit from.
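A rough sketch of the item-1 scan (where Open WebUI stores prior tool
calls inside __messages__ is an assumption here; the shipped tool may
read a different field):

    import json

    def _inherited_style(self, messages):
        # Walk the chat newest-first and reuse the style of the last
        # generate_image / edit_image call, if any.
        for msg in reversed(messages or []):
            for call in msg.get("tool_calls") or []:
                fn = call.get("function", {})
                if fn.get("name") in ("generate_image", "edit_image"):
                    try:
                        args = json.loads(fn.get("arguments") or "{}")
                    except ValueError:
                        continue
                    if args.get("style"):
                        return args["style"]
        return None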
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs in one screenshot:
1. LLM called edit_image(prompt=..., ...) but the signature was
edit_image(edit_instruction=..., ...) — mismatch, missing-arg
crash. Renamed the first param to `prompt` so both tools have a
matching, predictable name. System prompt updated with an explicit
'do not invent edit_instruction' line for stubborn models.
2. After fix#1, edit_image still couldn't find the prior generated
image because Open WebUI assistant-message file attachments only
carry {type, url} (no id, no path). _read_file_dict now also
greps the file id out of /api/v1/files/<uuid>/content URLs and
feeds it to Files.get_file_by_id. Verified pattern matches
absolute URLs (https://llm-1.srvno.de/api/v1/files/.../content).
System prompt also now says 'including images you previously
generated in this chat' to nudge the LLM to pick up assistant
outputs as edit candidates.
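The URL-to-id recovery is essentially one regex; a sketch (the helper
name is illustrative, Files.get_file_by_id is the Open WebUI call named
above):

    import re

    # Matches relative and absolute content URLs, e.g.
    # https://llm-1.srvno.de/api/v1/files/<uuid>/content
    FILE_URL_RE = re.compile(r"/api/v1/files/([0-9a-fA-F-]{36})/content")

    def file_id_from_url(url: str):
        match = FILE_URL_RE.search(url or "")
        return match.group(1) if match else None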
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous commit put tool_choice at the top level of params. Open WebUI
drops that silently — apply_model_params_to_body has a whitelist of
mapped param names (temperature, top_p, etc.) and tool_choice isn't
on it. The Custom Parameters UI section also only iterates
params.custom_params, which is why the value didn't appear there
after importing the preset.
The correct location is the custom_params sub-dict, where values go
through json.loads before being merged into the outgoing chat
completion body. 'required' stays a string after the failed
json.loads and ends up exactly where the OpenAI / Ollama tools spec
expects it.
Source: src/lib/components/chat/Settings/Advanced/AdvancedParams.svelte
(UI binding) and backend/open_webui/utils/payload.py (serialization).
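Illustrative version of that merge step (a sketch, not a copy of
payload.py):

    import json

    def merge_custom_params(body: dict, custom_params: dict) -> dict:
        # Each value is tried as JSON first; on failure the raw string
        # is kept. 'required' fails json.loads and therefore lands in
        # the outgoing body as the plain string the tools spec expects.
        for key, raw in (custom_params or {}).items():
            try:
                body[key] = json.loads(raw)
            except (TypeError, ValueError):
                body[key] = raw
        return body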
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five pieces:
1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO
+ SAM-HQ in one bundle) into custom_nodes and pip-installs its
requirements at build time. Model weights auto-download to the
comfyui-models volume on first inpaint (~3 GB one-time cost).
2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs
requirements.txt for any custom_node present at startup. Lets users
add custom nodes via ComfyUI-Manager (or by git-cloning into the
volume) and have the deps picked up on the next restart, without
editing the Dockerfile.
3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When
set, builds an inpainting workflow (LoadImage → GroundingDinoSAM
Segment → SetLatentNoiseMask → KSampler) so only the named region
is repainted. When unset, falls through to the existing img2img
path. Denoise default switches: 1.0 with mask_text (full repaint
within the mask), 0.7 without. See the sketch below.
4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL
distinction — set mask_text whenever the user names a specific
object/region ('the ball', 'the dog', 'the sky'); leave it unset
only for whole-image style/lighting transformations.
5. Deployment README documents the new mode + the first-inpaint
weight-download caveat.
Image rebuild required — bump tag to pick up the Dockerfile change.
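Sketch of the branch item 3 describes (the workflow-builder names are
illustrative; _submit_and_fetch is the existing shared helper):

    def edit_image(self, prompt, mask_text=None, denoise=None, **kwargs):
        # Default denoise: full repaint inside the mask, gentler for
        # whole-image img2img.
        if denoise is None:
            denoise = 1.0 if mask_text else 0.7
        if mask_text:
            # Inpaint path: LoadImage -> GroundingDinoSAMSegment ->
            # SetLatentNoiseMask -> KSampler, repainting only the
            # region matching mask_text.
            workflow = self._build_inpaint_workflow(prompt, mask_text, denoise)
        else:
            # Existing whole-image img2img path.
            workflow = self._build_img2img_workflow(prompt, denoise)
        return self._submit_and_fetch(workflow)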
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without it, abliterated/reasoning models like huihui_ai/qwen3.5-
abliterated:9b reliably choose to write a planning response instead
of calling the tool — even with /no_think and a terse imperative
system prompt. tool_choice=required is passed through to Ollama's
chat API and removes the model's option to respond in text at all,
forcing exactly one tool call per turn.
Confirmed working with the abliterated Qwen 3.5 9B base.
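On the wire this is one extra field in the chat request (a sketch of the
OpenAI-style body shape; the passthrough to Ollama is the behavior being
relied on):

    def build_chat_body(messages: list, tools: list) -> dict:
        return {
            "model": "huihui_ai/qwen3.5-abliterated:9b",
            "messages": messages,
            "tools": tools,             # smart_image_gen's function schemas
            "tool_choice": "required",  # removes the text-only option
        }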
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-imports of image_studio.json kept reverting the base model back to
mistral-nemo:12b because that was still hard-coded in the JSON.
Updated the JSON, the markdown setup table, and the vision-capability
section to lead with the Qwen 3.5 abliterated 9B preset.
Re-ordered the markdown's vision section: shipped default first
(Qwen 3.5 abliterated, with the /no_think + enable_thinking caveat
called out explicitly), alternatives (qwen2.5vl:7b, llama3.2-vision,
minicpm-v) second, non-vision fallback third.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported the model writing a multi-paragraph 'editing plan'
instead of calling edit_image, only firing the tool when explicitly
told to. Two underlying causes:
1. The previous system prompt was conversational ('ALWAYS / NEVER'
lists with discussion) — Qwen-style models read that as topics
to think about rather than rules to obey. Replaced with terse,
imperative dispatcher framing: 'You do not respond in prose.
Every user message MUST result in exactly one tool call.'
2. Qwen 3.x ships with thinking mode on by default. Reasoning
models almost universally degrade native function calling — they
plan how to use a tool instead of just calling it. Prepended /no_think
to the system prompt (Qwen 3.x recognises the token and skips
reasoning); it is a no-op for non-Qwen-3 base models.
Removed the long after-action paragraph that encouraged elaborate
follow-ups; replaced with 'at most one short sentence'.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI was blocking image attachments to the Image Studio model
because mistral-nemo:12b isn't vision-capable. Two changes:
- capabilities.vision flipped to true in the preset JSON. The Tool
only needs the image to make it through __messages__ / __files__
to call edit_image; the actual visual processing happens in
ComfyUI's img2img, not in the LLM. Setting the flag unlocks the
attach-image UI without lying about what mistral-nemo can do.
- System prompt now tells the LLM explicitly: "you may not be able
to visually inspect the attached image — that is fine. Trust the
user's description and call edit_image." Prevents the LLM from
refusing or hedging when it gets an image it can't see.
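The preset change itself is a single flag; the nesting below is assumed
from the Open WebUI export schema, shown as a Python literal:

    meta = {
        "capabilities": {
            "vision": True,  # unlocks the attach-image UI for this preset
        },
    }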
Documented the upgrade path in image_studio.md for users who want
real vision (qwen2.5vl:7b, llama3.2-vision:11b, minicpm-v:8b — pick
one, add to init-models.sh, swap base_model_id in the preset). The
vision LLM can then write smarter edit_image calls from the image
content rather than the user's description alone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI accepts a JSON file at Workspace → Models → Import that
seeds a new model preset in one click instead of the manual table-
driven setup. The new image_studio.json mirrors the Open WebUI bulk-
export schema (array wrapper around the model object with id, name,
base_model_id, params, meta) and pre-fills system prompt, native
function calling, temperature 0.5, top_p 0.9, smart_image_gen tool
attachment, suggestion prompts.
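Rough shape of the file (a sketch of the export-schema wrapper, not the
full shipped preset; the tool-attachment and suggestion field names are
assumptions):

    preset = [
        {
            "id": "image-studio",                 # slug is illustrative
            "name": "Image Studio",
            "base_model_id": "mistral-nemo:12b",  # base model at this point
            "params": {
                "system": "...",                  # dispatcher system prompt
                "function_calling": "native",
                "temperature": 0.5,
                "top_p": 0.9,
            },
            "meta": {
                "toolIds": ["smart_image_gen"],   # field name assumed
                "suggestion_prompts": [],
            },
        },
    ]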
The markdown walkthrough stays as the source of truth for the system
prompt content and as the fallback when import fails (e.g. tool ID
mismatch, unfamiliar field, schema drift across Open WebUI versions).
README points at both paths.
Caveat doc'd in the markdown: if the imported preset doesn't actually
have smart_image_gen attached, the tool ID in the JSON didn't match
what Open WebUI assigned — re-attach manually in the model edit
screen.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:
generate_image — txt2img (existing, unchanged behavior)
edit_image — img2img on the most recently attached image
edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.
Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.
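The defensive lookup order reads roughly like this (condensed; the real
method also uploads the result to ComfyUI's /upload/image and handles
the path-vs-URL split for __files__):

    def _find_source_image(self, messages, files):
        # Messages first: base64 data URIs in image_url content blocks.
        for msg in reversed(messages or []):
            content = msg.get("content")
            if isinstance(content, list):
                for block in content:
                    if block.get("type") == "image_url":
                        url = block.get("image_url", {}).get("url", "")
                        if url.startswith("data:image/"):
                            return ("base64", url)
        # Then files: local path or URL.
        for f in files or []:
            source = f.get("path") or f.get("url")
            if source:
                return ("file", source)
        # Nothing found: caller answers "no image attached" instead of
        # silently generating from scratch.
        return None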
Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A documented Open WebUI custom-model preset wrapping mistral-nemo:12b
with: aggressive system prompt that mandates calling generate_image,
only the smart_image_gen tool attached, native function calling,
lower temperature for tool-call reliability. Users pick "Image Studio"
from the chat-model dropdown when they want images.
Solves the common case where general-purpose chat models describe an
image in text instead of firing the tool — usually on conversational
phrasings like "can you draw me…". The preset removes the ambiguity
by giving the LLM exactly one job and one tool.
Setup walkthrough in openwebui-models/image_studio.md; deployment
README §9 points users at it as the recommended path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>