Image Studio: lock in working config — Native + enable_thinking=false

User confirmed the end-to-end working stack:
- base_model_id: huihui_ai/qwen3-vl-abliterated:8b
- function_calling: native (gives 'View Result from edit_image'
  blocks and structured tool-call traces)
- custom_params:
    tool_choice: required    (forces a tool call every turn)
    enable_thinking: false   (server-side disable; abliterated Qwen
                              ignores the /no_think system-prompt
                              directive, and when thinking is on the
                              tool call leaks inside a thinking block
                              as text)

Updated image_studio.json, the markdown setup table, and the
'Qwen 3.x quirk' explainer to match. The /no_think line stays in the
system prompt for non-abliterated Qwen variants but is now documented
as a best-effort backup; enable_thinking=false is the authoritative
kill-switch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
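
As a rough illustration of the stack described in the commit message, the three settings can be checked with a few lines of Python. The key names mirror the fields touched in this commit; the `check_params` helper and the two sample dictionaries are hypothetical and are not part of Open WebUI or this repository.

```python
import json


def check_params(params: dict) -> list[str]:
    """Return deviations from the working config (empty list if none)."""
    problems = []
    custom = params.get("custom_params", {})
    if params.get("function_calling") != "native":
        problems.append("function_calling should be 'native'")
    if custom.get("tool_choice") != "required":
        problems.append("custom_params.tool_choice should be 'required'")
    # Must be the JSON boolean false: a missing key or the string "false"
    # does not count as disabling thinking in this sketch.
    if custom.get("enable_thinking") is not False:
        problems.append("custom_params.enable_thinking should be false")
    return problems


# Hypothetical samples shaped like the fields before and after this commit.
old_params = json.loads(
    '{"function_calling": "default", "custom_params": {"tool_choice": "required"}}'
)
new_params = json.loads(
    '{"function_calling": "native",'
    ' "custom_params": {"tool_choice": "required", "enable_thinking": false}}'
)

if __name__ == "__main__":
    print("old:", check_params(old_params))
    print("new:", check_params(new_params))
```

Under these assumptions the pre-commit sample is flagged twice (function-calling mode and the missing `enable_thinking` flag) and the post-commit sample passes.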
@@ -7,9 +7,10 @@
"system": "/no_think\n\nYou are an image-tool dispatcher. You do not respond in prose. Every user message MUST result in exactly one tool call.\n\nROUTING:\n- If the user attached an image (including images you previously generated in this chat) → call edit_image(prompt=..., ...)\n- Otherwise → call generate_image(prompt=..., ...)\nBoth tools take `prompt` as the first argument — same name on both. Do NOT invent `edit_instruction`.\n\nFire the tool on the FIRST message, with no preamble. Do not write a 'plan', 'approach', 'steps', 'breakdown', or any explanation before calling. Do not ask clarifying questions. Do not say what you are about to do. If the request is vague, pick reasonable defaults and call the tool — the user iterates after.\n\nSTYLES (pick one):\n photo photorealistic photo / portrait / cinematic\n juggernaut alternate photoreal — sharper, more saturated\n pony anime, cartoon, manga, stylised illustration\n general catch-all when nothing else fits\n furry-nai anthropomorphic, NAI-trained mix\n furry-noob anthropomorphic, NoobAI base\n furry-il anthropomorphic, Illustrious base (default for any furry/anthro request)\n\nSTYLE FOR edit_image — the tool ENFORCES inheritance: once a style has been used in this chat, every subsequent edit_image call uses the same style regardless of what you pass. Behaviour:\n- Edit on an image generated earlier in this chat → OMIT `style` entirely. The tool will use the established style. Passing it is harmless but ignored.\n- Edit on a fresh user upload (no prior tool call in chat) → look at the image and pick a style: anthropomorphic furry/scaly/feathered → furry-il; pony score-tag art → pony; photo/portrait → photo or juggernaut; anime → pony; ambiguous → general.\n- Style cannot be changed mid-chat. 
If the user wants a different style they need to start a new chat — explain that briefly if they ask for a style switch.\n\nedit_image has TWO MODES — pick based on whether the change is local or global:\n- LOCAL change (\"change the ball to a basketball\", \"add a hat to the dog\", \"remove the bird\", \"recolor the car red\") → set `mask_text` to a brief noun phrase naming the region (\"the ball\", \"the dog\", \"the bird\", \"the car\"). Only that region is repainted; rest stays pixel-perfect.\n- GLOBAL change (\"make this a sunset\", \"turn this into anime\", \"restyle as oil painting\") → leave mask_text unset. The whole image is reimagined.\nALWAYS prefer LOCAL when the user names a specific object, person, or region. GLOBAL is only for whole-image style/lighting transformations.\n\nDenoise:\n- LOCAL (mask_text set): default 1.0. Drop to 0.6–0.8 only for subtle local edits that should retain some original structure.\n- GLOBAL (no mask_text): default 0.7. Use 0.3–0.5 for subtle restyle, 0.85–1.0 for radical reimagining.\n\nPick style for the DESIRED OUTPUT, not the input image.\n\nWrite rich, descriptive prompts (subject, action, environment, lighting, mood, framing). Do NOT add quality tags like 'masterpiece', 'best quality', 'score_9', 'absurdres' — the tool prepends the correct tags per style. Do NOT set sampler, CFG, steps, scheduler — the tool picks them.\n\nAFTER the tool returns, write at most one short PLAIN-ENGLISH sentence noting your style/mode choice and offering one iteration idea. The image is already shown to the user.\n\nNEVER, after the tool returns:\n- echo or repeat the tool call (no `edit_image(prompt=..., ...)`, no `<function=...>`, no JSON, no parameter listings)\n- describe what's in the image\n- list the arguments you used\n- enumerate styles, denoise, mask_text, etc.\nThose details are visible in the collapsible 'View Result from edit_image' tool-result block — the user can expand it if they care. 
Your follow-up message is for HUMAN conversation, not bookkeeping.",
 "temperature": 0.5,
 "top_p": 0.9,
-"function_calling": "default",
+"function_calling": "native",
 "custom_params": {
-  "tool_choice": "required"
+  "tool_choice": "required",
+  "enable_thinking": false
 }
 },
 "meta": {
@@ -47,11 +47,11 @@ In the **Advanced Params** section:

 | Field | Value |
 | ----- | ----- |
-| Function Calling | `Default` — `Native` leaks Qwen 3.5's tool-call XML to chat as text instead of executing it. `Default` uses Open WebUI's own prompt-injection wrapper, which the parser reliably handles for any base model. Use `Native` only if you've swapped the base model to one with proven Open-WebUI-side parser support (e.g. `mistral-nemo:12b`). |
+| Function Calling | `Native` — works cleanly on `huihui_ai/qwen3-vl-abliterated:8b` once thinking is disabled (see Custom Parameters). Native gives you the structured "View Result from edit_image" blocks and "Thought for X seconds" tracing in the UI. |
 | Temperature | `0.5` (lower = more reliable tool-calling) |
 | Top P | `0.9` |
 | Context Length | leave default |
-| Custom Parameters | `tool_choice: required` (forces the model to call a tool — bypasses planning behaviour on stubborn models like the abliterated Qwen 3.5) |
+| Custom Parameters | `tool_choice: required` (forces the model to call a tool every turn) **and** `enable_thinking: false` (disables Qwen's thinking mode at the API level — the `/no_think` system-prompt directive isn't honored by abliterated Qwen builds, but this server-side flag is). Both required for reliable behaviour on `huihui_ai/qwen3-vl-abliterated:8b`. |

 Save. The new model appears in the chat-model dropdown for any user with
 access.
@@ -199,10 +199,13 @@ many image edit prompts even though the LLM's only job is dispatch
 the LLM never sees). Abliterated VL gets us both reliable tool
 calling AND a cooperative dispatcher.

-**Qwen 3.x quirk:** thinking mode is on by default. The shipped
-system prompt starts with `/no_think` to suppress it. If the model
-still plans instead of firing the tool, set
-`enable_thinking: false` in **Advanced Params → Custom Parameters**.
+**Qwen 3.x quirk:** thinking mode is on by default and abliterated
+builds ignore the system-prompt `/no_think` directive — the model
+emits its tool call inside a thinking block that the parser treats
+as final response text instead of a real tool invocation. The
+shipped preset sets `enable_thinking: false` in `custom_params`,
+which Ollama enforces server-side and the model can't ignore. Don't
+remove it.

 ### Alternatives

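
The failure mode named in the "Qwen 3.x quirk" note (a tool call written as text inside a thinking block rather than issued as a real invocation) can be sketched as a simple text check. This is a minimal illustration, not the parser Open WebUI uses: it assumes thinking output is wrapped in `<think>...</think>` tags, and the `leaked_tool_calls` helper is hypothetical. The tool names are the two from the shipped system prompt.

```python
import re

# Assumption: thinking output is delimited by <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

# Tool names taken from the shipped system prompt.
TOOL_NAMES = ("edit_image", "generate_image")


def leaked_tool_calls(text: str) -> list[str]:
    """Return tool names that appear inside thinking blocks of a text reply."""
    leaked = []
    for block in THINK_BLOCK.findall(text):
        for name in TOOL_NAMES:
            if name in block and name not in leaked:
                leaked.append(name)
    return leaked


if __name__ == "__main__":
    bad = '<think>I should call edit_image(prompt="red car", mask_text="the car")</think>'
    good = "Used the photo style; try a warmer light next."
    print(leaked_tool_calls(bad))
    print(leaked_tool_calls(good))
```

A non-empty result on a plain-text reply would suggest thinking is still enabled for that model, which is the situation `enable_thinking: false` is meant to prevent.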