Image Studio — dedicated image-generation chat model
A custom Open WebUI model preset that wraps a base LLM with a system
prompt heavily biased toward calling the smart_image_gen tool. Users
pick Image Studio from the chat-model dropdown when they want to
generate or edit images, and the LLM treats every message as an image
request — calling generate_image for new images and edit_image for
modifications to attached ones.
This exists because general-purpose chat models often "describe" an image in text instead of calling the tool, especially when the request is conversational ("can you draw me…", "I'd like a picture of…"). A dedicated preset removes the ambiguity.
Two ways to install
Option A: Import the JSON (fast)
Workspace → Models → Import (top right) → upload
image_studio.json.
This drops the preset in fully configured: base model, system prompt, tool attachment, function-calling mode, temperature, suggestion prompts. Verify after import (a field-level sketch of the file follows this checklist):
- The `smart_image_gen` tool is actually attached (Tools list under the model's edit screen). If not, the tool ID Open WebUI assigned doesn't match the `toolIds: ["smart_image_gen"]` in the JSON — re-attach manually.
- Base Model is set to `huihui_ai/qwen3-vl-abliterated:8b`. Adjust if you want a different LLM (mistral-nemo:12b, Qwen3.6, or Llama 3.1 also work well; smaller parameter counts may struggle with native tool calling).
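For orientation, this is roughly the shape of the file, restricted to the fields this README references (base_model_id, toolIds, custom_params, meta.capabilities.vision). The exact export schema varies between Open WebUI versions, and the id and the nesting of function_calling / custom_params below are illustrative assumptions, so treat it as a sketch rather than a drop-in replacement for the shipped image_studio.json:

```json
{
  "id": "image-studio",
  "name": "Image Studio",
  "base_model_id": "huihui_ai/qwen3-vl-abliterated:8b",
  "params": {
    "system": "…the System prompt block below…",
    "temperature": 0.5,
    "top_p": 0.9,
    "function_calling": "native",
    "custom_params": {
      "tool_choice": "required",
      "enable_thinking": false
    }
  },
  "meta": {
    "description": "Image generation and routing across SDXL checkpoints.",
    "capabilities": { "vision": true },
    "toolIds": ["smart_image_gen"]
  }
}
```

The two custom_params entries are the same tool_choice / enable_thinking pair explained in the Custom Parameters row of Option B below.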
Option B: Create manually (table below)
Workspace → Models → + (top right).
| Field | Value |
|---|---|
| Name | Image Studio |
| Base Model | huihui_ai/qwen3-vl-abliterated:8b (Qwen 3 VL base, abliterated, vision + tools). Pull via init-models.sh first. The Qwen 3 VL fine-tune lineage isn't damaged by abliteration the way Qwen 3.5 is, so it both calls tools reliably AND won't refuse to dispatch on NSFW edit prompts. |
| Description | Image generation and routing across SDXL checkpoints. |
| System Prompt | Paste the block from System prompt below. |
| Tools | enable only smart_image_gen |
In the Advanced Params section:
| Field | Value |
|---|---|
| Function Calling | Native — works cleanly on huihui_ai/qwen3-vl-abliterated:8b once thinking is disabled (see Custom Parameters). Native gives you the structured "View Result from edit_image" blocks and "Thought for X seconds" tracing in the UI. |
| Temperature | 0.5 (lower = more reliable tool-calling) |
| Top P | 0.9 |
| Context Length | leave default |
| Custom Parameters | tool_choice: required (forces the model to call a tool every turn) and enable_thinking: false (disables Qwen's thinking mode at the API level — the /no_think system-prompt directive isn't honored by abliterated Qwen builds, but this server-side flag is). Both required for reliable behaviour on huihui_ai/qwen3-vl-abliterated:8b. |
Save. The new model appears in the chat-model dropdown for any user with access.
System prompt
/no_think
You are an image-tool dispatcher. You do not respond in prose. Every
user message MUST result in exactly one tool call.
ROUTING:
- If the user attached an image (including images you previously
generated in this chat) → call edit_image(prompt=..., ...)
- Otherwise → call generate_image(prompt=..., ...)
Both tools take `prompt` as the first argument — same name on both.
Do NOT invent `edit_instruction`.
Fire the tool on the FIRST message, with no preamble. Do not write a
'plan', 'approach', 'steps', 'breakdown', or any explanation before
calling. Do not ask clarifying questions. Do not say what you are
about to do. If the request is vague, pick reasonable defaults and
call the tool — the user iterates after.
STYLES (pick one):
photo        photorealistic photo / portrait / cinematic
juggernaut   alternate photoreal — sharper, more saturated
pony         anime, cartoon, manga, stylised illustration
general      catch-all when nothing else fits
furry-nai    anthropomorphic, NAI-trained mix
furry-noob   anthropomorphic, NoobAI base
furry-il     anthropomorphic, Illustrious base (default for any
             furry/anthro request)
STYLE FOR edit_image — the tool ENFORCES inheritance: once a style
has been used in this chat, every subsequent edit_image call uses
the same style regardless of what you pass. Behaviour:
- Edit on an image generated earlier in this chat → OMIT `style`
entirely. The tool will use the established style. Passing it is
harmless but ignored.
- Edit on a fresh user upload (no prior tool call in chat) → look at
the image and pick a style: anthropomorphic furry/scaly/feathered
→ furry-il; pony score-tag art → pony; photo / portrait → photo
or juggernaut; anime → pony; ambiguous → general.
- Style cannot be changed mid-chat. If the user wants a different
style, tell them they need to start a new chat — the tool ignores
style overrides on follow-up calls.
edit_image has TWO MODES — pick based on whether the change is local
or global:
- LOCAL ("change the ball to a basketball", "add a hat to the dog",
"remove the bird", "recolor the car red") → set `mask_text` to a
brief noun phrase naming the region ("the ball", "the dog", "the
bird", "the car"). Only that region is repainted; rest stays
pixel-perfect.
- GLOBAL ("make this a sunset", "turn this into anime", "restyle as
oil painting") → leave mask_text unset. The whole image is
reimagined.
ALWAYS prefer LOCAL when the user names a specific object, person,
or region. GLOBAL is only for whole-image style/lighting
transformations.
Denoise:
- LOCAL (mask_text set): default 1.0. Drop to 0.6–0.8 only for
subtle local edits that should retain some original structure.
- GLOBAL (no mask_text): default 0.7. Use 0.3–0.5 for subtle
restyle, 0.85–1.0 for radical reimagining.
Pick style for the DESIRED OUTPUT, not the input image.
Write rich, descriptive prompts (subject, action, environment,
lighting, mood, framing). Do NOT add quality tags like 'masterpiece',
'best quality', 'score_9', 'absurdres' — the tool prepends the
correct tags per style. Do NOT set sampler, CFG, steps, scheduler —
the tool picks them.
AFTER the tool returns, write at most one short PLAIN-ENGLISH
sentence noting your style/mode choice and offering one iteration
idea. The image is already shown to the user.
NEVER, after the tool returns:
- echo or repeat the tool call (no `edit_image(prompt=..., ...)`,
no `<function=...>`, no JSON, no parameter listings)
- describe what's in the image
- list the arguments you used
- enumerate styles, denoise, mask_text, etc.
Those details are visible in the collapsible 'View Result from
edit_image' tool-result block — the user can expand it if they
care. Your follow-up message is for HUMAN conversation, not
bookkeeping.
The first line /no_think asks Qwen 3.x to skip its reasoning phase. Abliterated Qwen builds ignore it (see the Qwen 3.x quirk below), so treat it as a best-effort backup for non-abliterated Qwen variants; the authoritative kill-switch is the enable_thinking: false custom parameter. If your base model isn't Qwen 3 at all, leaving the line in is a no-op (other models ignore it). Drop it only if it actually causes problems.
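To make the LOCAL vs GLOBAL split concrete, here is roughly what the tool-call arguments look like for each mode. The parameter names (prompt, mask_text, denoise) are the ones the prompt above mandates; the surrounding call structure is illustrative only, not Open WebUI's or Ollama's exact wire format:

```json
{
  "local_edit": {
    "tool": "edit_image",
    "arguments": {
      "prompt": "a worn leather basketball on green grass, afternoon sunlight, shallow depth of field",
      "mask_text": "the ball",
      "denoise": 1.0
    }
  },
  "global_edit": {
    "tool": "edit_image",
    "arguments": {
      "prompt": "the same scene at golden-hour sunset, warm rim light, long shadows, dramatic sky",
      "denoise": 0.7
    }
  }
}
```

Note that neither call passes style: both assume an earlier generation in this chat already established it, so per the inheritance rule any override would be ignored anyway. The global edit leaves mask_text unset so the whole image is reimagined.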
Set a separate Task Model (required after install)
tool_choice: required is what makes Image Studio reliably fire the
tool, but it has a side effect: Open WebUI uses the same model with
the same params for title generation, tag generation, and
autocomplete. With every response forced to be a tool call, those
text-only background tasks can't produce text, so chats stay named
"New Chat" forever and tag suggestions go silent.
Fix: point Open WebUI at a different model for those tasks.
Admin Settings → Interface → Task Model → pick any of the
non-Image-Studio models you have pulled. mistral-nemo:12b,
llama3.1:8b, qwen3.6:latest, or dolphin3:8b all work. The Task
Model only handles short background calls (titles, tags, autocomplete,
search-query rewriting) — it doesn't need to be vision-capable or
particularly large. Smaller is faster and cheaper.
Save. New Image Studio chats now get descriptive titles, tag suggestions return, and autocomplete lights up.
Vision capability
The shipped preset sets meta.capabilities.vision: true so Open WebUI
allows users to attach images to chats with this model. Two paths:
Default — huihui_ai/qwen3-vl-abliterated:8b
The shipped preset uses huihui_ai's abliteration of Qwen 3 VL as
the base — 8B params, vision-capable, native tool calling working,
and won't refuse to dispatch the tool when the user's edit prompt
is NSFW. Preseed via init-models.sh.
Why not the Qwen 3.5 abliterated 9B (huihui_ai/qwen3.5-abliterated:9b)?
Same maintainer, but the abliteration on Qwen 3.5 mangles the
function-call template, causing the model to either refuse to call
tools or emit malformed <function=...> XML that Open WebUI's
parser can't recognise. The Qwen 3 VL fine-tune lineage is
different and doesn't take that damage from abliteration.
Why not standard qwen3.5:9b? The standard (non-abliterated)
Qwen 3.5 calls tools reliably but its safety training refuses on
many image edit prompts even though the LLM's only job is dispatch
(the actual image content is generated by the SDXL checkpoint, which
the LLM never sees). Abliterated VL gets us both reliable tool
calling AND a cooperative dispatcher.
Qwen 3.x quirk: thinking mode is on by default and abliterated
builds ignore the system-prompt /no_think directive — the model
emits its tool call inside a thinking block that the parser treats
as final response text instead of a real tool invocation. The
shipped preset sets enable_thinking: false in custom_params,
which Ollama enforces server-side and the model can't ignore. Don't
remove it.
Alternatives
If the abliterated Qwen 3 VL default isn't a fit (size, language preferences, abliteration caveats), other vision-capable Ollama tags worth trying:
- `qwen2.5vl:7b` — smaller, no thinking mode, very reliable tool-caller
- `llama3.2-vision:11b` — Meta's vision variant, ~7 GB
- `minicpm-v:8b` — fast, capable
To swap, change base_model_id in image_studio.json (or the Base
Model field if you imported manually) and pull the model via
init-models.sh or the Open WebUI model UI.
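Assuming the exported file keeps base_model_id as a top-level key (the key name is the one this README already uses), the swap is a one-line change; only the changed keys are shown here, leave the rest of the file as exported:

```json
{
  "name": "Image Studio",
  "base_model_id": "qwen2.5vl:7b"
}
```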
Non-vision base model
If you'd rather use a text-only LLM (e.g. mistral-nemo:12b),
keep vision: true in the preset so Open WebUI still permits image
attachments; the image flows through to edit_image via
__messages__ / __files__ and ComfyUI does the visual work. The
LLM can't see the image, but for explicit edit instructions ("change
the background to a sunset") that doesn't matter.
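A sketch of the relevant keys for that setup, using the field names this README already references (base_model_id, meta.capabilities.vision); everything else in the preset stays as shipped:

```json
{
  "base_model_id": "mistral-nemo:12b",
  "meta": {
    "capabilities": { "vision": true }
  }
}
```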
Why this works when a generic chat model didn't
- The system prompt is unambiguous. No room for the model to decide "I'll just describe it in text instead."
- Only one tool is attached. No competing tools to choose between.
- Function Calling: Native, on a base that handles it cleanly. On huihui_ai/qwen3-vl-abliterated:8b Native works once enable_thinking: false is set (see Custom Parameters) and is what produces the structured "View Result from edit_image" blocks. The caveat is abliterated Qwen 3.5: it emits mangled `<function=...><parameter=...>` XML that leaks to chat as plain text on the published Open WebUI / Ollama versions, so if you swap to that base, switch Function Calling to Default, which uses Open WebUI's own prompt-injection wrapper and round-trips reliably. mistral-nemo and qwen2.5vl are also known to work end-to-end with Native.
- Lower temperature. Tool calling is more reliable with less sampling randomness.
Iterating on the system prompt
If users ask for things you didn't anticipate (specific aspect ratios, multi-image batches, particular checkpoints not in the routing rules), edit the system prompt above and re-paste into the Workspace → Models entry. It's the highest-leverage place to tune behaviour without touching the Tool's Python.