18 Commits

Author SHA1 Message Date
d8c8421361 Image Studio: lock in working config — Native + enable_thinking=false
User confirmed end-to-end working stack:
  - base_model_id: huihui_ai/qwen3-vl-abliterated:8b
  - function_calling: native (gives 'View Result from edit_image'
    blocks, structured tool call traces)
  - custom_params:
      tool_choice: required        (forces tool call every turn)
      enable_thinking: false       (server-side disable; abliterated
                                    Qwen ignores the /no_think system
                                    prompt directive — when thinking
                                    is on the tool call leaks inside
                                    a thinking block as text)

Updated image_studio.json + the markdown setup table + the
'Qwen 3.x quirk' explainer to match. The /no_think line in the
system prompt stays in for non-abliterated Qwen variants but is now
documented as best-effort backup; enable_thinking=false is the
authoritative kill-switch.
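A minimal sketch of the relevant slice of the preset after this change (field names as listed above; all other preset fields omitted, and the exact nesting may differ across Open WebUI versions):

```json
{
  "base_model_id": "huihui_ai/qwen3-vl-abliterated:8b",
  "params": {
    "function_calling": "native",
    "custom_params": {
      "tool_choice": "required",
      "enable_thinking": false
    }
  }
}
```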

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:12:24 -05:00
f5a222fe6f Image Studio: default base model → huihui_ai/qwen3-vl-abliterated:8b
User confirmed this model works end-to-end after the multi-base-model
search. Settled on it because Qwen 3 VL's fine-tune lineage isn't
damaged by abliteration the way Qwen 3.5's is, so it both calls tools
reliably AND won't refuse to dispatch on NSFW edit prompts.

Updated:
  - image_studio.json base_model_id → huihui_ai/qwen3-vl-abliterated:8b
  - init-models.sh: pulls the abliterated VL model in place of the
    non-working standard qwen3.5:9b
  - image_studio.md: setup table base-model row + vision-section
    'why this and not the alternatives' explanation

function_calling stays default and tool_choice required. Operator
can flip to native + drop tool_choice once they've verified the new
base behaves with structured tool calls (which would also remove the
need for a separate Task Model for title generation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:56:16 -05:00
ec6108888a smart_image_gen v0.7.7: enforce style inheritance for edit_image
Vision-capable LLMs misclassify rendered subjects when picking a
style — observed: model called juggernaut for an edit on a furry-il
generation because the rendered character looked 'photoreal-ish' to
its vision encoder. Each visual judgment is independent so styles
flip mid-chat.

Flipped resolution order in edit_image so inheritance from the prior
generate_image / edit_image call DOMINATES the LLM's explicit style
arg. The LLM's choice only wins when there's nothing to inherit
(first edit in a chat, fresh user upload). The workaround for
legitimate style changes is to start a new chat.

System prompt updated to match: tells the LLM that style inheritance
is enforced, that passing style on follow-up calls is ignored, and
that user requests for style change require a new chat.
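The flipped precedence can be sketched as (a simplified stand-in for the Tool's resolution code, not the actual implementation):

```python
def resolve_style(explicit_style, inherited_style, fallback_style):
    """v0.7.7 resolution order for edit_image: inheritance DOMINATES
    the LLM's explicit style arg; the explicit arg only wins when
    there is nothing to inherit."""
    if inherited_style:
        return inherited_style   # prior generate_image/edit_image call wins
    if explicit_style:
        return explicit_style    # first edit / fresh upload: LLM's pick
    return fallback_style        # keyword routing as last resort
```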

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:53:32 -05:00
20d4bd5b72 Image Studio: switch base model to qwen3.5:9b (non-abliterated)
The abliterated 9B was the source of the tool-call format mangling
(both Native XML leaks and Default Python-syntax leaks). Standard
qwen3.5:9b is the same family and the same 9B size (6.6 GB), it is
vision-capable, and its native tool calling actually works.

The uncensored image content was always going to come from the
SDXL checkpoints in ComfyUI — the LLM is just a dispatcher. Picking
a well-behaved tool-caller for that role doesn't compromise output
content.

Updated:
  - image_studio.json base_model_id → qwen3.5:9b
  - init-models.sh: pulls qwen3.5:9b as a standard registry pull,
    in addition to the existing abliterated 9B (which stays for
    other chat models)
  - image_studio.md setup table + vision section explaining why
    we chose standard over abliterated for the dispatcher role

function_calling stays as 'default' and tool_choice as 'required'
for now — they don't hurt with a reliable tool-caller and operators
can flip back to native + drop tool_choice once they verify it
works for them (which also removes the need for a separate Task
Model for title generation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:09:42 -05:00
011fade024 Image Studio prompt: forbid post-tool echo of the function call
User saw the LLM's chat response include a literal
'edit_image(prompt="...", mask_text="...", style="furry-il",
denoise=0.85)' line after the image rendered — Default function-
calling mode tends to make the model 'narrate' its tool call by
re-typing it as Python-style syntax.

Added an explicit NEVER block: no echoing the call, no JSON, no
listing arguments, no enumerating styles/denoise/mask_text. The same
info is in the collapsible 'View Result from edit_image' block that
Open WebUI renders alongside the message — there's no need for the
LLM to also paste it as prose. Follow-up text is for human
conversation, not bookkeeping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:41:51 -05:00
1ed2e7293e Image Studio: ship function_calling=default — Native leaks Qwen 3.5 XML
Qwen 3.5 abliterated emits its native tool-call format
(<function=...><parameter=...>) wrapped in <tool_call> tags that the
current Open WebUI / Ollama parser does not reliably round-trip — the
XML leaks to chat as plain text instead of executing. Switching the
preset to Function Calling: Default, which uses Open WebUI's own
prompt-injection wrapper, fires the tool reliably.

Native is documented as the right choice only when the operator has
swapped the base model to one with proven OWUI-side parser support
(mistral-nemo:12b, qwen2.5vl:7b). For the shipped Qwen 3.5
abliterated default, Default is the working setting.
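For reference, the leaked chat text looks roughly like this (shape reconstructed from the tag names above, not a captured transcript):

```
<tool_call>
<function=edit_image>
<parameter=prompt>glowing eyes</parameter>
</function>
</tool_call>
```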

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:36:05 -05:00
e77666ea0f Image Studio docs: require setting a separate Task Model after install
tool_choice: required (the thing that makes Image Studio reliably fire
its tools) also blocks Open WebUI's background text-only calls — title
generation, tag suggestions, autocomplete — because the model is
forced to produce a tool call instead of text. Result: chats stay
named 'New Chat' and tag suggestions go silent.

Documented the fix in two places:
  - image_studio.md: dedicated 'Set a separate Task Model (required
    after install)' section explaining the cause and the fix path.
  - deployment README §9: short follow-up note pointing at it so
    operators don't miss it during initial setup.

The fix is purely Open WebUI configuration — no code change. Pick any
non-Image-Studio model already pulled (mistral-nemo:12b is the
obvious default) for the Task Model slot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:37:21 -05:00
6700f6ce33 smart_image_gen v0.7.3: edit_image inherits style from prior tool call
User reported edit_image picking 'juggernaut' (photoreal) for an edit
on a furry image — the LLM didn't carry context, and the tool's
fallback _route_style only sees the edit instruction text, which for
neutral edits ('bigger', 'glowing eyes') has no furry keywords.

Fix in two places:

  1. Tool: _inherited_style scans __messages__ in reverse for prior
     generate_image / edit_image tool calls and returns the style arg
     they used. edit_image now resolves: explicit style → inherited →
     keyword fallback. Deterministic, no LLM cooperation needed for
     follow-up edits on previously-generated images.

  2. System prompt: explicit three-step style resolution for
     edit_image. Generated by you → omit style and auto-inherit.
     Uploaded by user → INSPECT visually and pick a matching style
     (the LLM is the only thing with vision; the tool can't see
     pixels). Then keep that style for subsequent edits.

Both paths matter — the tool fix handles the common case
deterministically, the prompt fix handles the upload case where
there's nothing to inherit from.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:31:57 -05:00
06433d3815 smart_image_gen v0.7.1: rename edit_image arg + parse file id from URL
Two bugs in one screenshot:

1. LLM called edit_image(prompt=..., ...) but the signature was
   edit_image(edit_instruction=..., ...) — mismatch, missing-arg
   crash. Renamed the first param to `prompt` so both tools have a
   matching, predictable name. System prompt updated with an explicit
   'do not invent edit_instruction' line for stubborn models.

2. After fix #1, edit_image still couldn't find the prior generated
   image because Open WebUI assistant-message file attachments only
   carry {type, url} (no id, no path). _read_file_dict now also
   greps the file id out of /api/v1/files/<uuid>/content URLs and
   feeds it to Files.get_file_by_id. Verified pattern matches
   absolute URLs (https://llm-1.srvno.de/api/v1/files/.../content).

System prompt also now says 'including images you previously
generated in this chat' to nudge the LLM to pick up assistant
outputs as edit candidates.
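The URL-to-file-id extraction in fix #2 amounts to something like this (a hypothetical stand-in for the pattern _read_file_dict uses, covering both relative and absolute URLs):

```python
import re

# Match the file UUID in an Open WebUI /api/v1/files/<uuid>/content URL.
FILE_URL_RE = re.compile(r"/api/v1/files/([0-9a-fA-F-]{36})/content")

def file_id_from_url(url: str):
    """Return the file id embedded in a files-API content URL, or None."""
    m = FILE_URL_RE.search(url)
    return m.group(1) if m else None
```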

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:58:40 -05:00
780ce42711 Image Studio: move tool_choice into params.custom_params (correct field)
Previous commit put tool_choice at the top level of params. Open WebUI
drops that silently — apply_model_params_to_body has a whitelist of
mapped param names (temperature, top_p, etc.) and tool_choice isn't
on it. The Custom Parameters UI section also only iterates
params.custom_params, which is why the value didn't appear there
after importing the preset.

Correct location is the custom_params sub-dict, where values go
through json.loads before being merged into the outgoing chat
completion body. 'required' stays a string after the failed
json.loads and ends up exactly where the OpenAI / Ollama tools spec
expects it.

Source: src/lib/components/chat/Settings/Advanced/AdvancedParams.svelte
(UI binding) and backend/open_webui/utils/payload.py (serialization).
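The serialization step described above behaves roughly like this (a simplified model, not the actual payload.py code):

```python
import json

def coerce_custom_param(value):
    """Each custom_params value is run through json.loads before being
    merged into the outgoing body; values that fail to parse (like the
    bare string 'required') are kept as-is as strings."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            return value  # 'required' stays a string
    return value
```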

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:50:34 -05:00
d935e24624 Add text-targeted inpainting via GroundingDINO+SAM (mask_text param)
Five pieces:

1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO
   + SAM-HQ in one bundle) into custom_nodes and pip-installs its
   requirements at build time. Model weights auto-download to the
   comfyui-models volume on first inpaint (~3 GB one-time cost).

2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs
   requirements.txt for any custom_node present at startup. Lets users
   add custom nodes via ComfyUI-Manager (or by git-cloning into the
   volume) and have the deps picked up on the next restart, without
   editing the Dockerfile.

3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When
   set, builds an inpainting workflow (LoadImage → GroundingDinoSAM
   Segment → SetLatentNoiseMask → KSampler) so only the named region
   is repainted. When unset, falls through to the existing img2img
   path. Denoise default switches: 1.0 with mask_text (full repaint
   within mask), 0.7 without.

4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL
   distinction — set mask_text whenever the user names a specific
   object/region ('the ball', 'the dog', 'the sky'); leave it unset
   only for whole-image style/lighting transformations.

5. Deployment README documents the new mode + the first-inpaint
   weight-download caveat.

Image rebuild required — bump tag to pick up the Dockerfile change.
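The denoise-default switch from piece 3, as a sketch of the rule (not the Tool's actual code):

```python
def default_denoise(mask_text, denoise=None):
    """Explicit caller value always wins; otherwise 1.0 when mask_text
    is set (full repaint inside the mask), 0.7 for whole-image
    img2img."""
    if denoise is not None:
        return denoise
    return 1.0 if mask_text else 0.7
```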

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:43:52 -05:00
7c7897818e Image Studio: bake tool_choice=required into the preset
Without it, abliterated/reasoning models like huihui_ai/qwen3.5-
abliterated:9b reliably choose to write a planning response instead
of calling the tool — even with /no_think and a terse imperative
system prompt. tool_choice=required is passed through to Ollama's
chat API and removes the model's option to respond in text at all,
forcing exactly one tool call per turn.

Confirmed working with the abliterated Qwen 3.5 9B base.
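Roughly where the flag lands in the outgoing chat request (assumed OpenAI-style shape; tool schema elided):

```json
{
  "model": "huihui_ai/qwen3.5-abliterated:9b",
  "messages": [{"role": "user", "content": "draw me a fox"}],
  "tools": [{"type": "function", "function": {"name": "generate_image"}}],
  "tool_choice": "required"
}
```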

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:30:49 -05:00
a1fca4d5d9 Image Studio: default base model → huihui_ai/qwen3.5-abliterated:9b
Re-imports of image_studio.json kept reverting the base model back to
mistral-nemo:12b because that was still hard-coded in the JSON.
Updated the JSON, the markdown setup table, and the vision-capability
section to lead with the Qwen 3.5 abliterated 9B preset.

Re-ordered the markdown's vision section: shipped default first
(Qwen 3.5 abliterated, with the /no_think + enable_thinking caveat
called out explicitly), alternatives (qwen2.5vl:7b, llama3.2-vision,
minicpm-v) second, non-vision fallback third.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:22:13 -05:00
0b1c2ee5b5 Image Studio: tighten system prompt, add /no_think for Qwen 3.x
User reported the model writing a multi-paragraph 'editing plan'
instead of calling edit_image, only firing the tool when explicitly
told to. Two underlying causes:

  1. The previous system prompt was conversational ('ALWAYS / NEVER'
     lists with discussion) — Qwen-style models read that as topics
     to think about rather than rules to obey. Replaced with terse,
     imperative dispatcher framing: 'You do not respond in prose.
     Every user message MUST result in exactly one tool call.'

  2. Qwen 3.x ships with thinking mode on by default. Reasoning
     models almost universally degrade native function calling — they
     plan how to use a tool instead of just calling it. Prepended
     /no_think (Qwen 3.x recognises this token and skips reasoning).
     No-op for non-Qwen-3 base models.

Removed the long after-action paragraph that encouraged elaborate
follow-ups; replaced with 'at most one short sentence'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:18:49 -05:00
f77f5993fb Image Studio: enable vision capability + document upgrade path
Open WebUI was blocking image attachments to the Image Studio model
because mistral-nemo:12b isn't vision-capable. Two changes:

  - capabilities.vision flipped to true in the preset JSON. The Tool
    only needs the image to make it through __messages__ / __files__
    to call edit_image; the actual visual processing happens in
    ComfyUI's img2img, not in the LLM. Setting the flag unlocks the
    attach-image UI without lying about what mistral-nemo can do.

  - System prompt now tells the LLM explicitly: "you may not be able
    to visually inspect the attached image — that is fine. Trust the
    user's description and call edit_image." Prevents the LLM from
    refusing or hedging when it gets an image it can't see.

Documented the upgrade path in image_studio.md for users who want
real vision (qwen2.5vl:7b, llama3.2-vision:11b, minicpm-v:8b — pick
one, add to init-models.sh, swap base_model_id in the preset). The
vision LLM can then write smarter edit_image calls from the image
content rather than the user's description alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:31:17 -05:00
6adf133558 Ship Image Studio as importable JSON in addition to markdown walkthrough
Open WebUI accepts a JSON file at Workspace → Models → Import that
seeds a new model preset in one click instead of the manual table-
driven setup. The new image_studio.json mirrors the Open WebUI bulk-
export schema (array wrapper around the model object with id, name,
base_model_id, params, meta) and pre-fills system prompt, native
function calling, temperature 0.5, top_p 0.9, smart_image_gen tool
attachment, suggestion prompts.
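A skeleton of the schema described above (field names beyond those listed — the system placeholder, tool-attachment key, suggestion-prompt key — are assumptions, and the schema drifts across Open WebUI versions):

```json
[
  {
    "id": "image-studio",
    "name": "Image Studio",
    "base_model_id": "mistral-nemo:12b",
    "params": {
      "system": "...",
      "temperature": 0.5,
      "top_p": 0.9,
      "function_calling": "native"
    },
    "meta": {
      "toolIds": ["smart_image_gen"],
      "suggestion_prompts": []
    }
  }
]
```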

The markdown walkthrough stays as the source of truth for the system
prompt content and as the fallback when import fails (e.g. tool ID
mismatch, unfamiliar field, schema drift across Open WebUI versions).
README points at both paths.

Caveat doc'd in the markdown: if the imported preset doesn't actually
have smart_image_gen attached, the tool ID in the JSON didn't match
what Open WebUI assigned — re-attach manually in the model edit
screen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:04:49 -05:00
d4e2058859 smart_image_gen v0.3: add edit_image (img2img) method
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:

  generate_image — txt2img (existing, unchanged behavior)
  edit_image     — img2img on the most recently attached image

edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.

Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.

Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.
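The __messages__ half of the extraction can be sketched as follows — a simplified stand-in for the Tool's code (the real version also falls back to __files__, path then URL):

```python
import base64
import re

DATA_URI_RE = re.compile(r"^data:image/(\w+);base64,(.*)$", re.S)

def image_from_messages(messages):
    """Return (format, raw bytes) of the most recently attached image
    found as a base64 data URI in an image_url content block, or None
    if no image is attached."""
    for msg in reversed(messages):
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain-text message, nothing to extract
        for block in content:
            if block.get("type") == "image_url":
                m = DATA_URI_RE.match(block["image_url"]["url"])
                if m:
                    return m.group(1), base64.b64decode(m.group(2))
    return None
```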

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:59:13 -05:00
41d571d8d1 Add Image Studio model preset — forces smart_image_gen tool use
A documented Open WebUI custom-model preset wrapping mistral-nemo:12b
with: aggressive system prompt that mandates calling generate_image,
only the smart_image_gen tool attached, native function calling,
lower temperature for tool-call reliability. Users pick "Image Studio"
from the chat-model dropdown when they want images.

Solves the common case where general-purpose chat models describe an
image in text instead of firing the tool — usually on conversational
phrasings like "can you draw me…". The preset removes the ambiguity
by giving the LLM exactly one job and one tool.

Setup walkthrough in openwebui-models/image_studio.md; deployment
README §9 points users at it as the recommended path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:54:13 -05:00