transformers 5.0 removed BertModel.get_head_mask (it was part of the
legacy 4.x API). comfyui_segment_anything's GroundingDINO
bertwarper.py still calls bert_model.get_head_mask in __init__, so the
first inpaint crashes with AttributeError. Pinned
transformers>=4.40,<5 in two places:
- Dockerfile: applied AFTER the custom node's requirements.txt
install so it wins on a fresh image build.
- install-custom-node-deps.sh entrypoint: re-applied at every
  container start, so any future custom-node install (via
  ComfyUI-Manager or a volume clone) that transitively pulls in a
  newer transformers gets pinned back into the working range.
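For reference, the crash site, with the class shape assumed from the
upstream GroundingDINO warper (only the failing line is load-bearing):

    import torch.nn as nn

    class BertModelWarper(nn.Module):
        def __init__(self, bert_model):
            super().__init__()
            # transformers <5: get_head_mask lives on every
            # PreTrainedModel, so this attribute grab works.
            # transformers 5.0: the method is gone, and this line raises
            # AttributeError during __init__, i.e. on the first inpaint.
            self.get_head_mask = bert_model.get_head_mask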
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If __messages__ doesn't include the assistant's prior file attachments
(which is what the screenshot is showing), the new fallback queries
the chat by id via Chats.get_chat_by_id and walks every persisted
message for files. Open WebUI's socket handler always upserts files
onto the assistant message via {'files': files} so this path is
authoritative.
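A sketch of the fallback; the messages-as-a-list shape is an
assumption (Open WebUI's persisted chat layout varies by version):

    from open_webui.models.chats import Chats

    def _files_from_persisted_chat(chat_id: str) -> list:
        chat = Chats.get_chat_by_id(chat_id)
        if not chat:
            return []
        found = []
        for msg in (chat.chat.get("messages") or []):
            # the socket handler upserts {'files': files} onto the
            # assistant message, so files found here are authoritative
            found.extend(msg.get("files") or [])
        return found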
The 'No image found' return now includes diagnostic counts —
__files__, __messages__, messages_with_files, chat_id_present,
openwebui_runtime — so subsequent failures actually show what the
tool saw instead of being opaque.
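Roughly (key names from this commit; the wrapper function and exact
formatting are ours):

    import json

    def _no_image_diagnostics(__files__, __messages__, chat_id,
                              runtime_ok) -> str:
        diag = {
            "files": len(__files__ or []),
            "messages": len(__messages__ or []),
            "messages_with_files": sum(
                1 for m in (__messages__ or []) if m.get("files")
            ),
            "chat_id_present": bool(chat_id),
            "openwebui_runtime": runtime_ok,
        }
        return "No image found in the chat. Diagnostics: " + json.dumps(diag)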
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs in one screenshot:
1. LLM called edit_image(prompt=..., ...) but the signature was
edit_image(edit_instruction=..., ...) — mismatch, missing-arg
crash. Renamed the first param to `prompt` so both tools have a
matching, predictable name. System prompt updated with an explicit
'do not invent edit_instruction' line for stubborn models.
2. After fix#1, edit_image still couldn't find the prior generated
image because Open WebUI assistant-message file attachments only
carry {type, url} (no id, no path). _read_file_dict now also
greps the file id out of /api/v1/files/<uuid>/content URLs and
feeds it to Files.get_file_by_id. Verified pattern matches
absolute URLs (https://llm-1.srvno.de/api/v1/files/.../content).
System prompt also now says 'including images you previously
generated in this chat' to nudge the LLM to pick up assistant
outputs as edit candidates.
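The recovery step, approximately (the 36-char UUID pattern and import
path are assumptions; the route shape is from this commit):

    import re

    from open_webui.models.files import Files

    _FILE_URL_RE = re.compile(r"/api/v1/files/([0-9a-fA-F-]{36})/content")

    def _file_from_url(url: str):
        # matches both relative paths and absolute URLs like
        # https://llm-1.srvno.de/api/v1/files/<uuid>/content
        m = _FILE_URL_RE.search(url)
        return Files.get_file_by_id(m.group(1)) if m else None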
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous commit put tool_choice at the top level of params. Open WebUI
drops that silently — apply_model_params_to_body has a whitelist of
mapped param names (temperature, top_p, etc.) and tool_choice isn't
on it. The Custom Parameters UI section also only iterates
params.custom_params, which is why the value didn't appear there
after importing the preset.
Correct location is the custom_params sub-dict, whose values go
through json.loads before being merged into the outgoing chat
completion body. json.loads fails on the bare string 'required', so
the value passes through unchanged and lands in the body as a plain
string, exactly where the OpenAI / Ollama tools spec expects it.
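The value path, sketched (helper name is ours; payload.py's real code
differs in detail):

    import json

    def _coerce(value):
        # Open WebUI runs each custom_params value through json.loads
        try:
            return json.loads(value)    # '4' -> 4, 'true' -> True
        except (TypeError, ValueError):
            return value                # 'required' stays a string

    # so the preset carries:
    #   "params": {"custom_params": {"tool_choice": "required"}}
    # and the outgoing body gains: "tool_choice": "required"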
Source: src/lib/components/chat/Settings/Advanced/AdvancedParams.svelte
(UI binding) and backend/open_webui/utils/payload.py (serialization).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug: after generate_image surfaced an image via the files event, the
next edit_image call returned 'No image found in the chat'. The image
was attached to the assistant's message, but _extract_attached_image
only scanned the user's __files__ param and image_url content blocks
on user messages — it never looked at messages.files for any role.
Fix: rewrite extraction to scan messages[].files in reverse for ALL
roles, so an assistant-emitted image from a prior tool call is found
the same way as a user-attached upload. Use Open WebUI's internal
Files.get_file_by_id when the file dict has an id, so we get raw
bytes from disk without going through the auth-protected
/api/v1/files/{id}/content endpoint. Old path-key and URL-fetch
paths kept as fallbacks.
Refactored shared helpers _file_dict_is_image and _read_file_dict
out of the loop to keep the search logic readable.
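The rewritten search in outline (helper bodies elided, so this is not
runnable standalone; field names per Open WebUI's message dicts):

    def _extract_attached_image(messages):
        # newest first, any role: an assistant-emitted image from a
        # prior tool call is found the same way as a user upload
        for msg in reversed(messages or []):
            for f in (msg.get("files") or []):
                if _file_dict_is_image(f):
                    # _read_file_dict: Files.get_file_by_id when an id
                    # is present, then path, then URL fetch as fallbacks
                    data = _read_file_dict(f)
                    if data:
                        return data
        return None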
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five pieces:
1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO
+ SAM-HQ in one bundle) into custom_nodes and pip-installs its
requirements at build time. Model weights auto-download to the
comfyui-models volume on first inpaint (~3 GB one-time cost).
2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs
requirements.txt for any custom_node present at startup. Lets users
add custom nodes via ComfyUI-Manager (or by git-cloning into the
volume) and have the deps picked up on the next restart, without
editing the Dockerfile.
3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When
set, builds an inpainting workflow (LoadImage → GroundingDinoSAM
Segment → SetLatentNoiseMask → KSampler) so only the named region
is repainted. When unset, falls through to the existing img2img
path. The denoise default switches with the mode: 1.0 with mask_text
(full repaint within the mask), 0.7 without (see the sketch after
this list).
4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL
distinction — set mask_text whenever the user names a specific
object/region ('the ball', 'the dog', 'the sky'); leave it unset
only for whole-image style/lighting transformations.
5. Deployment README documents the new mode + the first-inpaint
weight-download caveat.
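Piece 3's denoise switch, distilled (function name is ours):

    def _pick_denoise(denoise, mask_text):
        if denoise is not None:
            return denoise          # caller override always wins
        # masked edit: full repaint inside the mask; global edit: gentler
        return 1.0 if mask_text else 0.7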
Image rebuild required — bump tag to pick up the Dockerfile change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without it, abliterated/reasoning models like huihui_ai/qwen3.5-
abliterated:9b reliably choose to write a planning response instead
of calling the tool — even with /no_think and a terse imperative
system prompt. tool_choice=required is passed through to Ollama's
chat API and removes the model's option to respond in text at all,
forcing exactly one tool call per turn.
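What the outgoing chat body ends up carrying (OpenAI-style shape; the
example tool entry is illustrative):

    body = {
        "model": "huihui_ai/qwen3.5-abliterated:9b",
        "messages": [{"role": "user", "content": "make the sky red"}],
        "tools": [{"type": "function",
                   "function": {"name": "edit_image"}}],  # schema elided
        "tool_choice": "required",  # no text-only response allowed
    }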
Confirmed working with the abliterated Qwen 3.5 9B base.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-imports of image_studio.json kept reverting the base model back to
mistral-nemo:12b because that was still hard-coded in the JSON.
Updated the JSON, the markdown setup table, and the vision-capability
section to lead with the Qwen 3.5 abliterated 9B preset.
Re-ordered the markdown's vision section: shipped default first
(Qwen 3.5 abliterated, with the /no_think + enable_thinking caveat
called out explicitly), alternatives (qwen2.5vl:7b, llama3.2-vision,
minicpm-v) second, non-vision fallback third.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported the model writing a multi-paragraph 'editing plan'
instead of calling edit_image, only firing the tool when explicitly
told to. Two underlying causes:
1. The previous system prompt was conversational ('ALWAYS / NEVER'
lists with discussion) — Qwen-style models read that as topics
to think about rather than rules to obey. Replaced with terse,
imperative dispatcher framing: 'You do not respond in prose.
Every user message MUST result in exactly one tool call.'
2. Qwen 3.x ships with thinking mode on by default. Reasoning
models almost universally degrade native function calling — they
plan how to use a tool instead of just calling it. Prepended
/no_think (Qwen 3.x recognises this token and skips reasoning).
No-op for non-Qwen-3 base models.
Removed the long after-action paragraph that encouraged elaborate
follow-ups; replaced with 'at most one short sentence'.
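The shape of the replacement (the quoted sentences are from this
commit; the rest is paraphrase, not the shipped text):

    SYSTEM_PROMPT = (
        "/no_think\n"  # Qwen 3.x skips reasoning; harmless elsewhere
        "You do not respond in prose. "
        "Every user message MUST result in exactly one tool call.\n"
        "After a tool call, reply with at most one short sentence.\n"
    )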
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three pieces:
1. mirror-ollama-model.sh — run on any machine that has the model
pulled. Parses the manifest at
~/.ollama/models/manifests/registry.ollama.ai/<ns>/<name>/<tag>,
greps every sha256:* digest, tars manifest + referenced blobs into
one .tgz. Output is portable: extract over any other Ollama data
dir and the model is immediately visible (sketched in Python below).
2. init-models.sh gains an s3_pull function that curls a tarball from
$S3_OLLAMA_BASE and extracts into /root/.ollama/models/. Falls back
to ollama pull when S3_OLLAMA_BASE is unset, so s3_pull lines are
safe to commit before the bucket is ready. huihui_ai/qwen3.5-
abliterated:9b promoted to s3_pull as the example.
3. docker-compose.yml model-init service propagates S3_OLLAMA_BASE
from .env. Curl auto-installs at script start because ollama/ollama
doesn't always ship it.
README documents the mirror workflow under "Mirroring models to S3".
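Piece 1's logic transliterated to Python (the script itself is shell;
blob naming per Ollama's on-disk layout, where a sha256:<hex> digest
lives at blobs/sha256-<hex>):

    import pathlib
    import re
    import tarfile

    def mirror(ns: str, name: str, tag: str) -> None:
        root = pathlib.Path.home() / ".ollama" / "models"
        manifest = root / "manifests" / "registry.ollama.ai" / ns / name / tag
        digests = set(re.findall(r"sha256:[0-9a-f]{64}", manifest.read_text()))
        with tarfile.open(f"{name}-{tag}.tgz", "w:gz") as tar:
            # arcnames relative to the models dir, so the tarball
            # extracts cleanly over any other Ollama data dir
            tar.add(manifest, arcname=str(manifest.relative_to(root)))
            for d in digests:
                tar.add(root / "blobs" / d.replace(":", "-"),
                        arcname=f"blobs/{d.replace(':', '-')}")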
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI was blocking image attachments to the Image Studio model
because mistral-nemo:12b isn't vision-capable. Two changes:
- capabilities.vision flipped to true in the preset JSON. The Tool
only needs the image to make it through __messages__ / __files__
to call edit_image; the actual visual processing happens in
ComfyUI's img2img, not in the LLM. Setting the flag unlocks the
attach-image UI without lying about what mistral-nemo can do.
- System prompt now tells the LLM explicitly: "you may not be able
to visually inspect the attached image — that is fine. Trust the
user's description and call edit_image." Prevents the LLM from
refusing or hedging when it gets an image it can't see.
Documented the upgrade path in image_studio.md for users who want
real vision (qwen2.5vl:7b, llama3.2-vision:11b, minicpm-v:8b — pick
one, add to init-models.sh, swap base_model_id in the preset). The
vision LLM can then write smarter edit_image calls from the image
content rather than the user's description alone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The data-URI message-event approach didn't render — Open WebUI's chat
frontend ignores data URIs from tool-emitted message events because
the markdown-base64 rewriter (utils/files.py
convert_markdown_base64_images) only runs on assistant streaming
content, not on tool emits.
Switched to the path Open WebUI's own image-generation flow uses
(backend/open_webui/utils/middleware.py ~1325):
1. Upload image bytes via open_webui.routers.files.upload_file_handler
(gets back a file_item with id)
2. Resolve the served URL via request.app.url_path_for(
"get_file_content_by_id", id=file_item.id) → /api/v1/files/{id}/content
3. Emit a `files` event:
{"type": "files", "data": {"files": [{"type": "image", "url": ...}]}}
Tools now take __request__, __user__, __metadata__ params for the
upload (Open WebUI auto-injects these). Falls back to data-URI
message event if the runtime imports aren't available (e.g. running
the file standalone for tests). The internal upload bypasses
get_verified_user via the user= kwarg, so no token plumbing.
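Steps 1-3 in outline; upload_file_handler's kwargs vary across Open
WebUI versions, so treat that call as an assumption (the event shape
is the one quoted above):

    from open_webui.routers.files import upload_file_handler

    async def _emit_image(__request__, __event_emitter__, user, upload):
        # 1. internal upload; passing user= skips get_verified_user
        file_item = upload_file_handler(request=__request__,
                                        file=upload, user=user)
        # 2. resolve the served URL from the route name
        url = __request__.app.url_path_for(
            "get_file_content_by_id", id=file_item.id
        )  # -> /api/v1/files/{id}/content
        # 3. attach the result to the assistant message
        await __event_emitter__(
            {"type": "files",
             "data": {"files": [{"type": "image", "url": url}]}}
        )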
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The data URI returned from the tool was being given to the LLM as the
tool result — the LLM then either echoed the base64 to the user as
plain text (screenshot 1) or hallucinated a description of what it
thought the image looked like (screenshot 2 — "an image of a cat
sitting on a windowsill" for a fox-warrior prompt).
Fix: push the markdown image into the chat directly via
__event_emitter__ as a "message" event, and return a short text
confirmation as the function value. The confirmation is worded to
prevent the LLM from describing the image or repeating the markdown
(both common failure modes for tool-using LLMs).
Both generate_image and edit_image fixed.
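The emit-then-confirm shape (confirmation wording illustrative; the
point is that the function value never contains the image):

    async def _deliver(__event_emitter__, data_uri: str) -> str:
        await __event_emitter__({
            "type": "message",
            "data": {"content": f"![generated image]({data_uri})"},
        })
        # this string is what the LLM sees as the tool result: short,
        # and worded so it neither describes the image nor repeats
        # the markdown
        return ("The image is already displayed to the user. Do not "
                "describe it and do not repeat any markdown.")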
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI accepts a JSON file at Workspace → Models → Import that
seeds a new model preset in one click instead of the manual table-
driven setup. The new image_studio.json mirrors the Open WebUI bulk-
export schema (array wrapper around the model object with id, name,
base_model_id, params, meta) and pre-fills system prompt, native
function calling, temperature 0.5, top_p 0.9, smart_image_gen tool
attachment, suggestion prompts.
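The file's shape as a Python literal (top-level fields per this
commit; everything inside them is illustrative, not the shipped
preset):

    IMAGE_STUDIO = [{
        "id": "image-studio",                # assumption
        "name": "Image Studio",
        "base_model_id": "mistral-nemo:12b",
        "params": {
            "system": "<prompt from the markdown walkthrough>",
            "temperature": 0.5,
            "top_p": 0.9,
            "function_calling": "native",    # field name is an assumption
        },
        "meta": {
            "toolIds": ["smart_image_gen"],  # field name is an assumption
            "suggestion_prompts": [],
        },
    }]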
The markdown walkthrough stays as the source of truth for the system
prompt content and as the fallback when import fails (e.g. tool ID
mismatch, unfamiliar field, schema drift across Open WebUI versions).
README points at both paths.
Caveat doc'd in the markdown: if the imported preset doesn't actually
have smart_image_gen attached, the tool ID in the JSON didn't match
what Open WebUI assigned — re-attach manually in the model edit
screen.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:
generate_image — txt2img (existing, unchanged behavior)
edit_image — img2img on the most recently attached image
edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.
Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.
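The shared loop, approximately (poll interval and timeout default are
stand-ins for the Valves; ComfyUI's /history/<id> stays empty until
the job finishes):

    import time
    import requests

    def _submit_and_fetch(base_url: str, workflow: dict,
                          timeout_s: int = 300):
        r = requests.post(f"{base_url}/prompt", json={"prompt": workflow})
        pid = r.json()["prompt_id"]
        deadline = time.time() + timeout_s
        while time.time() < deadline:
            hist = requests.get(f"{base_url}/history/{pid}").json()
            if pid in hist:      # populated only once the job is done
                return hist[pid]["outputs"]
            time.sleep(1)
        raise TimeoutError(f"ComfyUI did not finish prompt {pid} "
                           f"in {timeout_s}s")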
Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A documented Open WebUI custom-model preset wrapping mistral-nemo:12b
with: aggressive system prompt that mandates calling generate_image,
only the smart_image_gen tool attached, native function calling,
lower temperature for tool-call reliability. Users pick "Image Studio"
from the chat-model dropdown when they want images.
Solves the common case where general-purpose chat models describe an
image in text instead of firing the tool — usually on conversational
phrasings like "can you draw me…". The preset removes the ambiguity
by giving the LLM exactly one job and one tool.
Setup walkthrough in openwebui-models/image_studio.md; deployment
README §9 points users at it as the recommended path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes to make the LLM more likely to call the tool:
1. Lead the docstring with an unambiguous directive — "Create an image
and show it to the user. Use this whenever the user asks you to
draw, generate, ..." plus a hard "do not say you cannot generate
images" line. Open WebUI feeds the docstring straight to the LLM as
the tool description; first line carries the most weight.
2. `style: Optional[StyleName]` where StyleName is a Literal enum of
the seven values. Native function-calling models read the type
annotation and present the seven valid values to the LLM as a
strict choice instead of a free-text param.
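Concretely, with the exact value spellings an assumption (method shown
outside its class; docstring truncated):

    from typing import Literal, Optional

    StyleName = Literal[
        "photo", "juggernaut", "pony", "general",
        "furry-nai", "furry-noob", "furry-il",
    ]

    def generate_image(self, prompt: str,
                       style: Optional[StyleName] = None) -> str:
        """Create an image and show it to the user. Use this whenever
        the user asks you to draw, generate, paint, or otherwise make
        an image. Do not say you cannot generate images.
        """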
If the LLM still doesn't fire the tool, the install is probably wrong:
Workspace → Models → the model → Advanced Params → Function Calling
must be set to Native (not Default).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The static workflow JSONs default to CyberRealisticXLPlay (set in an
earlier commit), but the KSampler still had euler / normal / CFG 7 /
20 steps, the generic settings I scaffolded with. Updated to the
creator-published defaults: dpmpp_2m_sde / karras / CFG 4 / 28 steps.
CLIP skip 1 was already correct (no node needed; it is the default
behavior).
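The node after the change, in ComfyUI's API format shown as a Python
dict (ids and wiring illustrative apart from node 7, the negative
text):

    KSAMPLER = {
        "class_type": "KSampler",
        "inputs": {
            "sampler_name": "dpmpp_2m_sde",  # was: euler
            "scheduler": "karras",           # was: normal
            "cfg": 4.0,                      # was: 7
            "steps": 28,                     # was: 20
            "denoise": 1.0,
            "seed": 0,
            "model": ["4", 0],
            "positive": ["6", 0],
            "negative": ["7", 0],            # node 7 holds the negative
            "latent_image": ["5", 0],
        },
    }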
Added a section to the deployment README spelling out the trade-off:
static workflows are locked to one checkpoint family at a time because
Open WebUI's nodes mapping doesn't expose sampler/CFG/scheduler/CLIP
skip/prefix. For multi-checkpoint use, the smart_image_gen Tool path is
the only one that gets these right per-prompt.
Re-paste the workflows into Open WebUI Settings → Images to pick up
the change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each style now gets a proper baseline covering quality, anatomy, and
watermark/signature suppression — plus the appropriate style-leak guards
(no-cartoon for photo, no-human for furry, score_4–6 suppression for
pony). Quality terms only; no NSFW filtering by default since several
checkpoints in this set are commonly used for adult work and would
fight a baked-in content filter. If SFW-by-default is wanted, add an
explicit safe-mode flag rather than expanding NEGATIVES.
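Representative entries (the guards are the ones named above; the
individual terms are illustrative and the shipped strings are longer):

    NEGATIVES = {
        # baseline everywhere: quality + anatomy + watermark/signature
        "photo": "lowres, bad anatomy, bad hands, watermark, signature, "
                 "cartoon, anime, illustration",       # no-cartoon guard
        "pony": "score_4, score_5, score_6, lowres, bad anatomy, "
                "watermark, signature",                # score_4-6 suppression
        "furry-nai": "human, lowres, bad anatomy, "
                     "watermark, signature",           # no-human guard
    }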
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI overwrites node 7's text when the request supplies a
negative_prompt, so the default only takes effect when one isn't
provided — which is the common case for the image-button path since the
chat UI doesn't expose the field. Generic quality terms only (no style
or content restrictions) so the default is safe across SD/SDXL/Flux
swaps and doesn't fight whichever checkpoint is loaded.
The smart_image_gen Tool already had per-style defaults; this only
affects the non-Tool image-gen path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI Tool the LLM invokes instead of the built-in image action.
Auto-routes among the seven SDXL checkpoints (photo / juggernaut /
pony / general / furry-{nai,noob,il}) based on either an explicit
`style` arg or first-match-wins regex over the prompt. Constructs the
ComfyUI workflow inline, submits via /prompt, polls /history, returns
the result as a base64 data-URI markdown image so no extra hosting is
needed. Per-style default negatives. ComfyUI URL / steps / CFG /
timeout are admin-tunable Valves.
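The routing core, schematically (these patterns are stand-ins, not the
shipped regexes):

    import re
    from typing import Optional

    STYLE_ROUTES = [
        ("photo",     re.compile(r"\b(photo|photorealistic|realistic)\b",
                                 re.I)),
        ("pony",      re.compile(r"\bpony\b", re.I)),
        ("furry-nai", re.compile(r"\b(furry|anthro)\b", re.I)),
        # ... one entry per checkpoint
    ]

    def route_style(prompt: str, style: Optional[str] = None) -> str:
        if style:                        # explicit arg always wins
            return style
        for name, pattern in STYLE_ROUTES:
            if pattern.search(prompt):   # first match wins
                return name
        return "general"                 # fallback checkpoint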
Filters can't see image-gen requests in Open WebUI (the routers skip
the filter chain), so the LLM-driven Tool is the only path that
gives intent-aware routing without changing the chat UX.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the SD 1.5 placeholder. The shipped txt2img/img2img workflows now
reference CyberRealisticXLPlay_V8.0_FP16.safetensors (the checkpoint
figment used in production), and comfyui-init-models.sh ships with no
active fetches — operators uncomment examples or add their own URLs.
The script + workflow filenames have to line up; README explains.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the Ollama model-init pattern: a one-shot Alpine container that
mounts the comfyui-models volume and runs comfyui-init-models.sh, which
curls direct download URLs (HuggingFace by default) into the right
subdirectories. Idempotent — already-present files are skipped.
HF_TOKEN is plumbed through for gated repos (Flux-dev, SD3, etc.) and is
opt-in via .env. The default list ships SD 1.5 only, matching the
placeholder filename in workflows/*.json. Examples for SDXL, Flux, and
upscalers are commented in the script.
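The idempotency pattern, transliterated to Python (the real script
uses curl; HF_TOKEN handling per this commit):

    import os
    import urllib.request

    def fetch(url: str, dest: str, hf_token: str = "") -> None:
        if os.path.exists(dest):
            print(f"{os.path.basename(dest)}: already present, skipping")
            return
        req = urllib.request.Request(url)
        if hf_token:  # only needed for gated repos (Flux-dev, SD3, ...)
            req.add_header("Authorization", f"Bearer {hf_token}")
        with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
            while chunk := resp.read(1 << 20):  # stream in 1 MiB chunks
                out.write(chunk)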
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
So changing the deployment's hostnames is a one-file edit (.env) instead
of touching docker-compose.yml. WEBUI_URL is the full URL with scheme
(Open WebUI uses it for auth redirects); LLM_URL is the bare hostname
(Anubis wants it for COOKIE_DOMAIN).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the duplicate standalone compose / .env.example / SETUP.md at the
repo root. Bring-up content folded into deployments/ai-stack/README.md
so there's exactly one set of deployment instructions, sitting next to
the files it describes. Root README is now just the repo overview and a
pointer at the deployment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five models from the production GPU host's current pull set. Picks up
the idempotency-checking loop pattern from the source script so re-runs
print "already present" instead of re-pulling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sanitized snapshot of the live srvno.de stack: Caddy + Ollama (with
preseed) + ComfyUI + Open WebUI + Anubis stub. Real hostnames,
secrets, and bcrypt hash replaced with placeholders so the dir is safe
to commit.
Caddyfile updated to point at comfyui:8188 (the source file pointed at
the now-removed forge service). Dropped FIGMENT_/FORGE_/SEGMENT_IMAGE_TAG
from the env example. Harmonised the init-models.sh mount path between
ollama and model-init services.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI builds and pushes the comfyui-nvidia image to the Gitea container
registry on every v* tag, mirroring the figment release workflow. Compose
now references the registry image (with build context kept for local
iteration) and the docs reflect the pull-by-default flow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the figment/segment/Forge stack with a single ComfyUI backend
fronted by Open WebUI's native ComfyUI integration. ComfyUI is built from
the official manual install for NVIDIA. Ships txt2img and img2img workflow
templates plus matching node-mapping JSONs that paste into Open WebUI's
admin panel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>