tool_choice: required (the thing that makes Image Studio reliably fire
its tools) also blocks Open WebUI's background text-only calls — title
generation, tag suggestions, autocomplete — because the model is
forced to produce a tool call instead of text. Result: chats stay
named 'New Chat' and tag suggestions go silent.
Documented the fix in two places:
- image_studio.md: dedicated 'Set a separate Task Model (required
after install)' section explaining the cause and the fix path.
- deployment README §9: short follow-up note pointing at it so
operators don't miss it during initial setup.
The fix is purely Open WebUI configuration — no code change. Pick any
non-Image-Studio model already pulled (mistral-nemo:12b is the
obvious default) for the Task Model slot.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five pieces:
1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO
+ SAM-HQ in one bundle) into custom_nodes and pip-installs its
requirements at build time. Model weights auto-download to the
comfyui-models volume on first inpaint (~3 GB one-time cost).
2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs
requirements.txt for any custom_node present at startup. Lets users
add custom nodes via ComfyUI-Manager (or by git-cloning into the
volume) and have the deps picked up on the next restart, without
editing the Dockerfile.
3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When
set, builds an inpainting workflow (LoadImage → GroundingDinoSAM
Segment → SetLatentNoiseMask → KSampler) so only the named region
is repainted. When unset, falls through to the existing img2img
path. Denoise default switches: 1.0 with mask_text (full repaint
within mask), 0.7 without.
4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL
distinction — set mask_text whenever the user names a specific
object/region ('the ball', 'the dog', 'the sky'); leave it unset
only for whole-image style/lighting transformations.
5. Deployment README documents the new mode + the first-inpaint
weight-download caveat.
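The startup dependency pass in item 2 could be sketched as below. This is an illustrative Python version, not the contents of install-custom-node-deps.sh; the function names are hypothetical, and it assumes each custom node keeps its requirements.txt at the top level of its own directory.

```python
import sys
from pathlib import Path


def find_requirement_files(custom_nodes_dir):
    """Return requirements.txt paths one level below custom_nodes_dir, sorted."""
    root = Path(custom_nodes_dir)
    if not root.is_dir():
        return []
    return sorted(
        node / "requirements.txt"
        for node in root.iterdir()
        if node.is_dir() and (node / "requirements.txt").is_file()
    )


def pip_commands(custom_nodes_dir, python=sys.executable):
    """Build one `pip install -r` command per custom node that has requirements."""
    return [
        [python, "-m", "pip", "install", "-r", str(req)]
        for req in find_requirement_files(custom_nodes_dir)
    ]
```

An entrypoint wrapper would run each returned command (for example with subprocess.run) before starting ComfyUI, so nodes added to the volume are covered on the next restart.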
Image rebuild required — bump tag to pick up the Dockerfile change.
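The mask_text branching from item 3 can be illustrated with a minimal sketch. The function name and return shape are hypothetical; only the defaults (1.0 with mask_text, 0.7 without) come from this commit. Treating a whitespace-only mask_text as unset is an assumption.

```python
def resolve_edit_mode(mask_text=None, denoise=None):
    """Pick the workflow kind and denoise strength for an edit request.

    Returns ("inpaint", denoise) when mask_text names a region,
    otherwise ("img2img", denoise).
    """
    # Assumption: blank or whitespace-only mask_text counts as "not set".
    masked = bool(mask_text and mask_text.strip())
    mode = "inpaint" if masked else "img2img"
    if denoise is None:
        # Full repaint inside the mask, gentler change for whole-image edits.
        denoise = 1.0 if masked else 0.7
    return mode, denoise
```

An explicit denoise from the caller always wins; the defaults only apply when the LLM leaves the parameter out.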
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three pieces:
1. mirror-ollama-model.sh — run on any machine that has the model
pulled. Parses the manifest at
~/.ollama/models/manifests/registry.ollama.ai/<ns>/<name>/<tag>,
greps every sha256:* digest, tars manifest + referenced blobs into
one .tgz. Output is portable — extract over any other Ollama
data dir and the model is immediately visible.
2. init-models.sh gains an s3_pull function that curls a tarball from
$S3_OLLAMA_BASE and extracts into /root/.ollama/models/. Falls back
to ollama pull when S3_OLLAMA_BASE is unset, so s3_pull lines are
safe to commit before the bucket is ready.
huihui_ai/qwen3.5-abliterated:9b is promoted to s3_pull as the example.
3. docker-compose.yml model-init service propagates S3_OLLAMA_BASE
from .env. The script installs curl at startup because the
ollama/ollama image doesn't always ship it.
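The digest collection in item 1 can be sketched in Python as follows. This is not the shell script itself; the function name is hypothetical, and the mapping from a `sha256:<hex>` digest to a `sha256-<hex>` file name under blobs/ is an assumption about Ollama's on-disk layout.

```python
import re

# A digest is "sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def blob_names(manifest_text):
    """Return the unique blob file names a manifest references, in order."""
    names = []
    for digest in DIGEST_RE.findall(manifest_text):
        # Assumption: blobs are stored as blobs/sha256-<hex>.
        name = digest.replace(":", "-")
        if name not in names:
            names.append(name)
    return names
```

The tarball would then contain the manifest file plus each returned name under blobs/, which is what makes the archive extractable over another Ollama data directory.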
README documents the mirror workflow under "Mirroring models to S3".
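The fallback behavior of s3_pull can be sketched as a pure decision function. This is a hypothetical Python rendering, not the shell function; the tarball file name and the /tmp staging path are assumptions, since the commit does not state how tarballs are named in the bucket.

```python
def pull_plan(model, tarball, s3_base=""):
    """Return the commands to run for one model, as argument lists.

    With s3_base set: download the tarball and extract it into the
    Ollama models directory. Without it: fall back to `ollama pull`.
    """
    if not s3_base:
        return [["ollama", "pull", model]]
    url = f"{s3_base.rstrip('/')}/{tarball}"
    staged = f"/tmp/{tarball}"  # assumption: staging location
    return [
        ["curl", "-fsSL", url, "-o", staged],
        ["tar", "-xzf", staged, "-C", "/root/.ollama/models/"],
    ]
```

Because the unset case degrades to a plain pull, a line using s3_pull behaves the same as before until S3_OLLAMA_BASE is configured.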
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI was blocking image attachments to the Image Studio model
because mistral-nemo:12b isn't vision-capable. Two changes:
- capabilities.vision flipped to true in the preset JSON. The Tool
only needs the image to make it through __messages__ / __files__
to call edit_image; the actual visual processing happens in
ComfyUI's img2img, not in the LLM. Setting the flag unlocks the
attach-image UI without lying about what mistral-nemo can do.
- System prompt now tells the LLM explicitly: "you may not be able
to visually inspect the attached image — that is fine. Trust the
user's description and call edit_image." Prevents the LLM from
refusing or hedging when it gets an image it can't see.
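For reference, the flag flip can be pictured as a small transformation on the preset object. This sketch assumes the capabilities object lives under `meta` in the preset JSON; the commit only names `capabilities.vision`, so the exact nesting is an assumption, and the helper name is hypothetical.

```python
import copy


def enable_vision(preset):
    """Return a copy of a model preset with capabilities.vision set to True."""
    updated = copy.deepcopy(preset)
    # Assumption: capabilities is nested under "meta".
    meta = updated.setdefault("meta", {})
    meta.setdefault("capabilities", {})["vision"] = True
    return updated
```

The base model is untouched; only the UI gate for attaching images changes.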
Documented the upgrade path in image_studio.md for users who want
real vision (qwen2.5vl:7b, llama3.2-vision:11b, minicpm-v:8b — pick
one, add to init-models.sh, swap base_model_id in the preset). The
vision LLM can then write smarter edit_image calls from the image
content rather than the user's description alone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI accepts a JSON file at Workspace → Models → Import that
seeds a new model preset in one click instead of the manual table-
driven setup. The new image_studio.json mirrors the Open WebUI bulk-
export schema (array wrapper around the model object with id, name,
base_model_id, params, meta) and pre-fills system prompt, native
function calling, temperature 0.5, top_p 0.9, smart_image_gen tool
attachment, suggestion prompts.
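A rough picture of that file, built in Python for illustration. The top-level keys (id, name, base_model_id, params, meta) and the values come from this commit; the nested key names (`system`, `function_calling`, `toolIds`, `suggestion_prompts`), the model id, and the sample suggestion are assumptions and may differ from the real export schema.

```python
import json


def build_preset(system_prompt, tool_id="smart_image_gen"):
    """Build a bulk-export style preset: a list wrapping one model object."""
    model = {
        "id": "image-studio",  # hypothetical id
        "name": "Image Studio",
        "base_model_id": "mistral-nemo:12b",
        "params": {
            "system": system_prompt,
            "temperature": 0.5,
            "top_p": 0.9,
            "function_calling": "native",
        },
        "meta": {
            "toolIds": [tool_id],
            "suggestion_prompts": [{"content": "Draw a lighthouse at dusk"}],
        },
    }
    return [model]


def tool_attached(preset, tool_id="smart_image_gen"):
    """Check whether the first model in the preset lists the tool."""
    return tool_id in preset[0].get("meta", {}).get("toolIds", [])


if __name__ == "__main__":
    print(json.dumps(build_preset("You are Image Studio."), indent=2))
```

A check like tool_attached only inspects the file; whether Open WebUI resolves the ID after import still has to be confirmed in the model edit screen.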
The markdown walkthrough stays as the source of truth for the system
prompt content and as the fallback when import fails (e.g. tool ID
mismatch, unfamiliar field, schema drift across Open WebUI versions).
README points at both paths.
Caveat documented in the markdown: if the imported preset doesn't actually
have smart_image_gen attached, the tool ID in the JSON didn't match
what Open WebUI assigned — re-attach manually in the model edit
screen.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:
generate_image — txt2img (existing, unchanged behavior)
edit_image — img2img on the most recently attached image
edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.
Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.
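The extraction order described above can be sketched like this. It is a simplified illustration, not the Tool's code: the `path` and `url` keys on file entries are assumed names, and the helper returns a tagged tuple instead of uploading anything.

```python
import base64


def image_from_messages(messages):
    """Return the bytes of the most recent base64 image in the chat, or None."""
    for message in reversed(messages or []):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text message, no content blocks
        for block in reversed(content):
            if not isinstance(block, dict) or block.get("type") != "image_url":
                continue
            url = (block.get("image_url") or {}).get("url", "")
            if url.startswith("data:") and ";base64," in url:
                return base64.b64decode(url.split(";base64,", 1)[1])
    return None


def find_source_image(messages=None, files=None):
    """Try messages first, then files (path before URL); None if nothing found."""
    data = image_from_messages(messages)
    if data is not None:
        return ("bytes", data)
    for entry in reversed(files or []):
        # Assumption: file entries expose "path" and/or "url" keys.
        if entry.get("path"):
            return ("path", entry["path"])
        if entry.get("url"):
            return ("url", entry["url"])
    return None
```

A None result is the point where the caller would return the "no image attached" message instead of falling through to txt2img.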
Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A documented Open WebUI custom-model preset wrapping mistral-nemo:12b
with: aggressive system prompt that mandates calling generate_image,
only the smart_image_gen tool attached, native function calling,
lower temperature for tool-call reliability. Users pick "Image Studio"
from the chat-model dropdown when they want images.
Solves the common case where general-purpose chat models describe an
image in text instead of firing the tool — usually on conversational
phrasings like "can you draw me…". The preset removes the ambiguity
by giving the LLM exactly one job and one tool.
Setup walkthrough in openwebui-models/image_studio.md; deployment
README §9 points users at it as the recommended path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The static workflow JSONs default to CyberRealisticXLPlay (set in an
earlier commit), but the KSampler still had euler / normal / CFG 7 /
20 steps, the generic settings I scaffolded with. Updated to the
creator-published defaults: dpmpp_2m_sde / karras / CFG 4 / 28 steps.
CLIP skip 1 was already correct (no node needed; default behavior).
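In ComfyUI API-format terms, the updated sampler settings correspond to something like the node below. This is a sketch as a Python dict rather than the shipped JSON; the model, conditioning, and latent links are omitted because their node IDs are specific to each workflow file, and the helper name is hypothetical.

```python
def ksampler_node(seed, denoise=1.0):
    """Return a KSampler node fragment with the updated sampler defaults."""
    return {
        "class_type": "KSampler",
        "inputs": {
            "seed": seed,
            "steps": 28,
            "cfg": 4.0,
            "sampler_name": "dpmpp_2m_sde",
            "scheduler": "karras",
            "denoise": denoise,
            # "model", "positive", "negative" and "latent_image" links
            # are workflow-specific and left out of this sketch.
        },
    }
```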
Added a section to the deployment README spelling out the trade-off:
static workflows are locked to one checkpoint family at a time because
Open WebUI's nodes mapping doesn't expose sampler/CFG/scheduler/CLIP
skip/prefix. For multi-checkpoint use, the smart_image_gen Tool path is
the only one that gets these right per-prompt.
Re-paste the workflows into Open WebUI Settings → Images to pick up the
new defaults.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Open WebUI Tool the LLM invokes instead of the built-in image action.
Auto-routes among the seven SDXL checkpoints (photo / juggernaut /
pony / general / furry-{nai,noob,il}) based on either an explicit
`style` arg or first-match-wins regex over the prompt. Constructs the
ComfyUI workflow inline, submits via /prompt, polls /history, returns
the result as a base64 data-URI markdown image so no extra hosting is
needed. Per-style default negatives. ComfyUI URL / steps / CFG /
timeout are admin-tunable Valves.
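The routing and the return format can be sketched as two small pure functions. The regex patterns, their order, and the "general" fallback are placeholders for illustration, not the Tool's actual table; only the style names, the explicit-style override, the first-match-wins rule, and the data-URI markdown output come from this commit.

```python
import base64
import re

# Hypothetical pattern table; order matters because the first match wins.
STYLE_PATTERNS = [
    ("pony", re.compile(r"\bpony\b", re.I)),
    ("furry-nai", re.compile(r"\b(furry|anthro)\b", re.I)),
    ("photo", re.compile(r"\b(photo|photograph|realistic)\b", re.I)),
]
DEFAULT_STYLE = "general"  # assumption: fallback when nothing matches


def pick_style(prompt, style=None):
    """An explicit style wins; otherwise the first matching pattern decides."""
    if style:
        return style
    for name, pattern in STYLE_PATTERNS:
        if pattern.search(prompt):
            return name
    return DEFAULT_STYLE


def to_markdown_image(png_bytes, alt="generated image"):
    """Embed image bytes as a base64 data-URI markdown image."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"![{alt}](data:image/png;base64,{encoded})"
```

For example, `pick_style("a realistic pony photo")` returns "pony" with this table because the pony pattern is listed before the photo pattern.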
Filters can't see image-gen requests in Open WebUI (the routers skip
the filter chain), so the LLM-driven Tool is the only path that
gives intent-aware routing without changing the chat UX.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the SD 1.5 placeholder. The shipped txt2img/img2img workflows now
reference CyberRealisticXLPlay_V8.0_FP16.safetensors (the checkpoint
figment used in production), and comfyui-init-models.sh ships with no
active fetches — operators uncomment examples or add their own URLs.
The filenames in the script and in the workflows have to match; the
README explains how.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the Ollama model-init pattern: a one-shot Alpine container that
mounts the comfyui-models volume and runs comfyui-init-models.sh, which
curls direct download URLs (HuggingFace by default) into the right
subdirectories. Idempotent — already-present files are skipped.
HF_TOKEN is plumbed through for gated repos (Flux-dev, SD3, etc.) and is
opt-in via .env. The default list ships SD 1.5 only, matching the
placeholder filename in workflows/*.json. Examples for SDXL, Flux, and
upscalers are commented in the script.
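The idempotent fetch and the opt-in token can be sketched as follows. This is a Python illustration of the pattern rather than the shell script; the function names and the (url, relative path) entry shape are hypothetical.

```python
from pathlib import Path


def plan_downloads(entries, models_root):
    """Return (url, destination) pairs for files not yet on the volume.

    entries is a list of (url, relative_path) pairs; files that already
    exist under models_root are skipped, which makes reruns cheap.
    """
    root = Path(models_root)
    return [(url, root / rel) for url, rel in entries if not (root / rel).exists()]


def auth_headers(hf_token=""):
    """Send a bearer token only when HF_TOKEN is set (needed for gated repos)."""
    return {"Authorization": f"Bearer {hf_token}"} if hf_token else {}
```

Each planned pair would then be fetched with the returned headers; an empty plan means the container exits without downloading anything.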
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
So changing the deployment's hostnames is a one-file edit (.env) instead
of touching docker-compose.yml. WEBUI_URL is the full URL with scheme
(Open WebUI uses it for auth redirects); LLM_URL is the bare hostname
(Anubis wants it for COOKIE_DOMAIN).
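Since the two variables take different shapes, a quick sanity check on .env values might look like the sketch below. The function and its messages are hypothetical and not part of the deployment; the example hostnames are placeholders.

```python
from urllib.parse import urlparse


def check_env(env):
    """Return a list of problems with WEBUI_URL / LLM_URL (empty if fine)."""
    problems = []
    web = urlparse(env.get("WEBUI_URL", ""))
    if web.scheme not in ("http", "https") or not web.hostname:
        problems.append("WEBUI_URL must be a full URL with scheme")
    llm = env.get("LLM_URL", "")
    if not llm or "://" in llm or "/" in llm:
        problems.append("LLM_URL must be a bare hostname")
    return problems
```

With `WEBUI_URL=https://chat.example.com` and `LLM_URL=llm.example.com` the check returns no problems; swapping the two shapes reports both.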
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the duplicate standalone compose / .env.example / SETUP.md at the
repo root. Bring-up content folded into deployments/ai-stack/README.md
so there's exactly one set of deployment instructions, sitting next to
the files it describes. Root README is now just the repo overview and a
pointer at the deployment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sanitized snapshot of the live srvno.de stack: Caddy + Ollama (with
preseed) + ComfyUI + Open WebUI + Anubis stub. Real hostnames,
secrets, and bcrypt hash replaced with placeholders so the dir is safe
to commit.
Caddyfile updated to point at comfyui:8188 (the source file pointed at
the now-removed forge service). Dropped FIGMENT_/FORGE_/SEGMENT_IMAGE_TAG
from the env example. Harmonised the init-models.sh mount path between
ollama and model-init services.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>