14 Commits

Author SHA1 Message Date
e77666ea0f Image Studio docs: require setting a separate Task Model after install
tool_choice: required (the thing that makes Image Studio reliably fire
its tools) also blocks Open WebUI's background text-only calls — title
generation, tag suggestions, autocomplete — because the model is
forced to produce a tool call instead of text. Result: chats stay
named 'New Chat' and tag suggestions go silent.

Documented the fix in two places:
  - image_studio.md: dedicated 'Set a separate Task Model (required
    after install)' section explaining the cause and the fix path.
  - deployment README §9: short follow-up note pointing at it so
    operators don't miss it during initial setup.

The fix is purely Open WebUI configuration — no code change. Pick any
non-Image-Studio model already pulled (mistral-nemo:12b is the
obvious default) for the Task Model slot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:37:21 -05:00
d935e24624 Add text-targeted inpainting via GroundingDINO+SAM (mask_text param)
All checks were successful
release / Build & Push Docker Image (push) Successful in 44s
Five pieces:

1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO
   + SAM-HQ in one bundle) into custom_nodes and pip-installs its
   requirements at build time. Model weights auto-download to the
   comfyui-models volume on first inpaint (~3 GB one-time cost).

2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs
   requirements.txt for any custom_node present at startup. Lets users
   add custom nodes via ComfyUI-Manager (or by git-cloning into the
   volume) and have the deps picked up on the next restart, without
   editing the Dockerfile.

3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When
   set, builds an inpainting workflow (LoadImage → GroundingDinoSAM
   Segment → SetLatentNoiseMask → KSampler) so only the named region
   is repainted. When unset, falls through to the existing img2img
   path. Denoise default switches: 1.0 with mask_text (full repaint
   within mask), 0.7 without.

4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL
   distinction — set mask_text whenever the user names a specific
   object/region ('the ball', 'the dog', 'the sky'); leave it unset
   only for whole-image style/lighting transformations.

5. Deployment README documents the new mode + the first-inpaint
   weight-download caveat.

Image rebuild required — bump tag to pick up the Dockerfile change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:43:52 -05:00
5a34ced8f1 Add S3 mirror path for Ollama models + mirror-ollama-model.sh helper
Three pieces:

1. mirror-ollama-model.sh — run on any machine that has the model
   pulled. Parses the manifest at
   ~/.ollama/models/manifests/registry.ollama.ai/<ns>/<name>/<tag>,
   greps every sha256:* digest, tars manifest + referenced blobs into
   one .tgz. Output is portable — extract over any other Ollama
   data dir and the model is immediately visible.

2. init-models.sh gains an s3_pull function that curls a tarball from
   $S3_OLLAMA_BASE and extracts into /root/.ollama/models/. Falls back
   to ollama pull when S3_OLLAMA_BASE is unset, so s3_pull lines are
   safe to commit before the bucket is ready. huihui_ai/qwen3.5-
   abliterated:9b promoted to s3_pull as the example.

3. docker-compose.yml model-init service propagates S3_OLLAMA_BASE
   from .env. Curl auto-installs at script start because ollama/ollama
   doesn't always ship it.

README documents the mirror workflow under "Mirroring models to S3".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:43:26 -05:00
f77f5993fb Image Studio: enable vision capability + document upgrade path
Open WebUI was blocking image attachments to the Image Studio model
because mistral-nemo:12b isn't vision-capable. Two changes:

  - capabilities.vision flipped to true in the preset JSON. The Tool
    only needs the image to make it through __messages__ / __files__
    to call edit_image; the actual visual processing happens in
    ComfyUI's img2img, not in the LLM. Setting the flag unlocks the
    attach-image UI without lying about what mistral-nemo can do.

  - System prompt now tells the LLM explicitly: "you may not be able
    to visually inspect the attached image — that is fine. Trust the
    user's description and call edit_image." Prevents the LLM from
    refusing or hedging when it gets an image it can't see.

Documented the upgrade path in image_studio.md for users who want
real vision (qwen2.5vl:7b, llama3.2-vision:11b, minicpm-v:8b — pick
one, add to init-models.sh, swap base_model_id in the preset). The
vision LLM can then write smarter edit_image calls from the image
content rather than the user's description alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:31:17 -05:00
6adf133558 Ship Image Studio as importable JSON in addition to markdown walkthrough
Open WebUI accepts a JSON file at Workspace → Models → Import that
seeds a new model preset in one click instead of the manual table-
driven setup. The new image_studio.json mirrors the Open WebUI bulk-
export schema (array wrapper around the model object with id, name,
base_model_id, params, meta) and pre-fills system prompt, native
function calling, temperature 0.5, top_p 0.9, smart_image_gen tool
attachment, suggestion prompts.

The markdown walkthrough stays as the source of truth for the system
prompt content and as the fallback when import fails (e.g. tool ID
mismatch, unfamiliar field, schema drift across Open WebUI versions).
README points at both paths.

Caveat doc'd in the markdown: if the imported preset doesn't actually
have smart_image_gen attached, the tool ID in the JSON didn't match
what Open WebUI assigned — re-attach manually in the model edit
screen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:04:49 -05:00
d4e2058859 smart_image_gen v0.3: add edit_image (img2img) method
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:

  generate_image — txt2img (existing, unchanged behavior)
  edit_image     — img2img on the most recently attached image

edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.

Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.

Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:59:13 -05:00
41d571d8d1 Add Image Studio model preset — forces smart_image_gen tool use
A documented Open WebUI custom-model preset wrapping mistral-nemo:12b
with: aggressive system prompt that mandates calling generate_image,
only the smart_image_gen tool attached, native function calling,
lower temperature for tool-call reliability. Users pick "Image Studio"
from the chat-model dropdown when they want images.

Solves the common case where general-purpose chat models describe an
image in text instead of firing the tool — usually on conversational
phrasings like "can you draw me…". The preset removes the ambiguity
by giving the LLM exactly one job and one tool.

Setup walkthrough in openwebui-models/image_studio.md; deployment
README §9 points users at it as the recommended path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:54:13 -05:00
b815cd6a5f Tune static workflows to CyberRealisticXL recommended settings
The static workflow JSONs default to CyberRealisticXLPlay (set in an
earlier commit), but the KSampler still had euler/normal/CFG7/20 — the
generic settings I scaffolded with. Updated to the creator-published
defaults: dpmpp_2m_sde / karras / CFG 4 / 28 steps. CLIP skip 1
already correct (no node needed; default behavior).

Added a section to the deployment README spelling out the trade-off:
static workflows are locked to one checkpoint family at a time because
Open WebUI's nodes mapping doesn't expose sampler/CFG/scheduler/CLIP
skip/prefix. For multi-checkpoint use, the smart_image_gen Tool path is
the only one that gets these right per-prompt.

Re-paste workflows into Open WebUI Settings → Images to pick up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:47:46 -05:00
392b26167f Add smart_image_gen Tool for per-prompt checkpoint routing
Open WebUI Tool the LLM invokes instead of the built-in image action.
Auto-routes among the seven SDXL checkpoints (photo / juggernaut /
pony / general / furry-{nai,noob,il}) based on either an explicit
`style` arg or first-match-wins regex over the prompt. Constructs the
ComfyUI workflow inline, submits via /prompt, polls /history, returns
the result as a base64 data-URI markdown image so no extra hosting is
needed. Per-style default negatives. ComfyUI URL / steps / CFG /
timeout are admin-tunable Valves.

Filters can't see image-gen requests in Open WebUI (the routers skip
the filter chain), so the LLM-driven Tool is the only path that
gives intent-aware routing without changing the chat UX.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:17:02 -05:00
704bcfdf13 Default workflows to SDXL CyberRealistic; ship empty model preseed
Drops the SD 1.5 placeholder. The shipped txt2img/img2img workflows now
reference CyberRealisticXLPlay_V8.0_FP16.safetensors (the checkpoint
figment used in production), and comfyui-init-models.sh ships with no
active fetches — operators uncomment examples or add their own URLs.

The script + workflow filenames have to line up; README explains.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:02:26 -05:00
0ad99b6199 Add comfyui-model-init sidecar for ComfyUI model preseeding
Mirrors the Ollama model-init pattern: a one-shot Alpine container that
mounts the comfyui-models volume and runs comfyui-init-models.sh, which
curls direct download URLs (HuggingFace by default) into the right
subdirectories. Idempotent — already-present files are skipped.

HF_TOKEN is plumbed through for gated repos (Flux-dev, SD3, etc.) and is
opt-in via .env. The default list ships SD 1.5 only, matching the
placeholder filename in workflows/*.json. Examples for SDXL, Flux, and
upscalers are commented in the script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:57:24 -05:00
21e976e275 Externalise WEBUI_URL / LLM_URL to .env
All checks were successful
release / Build & Push Docker Image (push) Successful in 31m43s
So changing the deployment's hostnames is a one-file edit (.env) instead
of touching docker-compose.yml. WEBUI_URL is the full URL with scheme
(Open WebUI uses it for auth redirects); LLM_URL is the bare hostname
(Anubis wants it for COOKIE_DOMAIN).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 10:49:29 -05:00
97547c783c Make ai-stack the only deployment shape
Drops the duplicate standalone compose / .env.example / SETUP.md at the
repo root. Bring-up content folded into deployments/ai-stack/README.md
so there's exactly one set of deployment instructions, sitting next to
the files it describes. Root README is now just the repo overview and a
pointer at the deployment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 10:45:23 -05:00
5b61caa197 Add deployments/ai-stack — combined production-shape example
Sanitized snapshot of the live srvno.de stack: Caddy + Ollama (with
preseed) + ComfyUI + Open WebUI + Anubis stub. Real hostnames,
secrets, and bcrypt hash replaced with placeholders so the dir is safe
to commit.

Caddyfile updated to point at comfyui:8188 (the source file pointed at
the now-removed forge service). Dropped FIGMENT_/FORGE_/SEGMENT_IMAGE_TAG
from the env example. Harmonised the init-models.sh mount path between
ollama and model-init services.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 10:40:41 -05:00