42 Commits
v0.1.0 ... main

Author SHA1 Message Date
c07e962cae Image tools: migrate to OWUI 0.9.0 async model accessors
Open WebUI 0.9.0 made every model-class accessor (Users.get_user_by_id,
Chats.get_chat_by_id, Files.get_file_by_id, …) a coroutine. Both tools
were still calling them synchronously, so the calls returned coroutines
instead of model objects; the first downstream attribute access threw,
the bare `except Exception: return False` swallowed it, and uploads
silently fell through to the data-URI fallback. The data-URI markdown
rendered during streaming but didn't survive post-stream commit, which
looked like "image flashes in, then disappears."

Add await to the six call sites; promote `_read_file_dict` to async
since it now contains an await; restore `_push_image_to_chat` to the
canonical `files` event so the file-attachment chrome (thumbnail +
download) comes back.
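
A minimal sketch of the shape of the change, with a stand-in accessor and hypothetical helper names (not the tool's real code):

```python
# Sketch only: the accessor below stands in for Open WebUI 0.9.0's async model accessors.
import asyncio


class Files:  # stand-in for open_webui.models.files.Files post-0.9.0
    @staticmethod
    async def get_file_by_id(file_id: str):
        return {"id": file_id, "path": f"/data/uploads/{file_id}.png"}


# Pre-0.9.0 style: calling the accessor synchronously now returns a coroutine,
# so the attribute/key access raises and the bare except hides the real failure.
def _read_file_dict_old(file_id: str):
    try:
        file = Files.get_file_by_id(file_id)   # coroutine, not a record
        return file["path"]
    except Exception:
        return False                            # swallows the TypeError


# Post-migration: the helper goes async and awaits the accessor.
async def _read_file_dict(file_id: str):
    file = await Files.get_file_by_id(file_id)
    return file["path"]


print(asyncio.run(_read_file_dict("abc123")))  # /data/uploads/abc123.png
```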

This supersedes commit d034700, which mis-diagnosed the symptom as a
virtualization regression and switched to a `message`-event markdown
workaround. The workaround didn't help (same flash-and-vanish) because
the upload pre-check still failed for the same async-migration reason
and the data-URI fallback path still ran.

smart_image_gen.py 0.7.9 -> 0.7.10
smart_image_pipe.py 0.1.1 -> 0.1.2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:16:02 -05:00
d034700af9 Image tools: work around OWUI 0.9.x files-event regression
Open WebUI 0.9.0 introduced chat-history virtualization that
unmounts off-screen assistant messages and reconstructs them from
persisted shape; `files` attached mid-stream by a tool don't
survive the round-trip — the image flashes in during streaming
and disappears the moment the message commits.

Both image tools now upload via Open WebUI's file store as before
but surface the result as a markdown image injected into the
assistant message via a `message` event, which is part of the
persisted shape and renders reliably across remounts. Trade-off:
loses the file-attachment chrome (thumbnail + download button).
Each tool has a TODO marking the swap site with the original
`files` payload inlined for one-line revert once upstream fixes
the regression.

smart_image_gen.py 0.7.8 -> 0.7.9
smart_image_pipe.py 0.1.0 -> 0.1.1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:05:12 -05:00
28f370a80b Update deployments/ai-stack/openwebui-tools/smart_image_gen.py 2026-04-20 17:08:32 +00:00
02a4bece5d Ollama: keep loaded models resident until evicted (KEEP_ALIVE=-1)
Was 30m, which evicts after 30 minutes of inactivity and forces a
reload penalty on the next request. Setting -1 holds models in VRAM
indefinitely; MAX_LOADED_MODELS=3 caps how many can stay resident
simultaneously (vs the previous 2). Tune MAX higher if you're
rotating between more than three models AND your GPU has the VRAM
for it — comment in the compose explains the trade-off.

For the live srvno.de stack: OLLAMA_KEEP_ALIVE=-1 takes effect on
the next `docker compose up -d ollama`. Loaded models survive the
restart only if they're re-requested before swap-out anyway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 19:25:42 -05:00
a1af88a632 smart_image_gen v0.7.8: per-job filename_prefix + source-image diagnostic
User reported observing a wrong image being returned. Two hardenings:

1. _job_prefix() generates a per-submission filename_prefix
   ('smartedit_<10hex>', 'smartinpaint_<10hex>', 'smartgen_<10hex>')
   so SaveImage outputs from concurrent jobs sit in their own
   namespace and ComfyUI's auto-incrementing counter can never
   produce filenames that overlap across jobs. With a shared prefix,
   if a queued job's history-fetch ever raced past its own SaveImage
   record there was a theoretical (if unlikely) path to picking up
   another job's _00001_.png. Per-job prefix kills that vector.

2. edit_image now emits the source image's SHA-1 and byte count in a
   status event before uploading to ComfyUI. If a future 'wrong
   image' report comes in, that hash should match the prior
   generation's output — if it doesn't, we know
   _extract_attached_image picked up the wrong source rather than
   ComfyUI returning the wrong file. Hashlib import is local so the
   module's import surface stays clean.
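
A minimal sketch of both hardenings, with hypothetical helper names (the real tool's workflow plumbing is omitted):

```python
# Sketch only: function names and the status wording are illustrative.
import hashlib
import secrets


def _job_prefix(kind: str) -> str:
    # e.g. "smartedit_3fa2b1c9d0": a fresh 10-hex namespace per submission, so
    # ComfyUI's auto-incrementing SaveImage counter can't collide across jobs.
    return f"smart{kind}_{secrets.token_hex(5)}"


def _source_image_diagnostic(image_bytes: bytes) -> str:
    # Emitted as a status event before uploading to ComfyUI; a future "wrong
    # image" report can compare this hash against the prior generation's output.
    digest = hashlib.sha1(image_bytes).hexdigest()
    return f"source image sha1={digest} bytes={len(image_bytes)}"


print(_job_prefix("edit"))                       # smartedit_<10 hex chars>
print(_source_image_diagnostic(b"\x89PNG...."))  # sha1 + byte count
```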

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:22:45 -05:00
d8c8421361 Image Studio: lock in working config — Native + enable_thinking=false
User confirmed end-to-end working stack:
  - base_model_id: huihui_ai/qwen3-vl-abliterated:8b
  - function_calling: native (gives 'View Result from edit_image'
    blocks, structured tool call traces)
  - custom_params:
      tool_choice: required        (forces tool call every turn)
      enable_thinking: false       (server-side disable; abliterated
                                    Qwen ignores the /no_think system
                                    prompt directive — when thinking
                                    is on the tool call leaks inside
                                    a thinking block as text)

Updated image_studio.json + the markdown setup table + the
'Qwen 3.x quirk' explainer to match. The /no_think line in the
system prompt stays in for non-abliterated Qwen variants but is now
documented as best-effort backup; enable_thinking=false is the
authoritative kill-switch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:12:24 -05:00
f5a222fe6f Image Studio: default base model → huihui_ai/qwen3-vl-abliterated:8b
User confirmed this model works end-to-end after the multi-base-model
search. Settled on it because Qwen 3 VL's fine-tune lineage isn't
damaged by abliteration the way Qwen 3.5's is, so it both calls tools
reliably AND won't refuse to dispatch on NSFW edit prompts.

Updated:
  - image_studio.json base_model_id → huihui_ai/qwen3-vl-abliterated:8b
  - init-models.sh: pulls the abliterated VL model in place of the
    non-working standard qwen3.5:9b
  - image_studio.md: setup table base-model row + vision-section
    'why this and not the alternatives' explanation

function_calling stays default and tool_choice required. Operator
can flip to native + drop tool_choice once they've verified the new
base behaves with structured tool calls (which would also remove the
need for a separate Task Model for title generation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:56:16 -05:00
ec6108888a smart_image_gen v0.7.7: enforce style inheritance for edit_image
Vision-capable LLMs misclassify rendered subjects when picking a
style — observed: model called juggernaut for an edit on a furry-il
generation because the rendered character looked 'photoreal-ish' to
its vision encoder. Each visual judgment is independent so styles
flip mid-chat.

Flipped resolution order in edit_image so inheritance from the prior
generate_image / edit_image call DOMINATES the LLM's explicit style
arg. The LLM's choice only wins when there's nothing to inherit
(first edit in a chat, fresh user upload). Workaround for legitimate
style changes is starting a new chat.
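
A minimal sketch of the flipped precedence, assuming a hypothetical resolver (the real valve handling and message scanning are more involved):

```python
# Sketch only: the real tool derives `inherited` by scanning __messages__.
from typing import Optional

DEFAULT_STYLE = "furry-il"


def resolve_edit_style(inherited: Optional[str], llm_arg: Optional[str]) -> str:
    # Inheritance from the prior generate_image / edit_image call dominates;
    # the LLM's explicit style only wins when there is nothing to inherit.
    if inherited:
        return inherited
    if llm_arg:
        return llm_arg
    return DEFAULT_STYLE


print(resolve_edit_style("furry-il", "juggernaut"))  # furry-il (inherited wins)
print(resolve_edit_style(None, "juggernaut"))        # juggernaut (first edit in chat)
```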

System prompt updated to match: tells the LLM that style inheritance
is enforced, that passing style on follow-up calls is ignored, and
that user requests for style change require a new chat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:53:32 -05:00
20d4bd5b72 Image Studio: switch base model to qwen3.5:9b (non-abliterated)
The abliterated 9B was the source of the tool-call format mangling
(both Native XML leaks and Default Python-syntax leaks). Standard
qwen3.5:9b is the same family, same 9B size (6.6 GB), vision-capable,
and its native tool calling actually works.

The image content uncensored-ness was always going to come from the
SDXL checkpoints in ComfyUI — the LLM is just a dispatcher. Picking
a well-behaved tool-caller for that role doesn't compromise output
content.

Updated:
  - image_studio.json base_model_id → qwen3.5:9b
  - init-models.sh: pulls qwen3.5:9b as a standard registry pull,
    in addition to the existing abliterated 9B (which stays for
    other chat models)
  - image_studio.md setup table + vision section explaining why
    we chose standard over abliterated for the dispatcher role

function_calling stays as 'default' and tool_choice as 'required'
for now — they don't hurt with a reliable tool-caller and operators
can flip back to native + drop tool_choice once they verify it
works for them (which also removes the need for a separate Task
Model for title generation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:09:42 -05:00
1ae451ad5f Add smart_image_pipe.py — deterministic Pipe for image gen / edit / inpaint
Registers as 'Image Studio (Pipe)' in Open WebUI's chat-model dropdown.
No LLM in the loop — the Pipe parses the user's message with regex,
finds attached images via the same multi-source extractor as the
Tool, routes deterministically to txt2img / img2img / inpaint, calls
ComfyUI, returns the image. Bypasses the abliterated Qwen 3.5 ×
Open WebUI tool-call format interop bug entirely.

Style resolution: explicit FORCE_STYLE valve > inherited from prior
assistant message ('style: X' marker) > keyword regex on user text >
default (furry-il).

Inpaint trigger: regex picks up phrasings like 'change the X',
'remove the Y', 'make the Z bigger/red/etc' and pulls the noun
phrase as mask_text. No match → full img2img.
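
A minimal sketch of the kind of trigger regex described here (the patterns are illustrative, not the Pipe's exact set):

```python
# Sketch only: illustrative patterns, not the Pipe's actual regexes.
import re
from typing import Optional

_INPAINT_PATTERNS = [
    r"\bchange the ([a-z0-9 '\-]+)",
    r"\bremove the ([a-z0-9 '\-]+)",
    r"\bmake the ([a-z0-9 '\-]+?) (?:bigger|smaller|red|blue|glow\w*)",
]


def extract_mask_text(user_text: str) -> Optional[str]:
    """Return the noun phrase to inpaint, or None to fall through to full img2img."""
    lowered = user_text.lower()
    for pattern in _INPAINT_PATTERNS:
        match = re.search(pattern, lowered)
        if match:
            return match.group(1).strip()
    return None


print(extract_mask_text("please remove the watermark in the corner"))  # watermark in the corner
print(extract_mask_text("make it more cinematic"))                     # None -> full img2img
```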

Reuses the same per-style settings, prefix dialects, negatives,
GrowMask feathering, file extraction (with chat-DB fallback) and
files-event push as smart_image_gen.py — code is duplicated rather
than shared because Open WebUI loads each plugin file standalone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:58:01 -05:00
011fade024 Image Studio prompt: forbid post-tool echo of the function call
User saw the LLM's chat response include a literal
'edit_image(prompt="...", mask_text="...", style="furry-il",
denoise=0.85)' line after the image rendered — Default function-
calling mode tends to make the model 'narrate' its tool call by
re-typing it as Python-style syntax.

Added an explicit NEVER block: no echoing the call, no JSON, no
listing arguments, no enumerating styles/denoise/mask_text. The same
info is in the collapsible 'View Result from edit_image' block that
Open WebUI renders alongside the message — there's no need for the
LLM to also paste it as prose. Follow-up text is for human
conversation, not bookkeeping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:41:51 -05:00
63917709c1 smart_image_gen v0.7.6: feather inpaint mask edges via GrowMask
The raw SAM mask is a hard binary edge — KSampler repaints right up
to it, and SDXL has no surrounding-pixel context inside the mask to
blend with. Result: the inpainted region looks pasted-on with visible
seams (the artifact the user reported on the werewolf-groin edit).

Inserted a stock GrowMask node (id 17) between
GroundingDinoSAMSegment and SetLatentNoiseMask:
  - expand=12 grows the mask outward by 12 px so the new content
    overlaps a strip of original pixels for blending
  - tapered_corners=True softens the edge so the noise transition
    isn't a step function

GrowMask is built into stock ComfyUI; no extra custom node install.
KSampler still uses the caller-supplied denoise (default 1.0 in
inpaint mode).
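
A minimal sketch of the inserted node as it would sit in the API-format workflow dict (only the GrowMask id "17" comes from this commit; the upstream node id is illustrative):

```python
# Sketch only: a fragment of the API-format workflow dict, other nodes elided.
# GroundingDinoSAMSegment is assumed to be node "14" here.
workflow_fragment = {
    "17": {
        "class_type": "GrowMask",
        "inputs": {
            "mask": ["14", 1],        # MASK output of GroundingDinoSAMSegment
            "expand": 12,             # grow 12 px so new content overlaps original pixels
            "tapered_corners": True,  # soften the edge instead of a hard step
        },
    },
    # SetLatentNoiseMask would then read the feathered mask from node 17, e.g.:
    # "12": {"class_type": "SetLatentNoiseMask",
    #        "inputs": {"samples": ["11", 0], "mask": ["17", 0]}},
}

print(workflow_fragment["17"]["inputs"]["expand"])  # 12
```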

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:40:39 -05:00
1ed2e7293e Image Studio: ship function_calling=default — Native leaks Qwen 3.5 XML
Qwen 3.5 abliterated emits its native tool-call format
(<function=...><parameter=...>) wrapped in <tool_call> tags that the
current Open WebUI / Ollama parser does not reliably round-trip — the
XML leaks to chat as plain text instead of executing. Switching the
preset to Function Calling: Default, which uses Open WebUI's own
prompt-injection wrapper, fires the tool reliably.

Native is documented as the right choice only when the operator has
swapped the base model to one with proven OWUI-side parser support
(mistral-nemo:12b, qwen2.5vl:7b). For the shipped Qwen 3.5
abliterated default, Default is the working setting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:36:05 -05:00
18a205d69d smart_image_gen v0.7.5: fix black-image bug — fetch from SaveImage explicitly
_submit_and_fetch was iterating history[prompt_id]['outputs'].values()
and grabbing the first image it saw. The inpaint workflow includes nodes
other than SaveImage that emit IMAGE outputs (the GroundingDinoSAMSegment
node returns an overlay/mask-applied image in addition to the mask), and
nothing guarantees SaveImage comes first in that iteration — sometimes
we'd return the overlay (which can render mostly black) instead of the
actual SaveImage result.

Fix: prefer outputs from the SaveImage node id ('9' in every workflow
the tool builds) explicitly. Fall back to scanning all outputs only
if SaveImage didn't appear (workflow drift, manual edit, etc).
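
A minimal sketch of the selection logic, assuming the history payload shape described above (helper names are illustrative):

```python
# Sketch only: illustrative shapes; the real tool also downloads the file via /view.
SAVE_IMAGE_NODE_ID = "9"   # SaveImage node id in every workflow the tool builds


def pick_output_images(history_entry: dict) -> list[dict]:
    outputs = history_entry.get("outputs", {})
    # Prefer the SaveImage node explicitly; only scan everything else if it's
    # missing (workflow drift, manual edit, ...).
    save_node = outputs.get(SAVE_IMAGE_NODE_ID, {})
    if save_node.get("images"):
        return save_node["images"]
    return [img for node in outputs.values() for img in node.get("images", [])]


history_entry = {
    "outputs": {
        "13": {"images": [{"filename": "overlay_00001_.png", "type": "temp"}]},
        "9": {"images": [{"filename": "smartgen_abc_00001_.png", "type": "output"}]},
    }
}
print(pick_output_images(history_entry)[0]["filename"])  # smartgen_abc_00001_.png
```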

User reported seeing the correct inpaint in ComfyUI's native UI but
black in chat — this is the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:30:04 -05:00
0fa8040251 Eliminate first-inpaint timeout: preseed SAM/GroundingDINO + 600s default
Two changes to address the timeout-and-retry loop the user hit on the
first edit_image call:

  1. comfyui-init-models.sh now fetches the three weights inpaint
     needs into /models/sams and /models/grounding-dino:
       - sam_hq_vit_h.pth                (~2.5 GB)
       - groundingdino_swint_ogc.pth     (~700 MB)
       - GroundingDINO_SwinT_OGC.cfg.py  (~1 KB)
Without preseeding, these auto-download on first inpaint, which
     takes minutes and times out the tool call. The mkdir line gets
     the new subdirs added too.

  2. Tool TIMEOUT_SECONDS valve default bumped 240s → 600s as
     defense-in-depth — even with weights preseeded, BERT-base
     auto-downloads via transformers on first GroundingDINO load
     (~30s) and a slow KSampler on a contended GPU can push past
     4 minutes occasionally. Steady-state runs still finish in under
     a minute; the valve only matters for first-call latency.

After comfyui-model-init re-runs (`docker compose up -d
comfyui-model-init`), first inpaint should be near-instant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:39:23 -05:00
e77666ea0f Image Studio docs: require setting a separate Task Model after install
tool_choice: required (the thing that makes Image Studio reliably fire
its tools) also blocks Open WebUI's background text-only calls — title
generation, tag suggestions, autocomplete — because the model is
forced to produce a tool call instead of text. Result: chats stay
named 'New Chat' and tag suggestions go silent.

Documented the fix in two places:
  - image_studio.md: dedicated 'Set a separate Task Model (required
    after install)' section explaining the cause and the fix path.
  - deployment README §9: short follow-up note pointing at it so
    operators don't miss it during initial setup.

The fix is purely Open WebUI configuration — no code change. Pick any
non-Image-Studio model already pulled (mistral-nemo:12b is the
obvious default) for the Task Model slot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:37:21 -05:00
6700f6ce33 smart_image_gen v0.7.3: edit_image inherits style from prior tool call
User reported edit_image picking 'juggernaut' (photoreal) for an edit
on a furry image — the LLM didn't carry context, and the tool's
fallback _route_style only sees the edit instruction text, which for
neutral edits ('bigger', 'glowing eyes') has no furry keywords.

Fix in two places:

  1. Tool: _inherited_style scans __messages__ in reverse for prior
     generate_image / edit_image tool calls and returns the style arg
     they used. edit_image now resolves: explicit style → inherited →
     keyword fallback. Deterministic, no LLM cooperation needed for
     follow-up edits on previously-generated images.

  2. System prompt: explicit three-step style resolution for
     edit_image. Generated by you → omit style and auto-inherit.
     Uploaded by user → INSPECT visually and pick a matching style
     (the LLM is the only thing with vision; the tool can't see
     pixels). Then keep that style for subsequent edits.

Both paths matter — the tool fix handles the common case
deterministically, the prompt fix handles the upload case where
there's nothing to inherit from.
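
A minimal sketch of the reverse scan, assuming tool calls are recorded on assistant messages as shown (the real __messages__ shape in Open WebUI differs in detail):

```python
# Sketch only: the message shape is illustrative, not Open WebUI's exact layout.
import json
from typing import Optional


def _inherited_style(messages: list[dict]) -> Optional[str]:
    for message in reversed(messages):
        for call in message.get("tool_calls", []) or []:
            if call.get("function", {}).get("name") in ("generate_image", "edit_image"):
                args = json.loads(call["function"].get("arguments", "{}"))
                if args.get("style"):
                    return args["style"]
    return None


messages = [
    {"role": "user", "content": "draw an anthro fox warrior"},
    {"role": "assistant", "tool_calls": [{"function": {
        "name": "generate_image",
        "arguments": json.dumps({"prompt": "anthro fox warrior", "style": "furry-il"}),
    }}]},
    {"role": "user", "content": "give him glowing eyes"},
]
print(_inherited_style(messages))  # furry-il
```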

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:31:57 -05:00
def27087c1 Make every image tag in compose pinnable via .env
Floating tags (`latest`, `main`) made deploys non-deterministic — a
container recreate could pull a newer Open WebUI, Ollama, or Anubis at
any time. Wrapped every `image:` value in a ${VAR:-default} substitution,
surfaced the full set in .env.example with a header explaining where to
find current versions, and bumped the COMFYUI_IMAGE_TAG default to 0.2.1
(the just-tagged version with the transformers pin).

Vars added: CADDY_TAG, OLLAMA_TAG, OPEN_WEBUI_TAG, ALPINE_TAG,
ANUBIS_TAG (COMFYUI_IMAGE_TAG already existed). Defaults keep the
previous floating-tag behaviour for the images I'm not confident pinning
to a specific version yet (Ollama, Open WebUI, Anubis) — the operator
should update those to verified versions for production deploys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:27:08 -05:00
2cecf77981 Pin transformers <5 — comfyui_segment_anything's GroundingDINO needs it
transformers 5.0 removed BertModel.get_head_mask (it was on the legacy
4.x API). comfyui_segment_anything's GroundingDINO bertwarper.py still
calls bert_model.get_head_mask in __init__, so first inpaint crashes
with AttributeError. Pinned transformers>=4.40,<5 in two places:

  - Dockerfile: applied AFTER the custom node's requirements.txt
    install so it wins on a fresh image build.
  - install-custom-node-deps.sh entrypoint: re-applied at every
    container start so any future custom-node install (via
    ComfyUI-Manager or volume clone) that pulls a newer transformers
    transitively gets pinned back into the working range.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:21:21 -05:00
f26dfbee02 smart_image_gen v0.7.2: chat-DB fallback + diagnostic 'no image' msg
If __messages__ doesn't include the assistant's prior file attachments
(which is what the screenshot shows), the new fallback queries
the chat by id via Chats.get_chat_by_id and walks every persisted
message for files. Open WebUI's socket handler always upserts files
onto the assistant message via {'files': files} so this path is
authoritative.
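
A minimal sketch of the fallback walk, with a stand-in for the chat record (Open WebUI's Chats.get_chat_by_id returns a model object rather than a plain dict):

```python
# Sketch only: stand-in shapes for the persisted chat record.
from typing import Optional


def latest_image_file(chat: dict) -> Optional[dict]:
    # Walk every persisted message, newest first, for attached image files.
    messages = chat.get("chat", {}).get("messages", [])
    for message in reversed(messages):
        for file in reversed(message.get("files", []) or []):
            if file.get("type") == "image" or str(file.get("url", "")).startswith("data:image"):
                return file
    return None


chat = {"chat": {"messages": [
    {"role": "user", "content": "draw a fox"},
    {"role": "assistant", "content": "Done.",
     "files": [{"type": "image", "url": "/api/v1/files/1234/content"}]},
]}}
print(latest_image_file(chat))  # {'type': 'image', 'url': '/api/v1/files/1234/content'}
```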

The 'No image found' return now includes diagnostic counts —
__files__, __messages__, messages_with_files, chat_id_present,
openwebui_runtime — so subsequent failures actually show what the
tool saw instead of being opaque.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:07:55 -05:00
06433d3815 smart_image_gen v0.7.1: rename edit_image arg + parse file id from URL
Two bugs in one screenshot:

1. LLM called edit_image(prompt=..., ...) but the signature was
   edit_image(edit_instruction=..., ...) — mismatch, missing-arg
   crash. Renamed the first param to `prompt` so both tools have a
   matching, predictable name. System prompt updated with an explicit
   'do not invent edit_instruction' line for stubborn models.

2. After fix #1, edit_image still couldn't find the prior generated
   image because Open WebUI assistant-message file attachments only
   carry {type, url} (no id, no path). _read_file_dict now also
   greps the file id out of /api/v1/files/<uuid>/content URLs and
   feeds it to Files.get_file_by_id. Verified pattern matches
   absolute URLs (https://llm-1.srvno.de/api/v1/files/.../content).
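
A minimal sketch of the URL parse (the regex is illustrative; the real helper also handles the other file-dict shapes):

```python
# Sketch only: illustrative regex; matches relative and absolute content URLs.
import re
from typing import Optional

_FILE_URL_RE = re.compile(r"/api/v1/files/([0-9a-fA-F-]{36})/content")


def file_id_from_url(url: str) -> Optional[str]:
    match = _FILE_URL_RE.search(url)
    return match.group(1) if match else None


print(file_id_from_url(
    "https://llm-1.srvno.de/api/v1/files/0b1c2ee5-b5f6-4d99-8e12-3a4b5c6d7e8f/content"
))  # 0b1c2ee5-b5f6-4d99-8e12-3a4b5c6d7e8f
```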

System prompt also now says 'including images you previously
generated in this chat' to nudge the LLM to pick up assistant
outputs as edit candidates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:58:40 -05:00
780ce42711 Image Studio: move tool_choice into params.custom_params (correct field)
Previous commit put tool_choice at the top level of params. Open WebUI
drops that silently — apply_model_params_to_body has a whitelist of
mapped param names (temperature, top_p, etc.) and tool_choice isn't
on it. The Custom Parameters UI section also only iterates
params.custom_params, which is why the value didn't appear there
after importing the preset.

Correct location is the custom_params sub-dict, where values go
through json.loads before being merged into the outgoing chat
completion body. 'required' stays a string after the failed
json.loads and ends up exactly where the OpenAI / Ollama tools spec
expects it.
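
A minimal sketch of the two placements as dicts mirroring the preset JSON (surrounding preset fields elided; only the field names named in this message are assumed):

```python
# Sketch only: fragments of the model-preset params, everything else elided.
wrong = {
    "params": {
        "temperature": 0.5,
        "tool_choice": "required",   # top level: silently dropped by the whitelist
    }
}

right = {
    "params": {
        "temperature": 0.5,
        "custom_params": {
            "tool_choice": "required",  # merged into the outgoing chat-completion body
        },
    }
}

print("tool_choice" in right["params"]["custom_params"])  # True
```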

Source: src/lib/components/chat/Settings/Advanced/AdvancedParams.svelte
(UI binding) and backend/open_webui/utils/payload.py (serialization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:50:34 -05:00
f6f5690fcd smart_image_gen v0.7: edit_image finds previously-emitted images
Bug: after generate_image surfaced an image via the files event, the
next edit_image call returned 'No image found in the chat'. The image
was attached to the assistant's message, but _extract_attached_image
only scanned the user's __files__ param and image_url content blocks
on user messages — it never looked at messages.files for any role.

Fix: rewrite extraction to scan messages[].files in reverse for ALL
roles, so an assistant-emitted image from a prior tool call is found
the same way as a user-attached upload. Use Open WebUI's internal
Files.get_file_by_id when the file dict has an id, so we get raw
bytes from disk without going through the auth-protected
/api/v1/files/{id}/content endpoint. Old path-key and URL-fetch
paths kept as fallbacks.

Refactored shared helpers _file_dict_is_image and _read_file_dict
out of the loop to keep the search logic readable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:46:10 -05:00
d935e24624 Add text-targeted inpainting via GroundingDINO+SAM (mask_text param)
Five pieces:

1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO
   + SAM-HQ in one bundle) into custom_nodes and pip-installs its
   requirements at build time. Model weights auto-download to the
   comfyui-models volume on first inpaint (~3 GB one-time cost).

2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs
   requirements.txt for any custom_node present at startup. Lets users
   add custom nodes via ComfyUI-Manager (or by git-cloning into the
   volume) and have the deps picked up on the next restart, without
   editing the Dockerfile.

3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When
   set, builds an inpainting workflow (LoadImage → GroundingDinoSAM
   Segment → SetLatentNoiseMask → KSampler) so only the named region
   is repainted. When unset, falls through to the existing img2img
   path. Denoise default switches: 1.0 with mask_text (full repaint
   within mask), 0.7 without.

4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL
   distinction — set mask_text whenever the user names a specific
   object/region ('the ball', 'the dog', 'the sky'); leave it unset
   only for whole-image style/lighting transformations.

5. Deployment README documents the new mode + the first-inpaint
   weight-download caveat.

Image rebuild required — bump tag to pick up the Dockerfile change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:43:52 -05:00
7c7897818e Image Studio: bake tool_choice=required into the preset
Without it, abliterated/reasoning models like
huihui_ai/qwen3.5-abliterated:9b reliably choose to write a planning
response instead
of calling the tool — even with /no_think and a terse imperative
system prompt. tool_choice=required is passed through to Ollama's
chat API and removes the model's option to respond in text at all,
forcing exactly one tool call per turn.

Confirmed working with the abliterated Qwen 3.5 9B base.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:30:49 -05:00
a1fca4d5d9 Image Studio: default base model → huihui_ai/qwen3.5-abliterated:9b
Re-imports of image_studio.json kept reverting the base model back to
mistral-nemo:12b because that was still hard-coded in the JSON.
Updated the JSON, the markdown setup table, and the vision-capability
section to lead with the Qwen 3.5 abliterated 9B preset.

Re-ordered the markdown's vision section: shipped default first
(Qwen 3.5 abliterated, with the /no_think + enable_thinking caveat
called out explicitly), alternatives (qwen2.5vl:7b, llama3.2-vision,
minicpm-v) second, non-vision fallback third.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:22:13 -05:00
0b1c2ee5b5 Image Studio: tighten system prompt, add /no_think for Qwen 3.x
User reported the model writing a multi-paragraph 'editing plan'
instead of calling edit_image, only firing the tool when explicitly
told to. Two underlying causes:

  1. The previous system prompt was conversational ('ALWAYS / NEVER'
     lists with discussion) — Qwen-style models read that as topics
     to think about rather than rules to obey. Replaced with terse,
     imperative dispatcher framing: 'You do not respond in prose.
     Every user message MUST result in exactly one tool call.'

  2. Qwen 3.x ships with thinking mode on by default. Reasoning
     models almost universally degrade native function calling — they
     plan how to use a tool instead of just calling it. Prepended
     /no_think (Qwen 3.x recognises this token and skips reasoning).
     No-op for non-Qwen-3 base models.

Removed the long after-action paragraph that encouraged elaborate
follow-ups; replaced with 'at most one short sentence'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:18:49 -05:00
5a34ced8f1 Add S3 mirror path for Ollama models + mirror-ollama-model.sh helper
Three pieces:

1. mirror-ollama-model.sh — run on any machine that has the model
   pulled. Parses the manifest at
   ~/.ollama/models/manifests/registry.ollama.ai/<ns>/<name>/<tag>,
   greps every sha256:* digest, tars manifest + referenced blobs into
   one .tgz. Output is portable — extract over any other Ollama
   data dir and the model is immediately visible.

2. init-models.sh gains an s3_pull function that curls a tarball from
   $S3_OLLAMA_BASE and extracts into /root/.ollama/models/. Falls back
   to ollama pull when S3_OLLAMA_BASE is unset, so s3_pull lines are
safe to commit before the bucket is ready.
huihui_ai/qwen3.5-abliterated:9b is promoted to s3_pull as the example.

3. docker-compose.yml model-init service propagates S3_OLLAMA_BASE
   from .env. Curl auto-installs at script start because ollama/ollama
   doesn't always ship it.

README documents the mirror workflow under "Mirroring models to S3".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:43:26 -05:00
f77f5993fb Image Studio: enable vision capability + document upgrade path
Open WebUI was blocking image attachments to the Image Studio model
because mistral-nemo:12b isn't vision-capable. Two changes:

  - capabilities.vision flipped to true in the preset JSON. The Tool
    only needs the image to make it through __messages__ / __files__
    to call edit_image; the actual visual processing happens in
    ComfyUI's img2img, not in the LLM. Setting the flag unlocks the
    attach-image UI without lying about what mistral-nemo can do.

  - System prompt now tells the LLM explicitly: "you may not be able
    to visually inspect the attached image — that is fine. Trust the
    user's description and call edit_image." Prevents the LLM from
    refusing or hedging when it gets an image it can't see.

Documented the upgrade path in image_studio.md for users who want
real vision (qwen2.5vl:7b, llama3.2-vision:11b, minicpm-v:8b — pick
one, add to init-models.sh, swap base_model_id in the preset). The
vision LLM can then write smarter edit_image calls from the image
content rather than the user's description alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:31:17 -05:00
b604e3f509 smart_image_gen v0.5: surface images via files event (canonical path)
The data-URI message-event approach didn't render — Open WebUI's chat
frontend ignores data URIs from tool-emitted message events because
the markdown-base64 rewriter (utils/files.py
convert_markdown_base64_images) only runs on assistant streaming
content, not on tool emits.

Switched to the path Open WebUI's own image-generation flow uses
(backend/open_webui/utils/middleware.py ~1325):

  1. Upload image bytes via open_webui.routers.files.upload_file_handler
     (gets back a file_item with id)
  2. Resolve the served URL via request.app.url_path_for(
     "get_file_content_by_id", id=file_item.id) → /api/v1/files/{id}/content
  3. Emit a `files` event:
        {"type": "files", "data": {"files": [{"type": "image", "url": ...}]}}

Tools now take __request__, __user__, __metadata__ params for the
upload (Open WebUI auto-injects these). Falls back to data-URI
message event if the runtime imports aren't available (e.g. running
the file standalone for tests). The internal upload bypasses
get_verified_user via the user= kwarg, so no token plumbing.
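
A minimal sketch of steps 2 and 3 inside an async tool method, with fakes standing in for Open WebUI's request and event emitter (the upload in step 1 is elided; only the event payload shape is taken from this message):

```python
# Sketch only: fakes stand in for the injected __request__ and __event_emitter__.
import asyncio


class _FakeApp:
    def url_path_for(self, name: str, **params) -> str:
        # Open WebUI resolves this route to /api/v1/files/{id}/content.
        return f"/api/v1/files/{params['id']}/content"


class _FakeRequest:
    app = _FakeApp()


async def _push_image_to_chat(event_emitter, request, file_id: str) -> None:
    url = request.app.url_path_for("get_file_content_by_id", id=file_id)
    await event_emitter({
        "type": "files",
        "data": {"files": [{"type": "image", "url": url}]},
    })


async def main() -> None:
    async def emitter(event: dict) -> None:
        print(event)

    # Step 1 (upload via upload_file_handler) is elided; assume it returned this id.
    await _push_image_to_chat(emitter, _FakeRequest(), "abcd-1234")


asyncio.run(main())
```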

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:21:48 -05:00
4d996e1205 smart_image_gen v0.4: emit image to chat, return only confirmation
The data URI returned from the tool was being given to the LLM as the
tool result — the LLM then either echoed the base64 to the user as
plain text (screenshot 1) or hallucinated a description of what it
thought the image looked like (screenshot 2 — "an image of a cat
sitting on a windowsill" for a fox-warrior prompt).

Fix: push the markdown image into the chat directly via
__event_emitter__ as a "message" event, and return a short text
confirmation as the function value. The confirmation is worded to
prevent the LLM from describing the image or repeating the markdown
(both common failure modes for tool-using LLMs).

Both generate_image and edit_image fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:14:59 -05:00
6adf133558 Ship Image Studio as importable JSON in addition to markdown walkthrough
Open WebUI accepts a JSON file at Workspace → Models → Import that
seeds a new model preset in one click instead of the manual table-
driven setup. The new image_studio.json mirrors the Open WebUI bulk-
export schema (array wrapper around the model object with id, name,
base_model_id, params, meta) and pre-fills system prompt, native
function calling, temperature 0.5, top_p 0.9, smart_image_gen tool
attachment, suggestion prompts.

The markdown walkthrough stays as the source of truth for the system
prompt content and as the fallback when import fails (e.g. tool ID
mismatch, unfamiliar field, schema drift across Open WebUI versions).
README points at both paths.

Caveat doc'd in the markdown: if the imported preset doesn't actually
have smart_image_gen attached, the tool ID in the JSON didn't match
what Open WebUI assigned — re-attach manually in the model edit
screen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:04:49 -05:00
d4e2058859 smart_image_gen v0.3: add edit_image (img2img) method
The Tool now exposes two methods the LLM picks between based on whether
the user attached an image:

  generate_image — txt2img (existing, unchanged behavior)
  edit_image     — img2img on the most recently attached image

edit_image extracts the source image from __messages__ (base64 data
URIs in image_url content blocks) or __files__ (local path or URL),
uploads to ComfyUI's /upload/image, runs an img2img workflow at the
caller-specified denoise (default 0.7), and returns the edited result.
Same per-style routing / sampler / CFG / prefix logic as generation.

Refactored the submit-and-poll loop into _submit_and_fetch shared by
both methods. Image extraction is defensive — tries messages first,
then files (path then URL), returns a clear "no image attached"
message rather than silently generating from scratch.

Image Studio system prompt rewritten to teach the LLM when to call
edit_image vs generate_image and how to pick denoise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:59:13 -05:00
41d571d8d1 Add Image Studio model preset — forces smart_image_gen tool use
A documented Open WebUI custom-model preset wrapping mistral-nemo:12b
with: aggressive system prompt that mandates calling generate_image,
only the smart_image_gen tool attached, native function calling,
lower temperature for tool-call reliability. Users pick "Image Studio"
from the chat-model dropdown when they want images.

Solves the common case where general-purpose chat models describe an
image in text instead of firing the tool — usually on conversational
phrasings like "can you draw me…". The preset removes the ambiguity
by giving the LLM exactly one job and one tool.

Setup walkthrough in openwebui-models/image_studio.md; deployment
README §9 points users at it as the recommended path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:54:13 -05:00
9e22de0328 smart_image_gen: tighten docstring + Literal style enum
Two changes to make the LLM more likely to call the tool:

1. Lead the docstring with an unambiguous directive — "Create an image
   and show it to the user. Use this whenever the user asks you to
   draw, generate, ..." plus a hard "do not say you cannot generate
   images" line. Open WebUI feeds the docstring straight to the LLM as
   the tool description; first line carries the most weight.

2. `style: Optional[StyleName]` where StyleName is a Literal enum of
   the seven values. Native function-calling models read the type
   annotation and present the seven valid values to the LLM as a
   strict choice instead of a free-text param.
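
A minimal sketch of the annotation pattern (docstring abbreviated; the seven style values are the ones listed elsewhere in this log):

```python
# Sketch only: abbreviated docstring and body.
from typing import Literal, Optional

StyleName = Literal[
    "photo", "juggernaut", "pony", "general", "furry-nai", "furry-noob", "furry-il"
]


def generate_image(prompt: str, style: Optional[StyleName] = None) -> str:
    """Create an image and show it to the user. Use this whenever the user asks
    you to draw, generate, or illustrate something."""
    return f"routing {prompt!r} to style {style or 'auto'}"


print(generate_image("cyberpunk samurai", style="photo"))
```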

If the LLM still doesn't fire the tool, the install is probably wrong:
Workspace → Models → the model → Advanced Params → Function Calling
must be set to Native (not Default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:52:26 -05:00
b815cd6a5f Tune static workflows to CyberRealisticXL recommended settings
The static workflow JSONs default to CyberRealisticXLPlay (set in an
earlier commit), but the KSampler still had euler / normal / CFG 7 / 20 steps — the
generic settings I scaffolded with. Updated to the creator-published
defaults: dpmpp_2m_sde / karras / CFG 4 / 28 steps. CLIP skip 1
already correct (no node needed; default behavior).

Added a section to the deployment README spelling out the trade-off:
static workflows are locked to one checkpoint family at a time because
Open WebUI's nodes mapping doesn't expose sampler/CFG/scheduler/CLIP
skip/prefix. For multi-checkpoint use, the smart_image_gen Tool path is
the only one that gets these right per-prompt.

Re-paste workflows into Open WebUI Settings → Images to pick up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:47:46 -05:00
45d5541be0 smart_image_gen v0.2: per-style sampler/CFG/steps/CLIP-skip + prompt prefixes
Researched each of the seven SDXL checkpoints on Civitai and encoded the
creator-recommended generation defaults per style instead of one global
set. Material differences:

  - photo (CyberRealistic): dpmpp_2m_sde / karras / CFG 4 / 28 steps / CLIP 1
  - juggernaut: dpmpp_2m_sde / karras / CFG 4.5 / 35 steps / CLIP 1
  - pony: euler_a / normal / CFG 7.5 / 25 steps / CLIP 2
  - general (Talmendo): dpmpp_2m / karras / CFG 8 / 30 steps / CLIP 2
  - furry-nai (Reed): euler_a / normal / CFG 5 / 30 steps / CLIP 2
  - furry-noob (IndigoVoid): euler_a-only / normal / CFG 4.5 / 20 / CLIP 2
  - furry-il (NovaFurry): euler_a / normal / CFG 4 / 30 steps / CLIP 2

Three prompt-prefix dialects auto-prepended (NEVER cross-contaminated):
photoreal models get nothing, Pony gets the full
score_9..score_4_up chain (mandatory), and the NoobAI/Illustrious
furry models get their booru quality + year-tag prefixes
(masterpiece/best quality/absurdres/newest/etc). Workflow now includes
a CLIPSetLastLayer node so per-style CLIP skip works.
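
A minimal sketch of how those per-style defaults might be encoded (two of the seven entries shown; the dict and field names are illustrative):

```python
# Sketch only: two entries with illustrative field names.
STYLE_SETTINGS = {
    "photo": {       # CyberRealistic
        "sampler": "dpmpp_2m_sde", "scheduler": "karras",
        "cfg": 4.0, "steps": 28, "clip_skip": 1,
        "prefix": "",  # photoreal models get no prefix
    },
    "furry-il": {    # NovaFurry (Illustrious)
        "sampler": "euler_ancestral",  # ComfyUI's name for euler_a
        "scheduler": "normal",
        "cfg": 4.0, "steps": 30, "clip_skip": 2,
        "prefix": "masterpiece, best quality, absurdres, newest, ",
    },
}

print(STYLE_SETTINGS["furry-il"]["steps"])  # 30
```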

Routing default for generic "furry" flipped from Reed (NAI) to NovaFurry
(Illustrious) — current sweet-spot consensus. Removed global
DEFAULT_STEPS/DEFAULT_CFG valves; per-style values are canonical.

Sources: each model's Civitai page (CyberRealisticXL, Juggernaut,
Pony V6 XL, TalmendoXL, Reed FurryMix, IndigoVoid FurryFused,
NovaFurryXL) and Pony/Illustrious prompting guides.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:45:34 -05:00
cd0034cd99 Flesh out per-style negatives in smart_image_gen Tool
Each style now gets a proper baseline covering quality, anatomy, and
watermark/signature suppression — plus the appropriate style-leak guards
(no-cartoon for photo, no-human for furry, score_4–6 suppression for
pony). Quality terms only; no NSFW filtering by default since several
checkpoints in this set are commonly used for adult work and would
fight a baked-in content filter. If SFW-by-default is wanted, add an
explicit safe-mode flag rather than expanding NEGATIVES.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:39:24 -05:00
c585e53ed4 Bake quality-focused default negatives into the static workflows
Open WebUI overwrites node 7's text when the request supplies a
negative_prompt, so the default only takes effect when one isn't
provided — which is the common case for the image-button path since the
chat UI doesn't expose the field. Generic quality terms only (no style
or content restrictions) so the default is safe across SD/SDXL/Flux
swaps and doesn't fight whichever checkpoint is loaded.

The smart_image_gen Tool already had per-style defaults; this only
affects the non-Tool image-gen path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:36:44 -05:00
392b26167f Add smart_image_gen Tool for per-prompt checkpoint routing
Open WebUI Tool the LLM invokes instead of the built-in image action.
Auto-routes among the seven SDXL checkpoints (photo / juggernaut /
pony / general / furry-{nai,noob,il}) based on either an explicit
`style` arg or first-match-wins regex over the prompt. Constructs the
ComfyUI workflow inline, submits via /prompt, polls /history, returns
the result as a base64 data-URI markdown image so no extra hosting is
needed. Per-style default negatives. ComfyUI URL / steps / CFG /
timeout are admin-tunable Valves.

Filters can't see image-gen requests in Open WebUI (the routers skip
the filter chain), so the LLM-driven Tool is the only path that
gives intent-aware routing without changing the chat UX.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:17:02 -05:00
704bcfdf13 Default workflows to SDXL CyberRealistic; ship empty model preseed
Drops the SD 1.5 placeholder. The shipped txt2img/img2img workflows now
reference CyberRealisticXLPlay_V8.0_FP16.safetensors (the checkpoint
figment used in production), and comfyui-init-models.sh ships with no
active fetches — operators uncomment examples or add their own URLs.

The script + workflow filenames have to line up; README explains.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:02:26 -05:00
0ad99b6199 Add comfyui-model-init sidecar for ComfyUI model preseeding
Mirrors the Ollama model-init pattern: a one-shot Alpine container that
mounts the comfyui-models volume and runs comfyui-init-models.sh, which
curls direct download URLs (HuggingFace by default) into the right
subdirectories. Idempotent — already-present files are skipped.

HF_TOKEN is plumbed through for gated repos (Flux-dev, SD3, etc.) and is
opt-in via .env. The default list ships SD 1.5 only, matching the
placeholder filename in workflows/*.json. Examples for SDXL, Flux, and
upscalers are commented in the script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:57:24 -05:00
14 changed files with 2469 additions and 58 deletions

View File

@@ -52,8 +52,30 @@ RUN git clone --depth 1 https://github.com/ltdrdata/ComfyUI-Manager.git \
${COMFYUI_HOME}/custom_nodes/ComfyUI-Manager && \
pip install -r ${COMFYUI_HOME}/custom_nodes/ComfyUI-Manager/requirements.txt
# comfyui_segment_anything — GroundingDINO + SAM-HQ in one bundle. Required
# by the smart_image_gen Tool's text-targeted inpainting (edit_image with the
# mask_text parameter). Model weights auto-download on first use into
# /opt/comfyui/models/{sams,grounding-dino}/ — first inpaint takes ~3 GB of
# downloads, subsequent runs are instant.
#
# Transformers must stay <5: GroundingDINO inside this node calls
# BertModel.get_head_mask, which transformers 5.0 silently removed. The pin
# is applied AFTER the requirements install so it overrides anything the
# upstream requirements.txt would have pulled.
RUN git clone --depth 1 https://github.com/storyicon/comfyui_segment_anything.git \
${COMFYUI_HOME}/custom_nodes/comfyui_segment_anything && \
pip install -q -r ${COMFYUI_HOME}/custom_nodes/comfyui_segment_anything/requirements.txt && \
pip install -q "transformers>=4.40,<5"
# Entrypoint wrapper — auto-installs requirements.txt for any custom_node
# present at startup (covers Manager-installed nodes and nodes cloned
# directly into the comfyui-custom-nodes volume).
COPY install-custom-node-deps.sh /usr/local/bin/install-custom-node-deps.sh
RUN chmod +x /usr/local/bin/install-custom-node-deps.sh
EXPOSE 8188
# --listen 0.0.0.0 binds to every interface so the Open WebUI container on the
# shared compose network can reach it. --port is explicit for clarity.
ENTRYPOINT ["/usr/local/bin/install-custom-node-deps.sh"]
CMD ["python", "main.py", "--listen", "0.0.0.0", "--port", "8188"]

View File

@@ -19,6 +19,36 @@ WEBUI_SECRET_KEY=replace-with-32-byte-hex
# Only needed if you uncomment the anubis-owui service in docker-compose.yml.
ANUBIS_OWUI_KEY=replace-with-32-byte-hex
# ComfyUI image tag to deploy. `latest` tracks whatever the release workflow
# last pushed; pin to a v* tag (e.g. 0.1.0) for reproducible deploys.
COMFYUI_IMAGE_TAG=latest
# ─── Image tags ─────────────────────────────────────────────────────────────
# Pin to specific versions for reproducible deploys. The defaults below are
# the last set verified to work end-to-end for this stack — change only when
# you've tested a newer combination. `latest` / `main` is fine for local
# experimentation but means deploys are non-deterministic.
#
# Find current tags at:
# ComfyUI git.anomalous.dev/alphacentri/comfyui-nvidia/-/tags
# Caddy https://hub.docker.com/_/caddy/tags
# Ollama https://hub.docker.com/r/ollama/ollama/tags
# Open WebUI https://github.com/open-webui/open-webui/pkgs/container/open-webui
# Alpine https://hub.docker.com/_/alpine/tags
# Anubis https://github.com/TecharoHQ/anubis/pkgs/container/anubis
COMFYUI_IMAGE_TAG=0.2.1
CADDY_TAG=2-alpine
OLLAMA_TAG=latest
OPEN_WEBUI_TAG=main
ALPINE_TAG=3.20
ANUBIS_TAG=latest
# HuggingFace access token. Only needed if comfyui-init-models.sh references
# gated repos (Flux-dev, SD3, etc.). Generate a read token at
# https://huggingface.co/settings/tokens. Leave empty for public-only.
HF_TOKEN=
# HTTPS base URL of an S3 bucket / CDN that hosts mirrored Ollama model
# tarballs (created by mirror-ollama-model.sh). Files under this base are
# fetched by init-models.sh's s3_pull instead of registry.ollama.ai —
# faster and immune to upstream rate-limiting / removal. Example:
# S3_OLLAMA_BASE=https://your-bucket.s3.amazonaws.com/ollama-models
# Leave empty to fall back to plain `ollama pull` for everything.
S3_OLLAMA_BASE=

View File

@@ -10,12 +10,17 @@ production `srvno.de` deployment.
## Files
| File | Purpose |
| ------------------- | -------------------------------------------------------- |
| `docker-compose.yml`| Service definitions, volumes, GPU reservations |
| `Caddyfile` | TLS + reverse proxy config (one site block per hostname) |
| `init-models.sh` | Models to preseed into Ollama on first boot |
| `.env.example` | Secrets and image-tag pins. Copy to `.env` |
| File | Purpose |
| --------------------------------------- | -------------------------------------------------------- |
| `docker-compose.yml` | Service definitions, volumes, GPU reservations |
| `Caddyfile` | TLS + reverse proxy config (one site block per hostname) |
| `init-models.sh` | LLMs to preseed into Ollama on first boot |
| `mirror-ollama-model.sh` | Helper — mirror an Ollama model into a tarball you can host on S3 |
| `comfyui-init-models.sh` | Checkpoints/VAEs/LoRAs to preseed into ComfyUI on first boot |
| `openwebui-tools/smart_image_gen.py` | Tool that auto-routes image generation, img2img, and text-targeted inpainting to the right SDXL checkpoint |
| `openwebui-models/image_studio.md` | Dedicated chat-model preset — manual setup walkthrough |
| `openwebui-models/image_studio.json` | The same preset as an importable Open WebUI model JSON |
| `.env.example` | Secrets and image-tag pins. Copy to `.env` |
## 1. Host prerequisites
@@ -60,7 +65,20 @@ Then edit:
```
- **`init-models.sh`** — keep the LLMs you want preseeded, drop the rest.
Check sizes at <https://ollama.com/library> first; the host needs disk
for everything listed.
for everything listed. Two pull paths are available:
- `pull "<model:tag>"` — standard registry pull from
`registry.ollama.ai`.
- `s3_pull "<model:tag>" "<archive.tgz>"` — fetches from your own
mirror set via `S3_OLLAMA_BASE` in `.env`. Falls back to
`ollama pull` if the env var isn't set, so this is safe to enable
incrementally. Create the tarballs once with
`mirror-ollama-model.sh` (see [Mirroring models to S3](#mirroring-models-to-s3)).
- **`comfyui-init-models.sh`** — checkpoints/VAEs/LoRAs to preseed into
ComfyUI. Ships empty (no active fetches) — uncomment the SDXL/Flux/
upscaler examples or add your own. Whatever filename you pick should
match the `ckpt_name` field in `workflows/*.json` (default expects
`CyberRealisticXLPlay_V8.0_FP16.safetensors`). Set `HF_TOKEN` in
`.env` if any are gated repos.
## 3. Bring it up
@@ -80,22 +98,29 @@ docker compose exec comfyui curl -sf http://127.0.0.1:8188/system_stats | head -
docker compose exec open-webui curl -sf http://127.0.0.1:8080/health
```
## 4. Drop in at least one ComfyUI checkpoint
## 4. ComfyUI checkpoints
ComfyUI ships no models. The shipped workflow templates reference
`v1-5-pruned-emaonly.safetensors` as a placeholder; drop any
SD/SDXL/Flux checkpoint into the `comfyui-models` volume under
`checkpoints/`:
ComfyUI ships no models. Three ways to get one in:
```sh
docker run --rm -v ai-stack_comfyui-models:/models -w /models/checkpoints \
curlimages/curl:latest -L -O \
https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
```
1. **Preseed via the sidecar (default).** `comfyui-model-init` runs once
on `compose up`, downloads everything `comfyui-init-models.sh` lists,
and exits. The script ships empty — uncomment one of the examples or
add your own `fetch` calls (SDXL, Flux, LoRAs, upscalers, etc.). At
least one checkpoint should be named
`CyberRealisticXLPlay_V8.0_FP16.safetensors` to match the workflow
default, or update `ckpt_name` in `workflows/*.json` to whatever you
pull. Re-run with `docker compose up -d comfyui-model-init` after
script edits; already-present files are skipped.
2. **ComfyUI-Manager UI.** Open `https://comfyui.example.com` (after
basic-auth login), click **Manager**, then **Model Manager**, install
from the catalogue.
3. **Direct copy into the volume.** Useful if you already have the file
locally:
Or open the ComfyUI native UI at `https://comfyui.example.com` (after
basic-auth login), use the **Manager** button (added by ComfyUI-Manager),
and install one through **Model Manager**.
```sh
docker run --rm -v ai-stack_comfyui-models:/models -v $PWD:/src alpine \
cp /src/your-model.safetensors /models/checkpoints/
```
## 5. First-user signup in Open WebUI
@@ -117,7 +142,7 @@ In Open WebUI: **Admin Panel -> Settings -> Images**.
4. **ComfyUI Workflow Nodes** -> paste the contents of
[`../../workflows/txt2img.nodes.json`](../../workflows/txt2img.nodes.json).
5. **Default Model** -> the filename of the checkpoint you dropped in
step 4 (e.g. `v1-5-pruned-emaonly.safetensors`).
step 4 (e.g. `CyberRealisticXLPlay_V8.0_FP16.safetensors`).
6. Save.
For image editing (img2img), scroll to the **Image Editing** section in
@@ -132,6 +157,140 @@ Open WebUI submits the workflow to ComfyUI; the result drops back into
the chat when KSampler finishes. To test img2img, attach an image and
use the edit action.
## 8. (Optional) Install the smart-routing Tool
The image-button path always uses the admin's **Default Model**. To get
per-prompt checkpoint routing — e.g. "draw me a cyberpunk city" picks
CyberRealistic, "anthro fox warrior" picks one of the furry checkpoints —
install the `smart_image_gen.py` Tool. It exposes two methods the LLM
calls:
- **`generate_image`** for new images from scratch (txt2img).
- **`edit_image`** for modifying an image the user attached to the
chat. Two modes:
- With `mask_text` — text-targeted inpainting via GroundingDINO+SAM
(e.g. "the dog's collar"). Only the named region is repainted.
- Without `mask_text` — full img2img which reimagines the whole
image at the requested denoise.
Both auto-route to the right SDXL checkpoint per request.
> **First inpaint takes a few minutes**: SAM-HQ (~2.5 GB) and
> GroundingDINO (~700 MB) auto-download into the `comfyui-models`
> volume on the very first call to `edit_image` with `mask_text`.
> Subsequent inpaints are instant.
1. **Workspace -> Tools -> +** (top-right).
2. Paste the contents of
[`openwebui-tools/smart_image_gen.py`](openwebui-tools/smart_image_gen.py).
3. Save. Optionally adjust the Valves (ComfyUI URL, default steps, CFG,
timeout) via the gear icon.
4. **Workspace -> Models** (or pick an existing chat model) -> edit ->
under **Tools**, enable `smart_image_gen` -> save.
5. Make sure the model has **native function calling** enabled
(Workspace -> Models -> the model -> Advanced Params -> Function
Calling: Native). Mistral, Qwen, and Llama 3.1+ all support this.
In a chat with that model, ask for an image — "make me a photoreal
portrait of a cyberpunk samurai" — the LLM should call
`generate_image(prompt=..., style="photo")`. The status bar shows
"Routing to photo (CyberRealisticXLPlay…)" while it generates.
If the LLM responds in text instead of calling the tool, install the
**Image Studio** chat-model preset (next section) — a dedicated model
with a system prompt that removes the ambiguity.
## 9. (Recommended) Install the Image Studio model preset
General-purpose chat models often "describe" an image in text instead
of firing the `generate_image` tool, especially on conversational
phrasing ("can you draw me…", "I'd love a picture of…"). The
**Image Studio** preset wraps `mistral-nemo:12b` in a system prompt
that mandates tool use — every message is treated as an image request.
Setup — two paths:
- **Import the JSON** (fast): Workspace → Models → Import →
[`openwebui-models/image_studio.json`](openwebui-models/image_studio.json).
- **Manual** (full control): walkthrough in
[`openwebui-models/image_studio.md`](openwebui-models/image_studio.md).
Users then pick **Image Studio** from the chat-model dropdown when
they want to generate or edit images.
**One required follow-up** after either install path: set a separate
**Task Model** in Admin Settings → Interface → Task Model. Image
Studio uses `tool_choice: required` to force tool calls, which means
the same model can't produce the text responses Open WebUI needs for
chat-title generation, tag suggestions, and autocomplete. Pick any
non-Image-Studio model you have pulled (`mistral-nemo:12b`,
`llama3.1:8b`, etc.) — see the
[**Set a separate Task Model** section in image_studio.md](openwebui-models/image_studio.md#set-a-separate-task-model-required-after-install).
The preset ships with `vision: true` so users can attach images for
editing even though `mistral-nemo:12b` isn't a vision model — see the
[**Vision capability** section in image_studio.md](openwebui-models/image_studio.md#vision-capability)
for the trade-offs and the upgrade path to a real vision LLM
(`qwen2.5vl:7b`, `llama3.2-vision:11b`, etc.) if the LLM needs to
actually see the image to write smarter edit instructions.
To extend (new checkpoint, new style):
- Add the filename to `comfyui-init-models.sh` so it gets pulled.
- Add a key to the `CHECKPOINTS` dict in `smart_image_gen.py`.
- Optionally add style-specific negatives to `NEGATIVES`.
- Optionally add keyword routing rules to `ROUTING_RULES` for the
auto-detect path.
- Re-paste the Tool source in Workspace -> Tools.
## Mirroring models to S3
For models you want to pin against upstream changes (or pull faster
from your own infra), mirror them to S3 once and have the
deployment fetch from there.
### Create the mirror tarball
Run [`mirror-ollama-model.sh`](mirror-ollama-model.sh) on any machine
that has the model pulled locally. It reads `~/.ollama/models/`,
pulls the manifest's referenced blobs, and tars everything together:
```sh
./mirror-ollama-model.sh huihui_ai/qwen3.5-abliterated:9b qwen3.5-abliterated-9b.tgz
```
### Upload to S3
Whatever fits — `aws s3 cp`, `mc`, `rclone`, etc. The bucket needs
to expose the file over HTTPS (public-read ACL on the object, a
CloudFront distribution, R2 with public URLs, etc.):
```sh
aws s3 cp qwen3.5-abliterated-9b.tgz s3://your-bucket/ollama-models/ --acl public-read
```
### Wire the deployment to fetch from there
In `.env`:
```
S3_OLLAMA_BASE=https://your-bucket.s3.amazonaws.com/ollama-models
```
In `init-models.sh`, switch the affected models from `pull` to
`s3_pull`:
```sh
s3_pull "huihui_ai/qwen3.5-abliterated:9b" "qwen3.5-abliterated-9b.tgz"
```
`docker compose up -d model-init` re-runs the init container; the
script downloads the tarball, extracts into the `ollama-data` volume,
and the running Ollama daemon picks it up on its next manifest scan.
If `S3_OLLAMA_BASE` isn't set, `s3_pull` transparently falls back to
`ollama pull` — safe to commit `s3_pull` lines without S3 ready yet.
## Enabling Anubis (later)
The `anubis-owui` service is defined in compose but no Caddy site block
@@ -154,11 +313,31 @@ provides a prompt, image, seed, etc. Each entry:
Recognised `type` strings (per Open WebUI source): `model`, `prompt`,
`negative_prompt`, `width`, `height`, `n` (batch size), `steps`, `seed`,
and `image` (img2img / edit only).
and `image` (img2img / edit only). Notably **not** mappable: sampler,
scheduler, CFG, CLIP skip, prompt prefix.
If you swap in a fancier workflow (SDXL, Flux, ControlNet, custom
samplers, NL masking via SAM nodes, etc.), update the matching
`*.nodes.json` so the node IDs and input keys still line up.
This means the static workflow JSONs are tuned for a single checkpoint
family at a time. The shipped defaults match
`CyberRealisticXLPlay_V8.0_FP16.safetensors`
(`dpmpp_2m_sde` / `karras` / CFG 4 / 28 steps / CLIP skip 1 / no prefix).
**If you change the admin's Default Model to a different checkpoint
family** (Pony, NoobAI, Illustrious, etc.), edit the workflow JSONs:
- `KSampler` node: change `sampler_name`, `scheduler`, `cfg`, `steps`
- For checkpoints needing CLIP skip 2: add a `CLIPSetLastLayer` node and
rewire `CLIPTextEncode` nodes through it (see
[openwebui-tools/smart_image_gen.py](openwebui-tools/smart_image_gen.py)
for the exact graph).
- For Pony or NoobAI/Illustrious: the required quality-tag prefix
(`score_9, score_8_up, ...` or `masterpiece, best quality, ...`) has
to be typed by the user every time, since the workflow can't inject
it. **For multi-checkpoint deployments, use the smart_image_gen Tool
instead** — it handles per-checkpoint sampler / CFG / steps / CLIP
skip / prefix automatically based on the LLM's `style` choice.
If you swap in a fancier workflow (Flux, ControlNet, NL masking via
SAM nodes, etc.), update the matching `*.nodes.json` so the node IDs
and input keys still line up.
## Common gotchas

View File

@@ -0,0 +1,84 @@
#!/bin/sh
# Preseed ComfyUI's models volume with checkpoints, VAEs, LoRAs, etc.
# Runs once via the comfyui-model-init service (see docker-compose.yml).
# Safe to re-run — already-present files are skipped.
#
# ComfyUI doesn't have a "pull" command of its own, so this is plain curl
# against direct download URLs. For HuggingFace, the direct URL is:
# https://huggingface.co/<repo>/resolve/main/<file>
# For gated HF repos (Flux-dev, SD3, etc.), set HF_TOKEN in .env — the
# script attaches it as a bearer token automatically.
set -e
apk add --no-cache curl >/dev/null
mkdir -p /models/checkpoints /models/vae /models/loras /models/controlnet \
/models/clip /models/clip_vision /models/upscale_models /models/embeddings \
/models/sams /models/grounding-dino
fetch() {
dest="$1"; name="$2"; url="$3"
target="/models/$dest/$name"
if [ -f "$target" ]; then
echo "$dest/$name already present"
return
fi
echo "→ Downloading $dest/$name"
mkdir -p "/models/$dest"
if [ -n "$HF_TOKEN" ] && echo "$url" | grep -q huggingface.co; then
curl -fL -C - --retry 3 -H "Authorization: Bearer $HF_TOKEN" \
-o "$target.partial" "$url"
else
curl -fL -C - --retry 3 -o "$target.partial" "$url"
fi
mv "$target.partial" "$target"
}
# ─── Edit the list below to choose what gets preseeded ──────────────────────
# Format: fetch <subdir under /models> <filename to save as> <direct URL>
#
# No checkpoints are downloaded by default — the deployment ships expecting
# you to point at your own model mirror or the public examples below.
# Whatever filename you pick should match the `ckpt_name` field in
# workflows/txt2img.json and workflows/img2img.json (the shipped default
# is CyberRealisticXLPlay_V8.0_FP16.safetensors); update either the
# script or the workflows so they line up.
# Examples — uncomment what you want.
# SDXL Base 1.0 (~6.9 GB)
# fetch checkpoints sd_xl_base_1.0.safetensors \
# https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
# SDXL VAE (fixes washed-out colours on some SDXL checkpoints)
# fetch vae sdxl_vae.safetensors \
# https://huggingface.co/stabilityai/sdxl-vae/resolve/main/sdxl_vae.safetensors
# Flux.1-dev (~23 GB, gated — needs HF_TOKEN with access to black-forest-labs)
# fetch checkpoints flux1-dev.safetensors \
# https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors
# 4x-UltraSharp upscaler
# fetch upscale_models 4x-UltraSharp.pth \
# https://huggingface.co/lokCX/4x-Ultrasharp/resolve/main/4x-UltraSharp.pth
# ─── Inpainting models (SAM-HQ + GroundingDINO) ─────────────────────────────
# Required by the smart_image_gen Tool's edit_image with mask_text. ComfyUI
# would auto-download these on first use, but that takes minutes and tends
# to time out in-flight tool calls — preseeding here makes the first inpaint
# instant.
fetch sams sam_hq_vit_h.pth \
https://huggingface.co/lkeab/hq-sam/resolve/main/sam_hq_vit_h.pth
fetch grounding-dino groundingdino_swint_ogc.pth \
https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swint_ogc.pth
fetch grounding-dino GroundingDINO_SwinT_OGC.cfg.py \
https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/GroundingDINO_SwinT_OGC.cfg.py
echo "Done."

View File

@@ -24,7 +24,7 @@ services:
# Encrypt), reverse-proxies to the in-compose services by name.
# ---------------------------------------------------------------------------
caddy:
image: caddy:2-alpine
image: caddy:${CADDY_TAG:-2-alpine}
container_name: caddy
restart: unless-stopped
ports:
@@ -49,7 +49,7 @@ services:
# Ollama — LLM daemon, GPU-backed.
# ---------------------------------------------------------------------------
ollama:
image: ollama/ollama:latest
image: ollama/ollama:${OLLAMA_TAG:-latest}
container_name: ollama
restart: unless-stopped
# 11434 only published if you want direct access from the VM host.
@@ -60,8 +60,12 @@ services:
- ollama-data:/root/.ollama
environment:
- OLLAMA_HOST=0.0.0.0:11434
- OLLAMA_KEEP_ALIVE=30m
- OLLAMA_MAX_LOADED_MODELS=2
# KEEP_ALIVE=-1 holds loaded models in VRAM until evicted by another
# load (vs the default 5m / our previous 30m which forces a reload
# penalty on every cold use). Pair with MAX_LOADED_MODELS sized to
# whatever fits in your GPU's VRAM — see README "VRAM sizing".
- OLLAMA_KEEP_ALIVE=-1
- OLLAMA_MAX_LOADED_MODELS=3
- OLLAMA_FLASH_ATTENTION=1
deploy:
resources:
@@ -79,8 +83,12 @@ services:
# One-shot model puller. Runs after ollama is healthy, pulls whatever
# init-models.sh lists, exits. `restart: "no"` keeps it from looping.
#
# Models can come from registry.ollama.ai (default) or your own S3
# mirror (set S3_OLLAMA_BASE in .env; create tarballs with
# mirror-ollama-model.sh).
model-init:
image: ollama/ollama:latest
image: ollama/ollama:${OLLAMA_TAG:-latest}
container_name: ollama-model-init
depends_on:
ollama:
@@ -90,6 +98,7 @@ services:
- ./init-models.sh:/init-models.sh:ro
environment:
- OLLAMA_HOST=ollama:11434
- S3_OLLAMA_BASE=${S3_OLLAMA_BASE:-}
entrypoint: ["/bin/sh", "/init-models.sh"]
restart: "no"
@@ -103,7 +112,7 @@ services:
# (install via ComfyUI-Manager) instead of as a separate sidecar.
# ---------------------------------------------------------------------------
comfyui:
image: git.anomalous.dev/alphacentri/comfyui-nvidia:${COMFYUI_IMAGE_TAG:-latest}
image: git.anomalous.dev/alphacentri/comfyui-nvidia:${COMFYUI_IMAGE_TAG:-0.2.1}
pull_policy: always
container_name: comfyui
restart: unless-stopped
@@ -129,11 +138,28 @@ services:
retries: 5
start_period: 120s
# One-shot model puller for ComfyUI. Mounts the same models volume,
# downloads whatever comfyui-init-models.sh lists, exits. ComfyUI doesn't
# need to be running for this — files just land on the volume; ComfyUI
# picks them up next time it scans (or on a restart).
comfyui-model-init:
image: alpine:${ALPINE_TAG:-3.20}
container_name: comfyui-model-init
volumes:
- comfyui-models:/models
- ./comfyui-init-models.sh:/init.sh:ro
environment:
# Optional — set in .env to download from gated HuggingFace repos
# (Flux-dev, SD3, etc.). Leave empty for public-only.
HF_TOKEN: "${HF_TOKEN:-}"
entrypoint: ["/bin/sh", "/init.sh"]
restart: "no"
# ---------------------------------------------------------------------------
# Open WebUI — multi-user chat.
# ---------------------------------------------------------------------------
open-webui:
image: ghcr.io/open-webui/open-webui:main
image: ghcr.io/open-webui/open-webui:${OPEN_WEBUI_TAG:-main}
container_name: open-webui
restart: unless-stopped
# ports: not published; Caddy fronts it
@@ -175,7 +201,7 @@ services:
# `open-webui:8080` → `anubis-owui:8923`.
# ---------------------------------------------------------------------------
anubis-owui:
image: ghcr.io/techarohq/anubis:latest
image: ghcr.io/techarohq/anubis:${ANUBIS_TAG:-latest}
container_name: anubis-owui
restart: unless-stopped
environment:

View File

@@ -3,20 +3,68 @@
# Runs once via the model-init service (see docker-compose.yml). Safe to
# re-run — already-present models are skipped.
#
# Add or remove tags to taste. The host needs enough disk for everything
# listed; check sizes at https://ollama.com/library before adding.
# Two pull paths:
# - s3_pull — fetches a tarball from $S3_OLLAMA_BASE (your own mirror,
# created by mirror-ollama-model.sh) and extracts into
# Ollama's data dir. Faster + immune to upstream changes.
# Falls back to ollama pull if S3_OLLAMA_BASE is unset.
# - pull — standard `ollama pull` against registry.ollama.ai.
set -e
MODELS="dolphin3:8b llama3.1:8b ministral-3:8b mistral-nemo:12b qwen3.6:latest"
# Make sure curl is available — ollama/ollama:latest doesn't always include
# it, and s3_pull needs it. tar is in the base image.
if ! command -v curl >/dev/null 2>&1; then
apt-get update -qq && apt-get install -y -qq curl ca-certificates >/dev/null
fi
for model in $MODELS; do
if ollama list | awk 'NR>1 {print $1}' | grep -qx "$model"; then
echo "$model already present"
else
echo "→ Pulling $model"
ollama pull "$model"
fi
S3_OLLAMA_BASE="${S3_OLLAMA_BASE:-}"
OLLAMA_DATA="/root/.ollama"
s3_pull() {
name="$1"; archive="$2"
if ollama list 2>/dev/null | awk 'NR>1 {print $1}' | grep -qx "$name"; then
echo "$name already present"
return
fi
if [ -z "$S3_OLLAMA_BASE" ]; then
echo "$name: S3_OLLAMA_BASE unset, falling back to ollama pull"
ollama pull "$name"
return
fi
url="${S3_OLLAMA_BASE%/}/$archive"
echo "→ Downloading $name from $url"
curl -fL -C - --retry 3 -o "/tmp/$archive" "$url"
tar -xzf "/tmp/$archive" -C "$OLLAMA_DATA/models/"
rm -f "/tmp/$archive"
echo "$name installed (mirror)"
}
pull() {
name="$1"
if ollama list 2>/dev/null | awk 'NR>1 {print $1}' | grep -qx "$name"; then
echo "$name already present"
else
echo "→ Pulling $name from registry.ollama.ai…"
ollama pull "$name"
fi
}
# ─── S3-mirrored models ─────────────────────────────────────────────────────
# These live in your own bucket. Create the tarballs once with
# mirror-ollama-model.sh, upload to S3, then list them here.
s3_pull "huihui_ai/qwen3.5-abliterated:9b" "qwen3.5-abliterated-9b.tgz"
# huihui_ai/qwen3-vl-abliterated — Qwen 3 VL base abliteration (different
# fine-tune lineage than Qwen 3.5, so its tool-call template stays intact).
# Used as the Image Studio dispatcher: vision-capable, calls tools
# reliably, and doesn't refuse to dispatch on NSFW edit prompts. Pulled
# from registry; no S3 mirror entry yet.
pull "huihui_ai/qwen3-vl-abliterated:8b"
# ─── Direct registry pulls ──────────────────────────────────────────────────
for model in dolphin3:8b llama3.1:8b ministral-3:8b mistral-nemo:12b qwen3.6:latest; do
pull "$model"
done
echo "Done."

View File

@@ -0,0 +1,66 @@
#!/bin/bash
# Mirror an Ollama model into a portable tarball you can upload to S3
# (or any HTTPS host) and re-fetch via init-models.sh's s3_pull.
#
# Run on any machine that already has the model pulled locally — the
# script reads ~/.ollama/models/, parses the manifest to find the
# referenced blobs, and tars them together.
#
# Usage: ./mirror-ollama-model.sh <model:tag> <output.tgz>
# Example: ./mirror-ollama-model.sh huihui_ai/qwen3.5-abliterated:9b qwen3.5-abliterated-9b.tgz
#
# Upload the tarball to S3, then add to init-models.sh:
# s3_pull "huihui_ai/qwen3.5-abliterated:9b" "qwen3.5-abliterated-9b.tgz"
# and set S3_OLLAMA_BASE in .env to your bucket's HTTPS base URL.
set -euo pipefail
MODEL="${1:?Usage: $0 <model:tag> <output.tgz>}"
OUT="${2:?Usage: $0 <model:tag> <output.tgz>}"
OLLAMA_HOME="${OLLAMA_HOME:-$HOME/.ollama}"
MODELS="$OLLAMA_HOME/models"
if ! ollama list | awk 'NR>1 {print $1}' | grep -qx "$MODEL"; then
echo "Model $MODEL not found locally; pulling first..."
ollama pull "$MODEL"
fi
# huihui_ai/qwen3.5-abliterated:9b → manifests/registry.ollama.ai/huihui_ai/qwen3.5-abliterated/9b
ns_and_name="${MODEL%:*}"
tag="${MODEL##*:}"
manifest_rel="manifests/registry.ollama.ai/$ns_and_name/$tag"
manifest_abs="$MODELS/$manifest_rel"
if [ ! -f "$manifest_abs" ]; then
echo "ERROR: manifest not found at $manifest_abs" >&2
exit 1
fi
# Pull every sha256:* digest out of the manifest JSON. Each maps to
# blobs/sha256-<hex>.
blob_files=""
for digest in $(grep -oE 'sha256:[a-f0-9]+' "$manifest_abs" | sort -u); do
blob_rel="blobs/${digest/:/-}"
if [ ! -f "$MODELS/$blob_rel" ]; then
echo "WARNING: missing blob $blob_rel — skipping" >&2
continue
fi
blob_files="$blob_files $blob_rel"
done
count=$(echo "$blob_files" | wc -w | tr -d ' ')
echo "Archiving manifest + $count blob(s)..."
tar -czf "$OUT" -C "$MODELS" "$manifest_rel" $blob_files
size=$(du -h "$OUT" | cut -f1)
echo "Done: $OUT ($size)"
echo
echo "Next:"
echo " 1. Upload to your bucket, e.g."
echo " aws s3 cp $OUT s3://YOUR-BUCKET/ollama-models/ --acl public-read"
echo " (or whatever exposes it over HTTPS)"
echo " 2. Set S3_OLLAMA_BASE in .env to the bucket's HTTPS base, e.g."
echo " S3_OLLAMA_BASE=https://YOUR-BUCKET.s3.amazonaws.com/ollama-models"
echo " 3. Add to init-models.sh:"
echo " s3_pull \"$MODEL\" \"$(basename "$OUT")\""

View File

@@ -0,0 +1,38 @@
[
{
"id": "image-studio",
"base_model_id": "huihui_ai/qwen3-vl-abliterated:8b",
"name": "Image Studio",
"params": {
"system": "/no_think\n\nYou are an image-tool dispatcher. You do not respond in prose. Every user message MUST result in exactly one tool call.\n\nROUTING:\n- If the user attached an image (including images you previously generated in this chat) → call edit_image(prompt=..., ...)\n- Otherwise → call generate_image(prompt=..., ...)\nBoth tools take `prompt` as the first argument — same name on both. Do NOT invent `edit_instruction`.\n\nFire the tool on the FIRST message, with no preamble. Do not write a 'plan', 'approach', 'steps', 'breakdown', or any explanation before calling. Do not ask clarifying questions. Do not say what you are about to do. If the request is vague, pick reasonable defaults and call the tool — the user iterates after.\n\nSTYLES (pick one):\n photo photorealistic photo / portrait / cinematic\n juggernaut alternate photoreal — sharper, more saturated\n pony anime, cartoon, manga, stylised illustration\n general catch-all when nothing else fits\n furry-nai anthropomorphic, NAI-trained mix\n furry-noob anthropomorphic, NoobAI base\n furry-il anthropomorphic, Illustrious base (default for any furry/anthro request)\n\nSTYLE FOR edit_image — the tool ENFORCES inheritance: once a style has been used in this chat, every subsequent edit_image call uses the same style regardless of what you pass. Behaviour:\n- Edit on an image generated earlier in this chat → OMIT `style` entirely. The tool will use the established style. Passing it is harmless but ignored.\n- Edit on a fresh user upload (no prior tool call in chat) → look at the image and pick a style: anthropomorphic furry/scaly/feathered → furry-il; pony score-tag art → pony; photo/portrait → photo or juggernaut; anime → pony; ambiguous → general.\n- Style cannot be changed mid-chat. If the user wants a different style they need to start a new chat — explain that briefly if they ask for a style switch.\n\nedit_image has TWO MODES — pick based on whether the change is local or global:\n- LOCAL change (\"change the ball to a basketball\", \"add a hat to the dog\", \"remove the bird\", \"recolor the car red\") → set `mask_text` to a brief noun phrase naming the region (\"the ball\", \"the dog\", \"the bird\", \"the car\"). Only that region is repainted; rest stays pixel-perfect.\n- GLOBAL change (\"make this a sunset\", \"turn this into anime\", \"restyle as oil painting\") → leave mask_text unset. The whole image is reimagined.\nALWAYS prefer LOCAL when the user names a specific object, person, or region. GLOBAL is only for whole-image style/lighting transformations.\n\nDenoise:\n- LOCAL (mask_text set): default 1.0. Drop to 0.60.8 only for subtle local edits that should retain some original structure.\n- GLOBAL (no mask_text): default 0.7. Use 0.30.5 for subtle restyle, 0.851.0 for radical reimagining.\n\nPick style for the DESIRED OUTPUT, not the input image.\n\nWrite rich, descriptive prompts (subject, action, environment, lighting, mood, framing). Do NOT add quality tags like 'masterpiece', 'best quality', 'score_9', 'absurdres' — the tool prepends the correct tags per style. Do NOT set sampler, CFG, steps, scheduler — the tool picks them.\n\nAFTER the tool returns, write at most one short PLAIN-ENGLISH sentence noting your style/mode choice and offering one iteration idea. 
The image is already shown to the user.\n\nNEVER, after the tool returns:\n- echo or repeat the tool call (no `edit_image(prompt=..., ...)`, no `<function=...>`, no JSON, no parameter listings)\n- describe what's in the image\n- list the arguments you used\n- enumerate styles, denoise, mask_text, etc.\nThose details are visible in the collapsible 'View Result from edit_image' tool-result block — the user can expand it if they care. Your follow-up message is for HUMAN conversation, not bookkeeping.",
"temperature": 0.5,
"top_p": 0.9,
"function_calling": "native",
"custom_params": {
"tool_choice": "required",
"enable_thinking": false
}
},
"meta": {
"profile_image_url": "/static/favicon.png",
"description": "Image generation and editing across SDXL checkpoints. Routes prompts to the right model (photo, anime/Pony, NoobAI/Illustrious furry, etc.) and applies creator-recommended sampler / CFG / steps / prefix automatically.",
"capabilities": {
"vision": true,
"usage": false,
"citations": false
},
"tags": [
{ "name": "image-gen" },
{ "name": "comfyui" }
],
"toolIds": ["smart_image_gen"],
"suggestion_prompts": [
{ "content": "Generate a photorealistic portrait of a cyberpunk samurai at dusk." },
{ "content": "Draw an anthropomorphic fox warrior in stylised anime art." },
{ "content": "Make a pony-style illustration of a starry forest at night." }
]
},
"access_control": null,
"is_active": true
}
]

View File

@@ -0,0 +1,254 @@
# Image Studio — dedicated image-generation chat model
A custom Open WebUI model preset that wraps a base LLM with a system
prompt heavily biased toward calling the `smart_image_gen` tool. Users
pick **Image Studio** from the chat-model dropdown when they want to
generate or edit images, and the LLM treats every message as an image
request — calling `generate_image` for new images and `edit_image` for
modifications to attached ones.
This exists because general-purpose chat models often "describe" an
image in text instead of calling the tool, especially when the request
is conversational ("can you draw me…", "I'd like a picture of…"). A
dedicated preset removes the ambiguity.
## Two ways to install
### Option A: Import the JSON (fast)
Workspace → Models → **Import** (top right) → upload
[`image_studio.json`](image_studio.json).
This drops the preset in fully configured: base model, system prompt,
tool attachment, function-calling mode, temperature, suggestion
prompts. Verify after import:
- The `smart_image_gen` tool is actually attached (Tools list under the
model's edit screen). If not, the tool ID Open WebUI assigned doesn't
match the `toolIds: ["smart_image_gen"]` in the JSON — re-attach
manually.
- Base Model is set to `huihui_ai/qwen3-vl-abliterated:8b`. Adjust if you
want a different LLM (Qwen3.6 or Llama 3.1 also work well for text-only
use; smaller parameter counts may struggle with native tool calling).
### Option B: Create manually (table below)
**Workspace → Models → +** (top right).
| Field | Value |
| ----- | ----- |
| Name | `Image Studio` |
| Base Model | `huihui_ai/qwen3-vl-abliterated:8b` (Qwen 3 VL base, abliterated, vision + tools). Pull via `init-models.sh` first. The Qwen 3 VL fine-tune lineage isn't damaged by abliteration the way Qwen 3.5 is, so it both calls tools reliably AND won't refuse to dispatch on NSFW edit prompts. |
| Description | `Image generation and routing across SDXL checkpoints.` |
| System Prompt | Paste the block from [System prompt](#system-prompt) below. |
| Tools | enable **only** `smart_image_gen` |
In the **Advanced Params** section:
| Field | Value |
| ----- | ----- |
| Function Calling | `Native` — works cleanly on `huihui_ai/qwen3-vl-abliterated:8b` once thinking is disabled (see Custom Parameters). Native gives you the structured "View Result from edit_image" blocks and "Thought for X seconds" tracing in the UI. |
| Temperature | `0.5` (lower = more reliable tool-calling) |
| Top P | `0.9` |
| Context Length | leave default |
| Custom Parameters | `tool_choice: required` (forces the model to call a tool every turn) **and** `enable_thinking: false` (disables Qwen's thinking mode at the API level — the `/no_think` system-prompt directive isn't honored by abliterated Qwen builds, but this server-side flag is). Both required for reliable behaviour on `huihui_ai/qwen3-vl-abliterated:8b`. |
Save. The new model appears in the chat-model dropdown for any user with
access.
## System prompt
```
/no_think
You are an image-tool dispatcher. You do not respond in prose. Every
user message MUST result in exactly one tool call.
ROUTING:
- If the user attached an image (including images you previously
generated in this chat) → call edit_image(prompt=..., ...)
- Otherwise → call generate_image(prompt=..., ...)
Both tools take `prompt` as the first argument — same name on both.
Do NOT invent `edit_instruction`.
Fire the tool on the FIRST message, with no preamble. Do not write a
'plan', 'approach', 'steps', 'breakdown', or any explanation before
calling. Do not ask clarifying questions. Do not say what you are
about to do. If the request is vague, pick reasonable defaults and
call the tool — the user iterates after.
STYLES (pick one):
photo photorealistic photo / portrait / cinematic
juggernaut alternate photoreal — sharper, more saturated
pony anime, cartoon, manga, stylised illustration
general catch-all when nothing else fits
furry-nai anthropomorphic, NAI-trained mix
furry-noob anthropomorphic, NoobAI base
furry-il anthropomorphic, Illustrious base (default for any
furry/anthro request)
STYLE FOR edit_image — the tool ENFORCES inheritance: once a style
has been used in this chat, every subsequent edit_image call uses
the same style regardless of what you pass. Behaviour:
- Edit on an image generated earlier in this chat → OMIT `style`
entirely. The tool will use the established style. Passing it is
harmless but ignored.
- Edit on a fresh user upload (no prior tool call in chat) → look at
the image and pick a style: anthropomorphic furry/scaly/feathered
→ furry-il; pony score-tag art → pony; photo / portrait → photo
or juggernaut; anime → pony; ambiguous → general.
- Style cannot be changed mid-chat. If the user wants a different
style, tell them they need to start a new chat — the tool ignores
style overrides on follow-up calls.
edit_image has TWO MODES — pick based on whether the change is local
or global:
- LOCAL ("change the ball to a basketball", "add a hat to the dog",
"remove the bird", "recolor the car red") → set `mask_text` to a
brief noun phrase naming the region ("the ball", "the dog", "the
bird", "the car"). Only that region is repainted; rest stays
pixel-perfect.
- GLOBAL ("make this a sunset", "turn this into anime", "restyle as
oil painting") → leave mask_text unset. The whole image is
reimagined.
ALWAYS prefer LOCAL when the user names a specific object, person,
or region. GLOBAL is only for whole-image style/lighting
transformations.
Denoise:
- LOCAL (mask_text set): default 1.0. Drop to 0.6–0.8 only for
subtle local edits that should retain some original structure.
- GLOBAL (no mask_text): default 0.7. Use 0.3–0.5 for subtle
restyle, 0.85–1.0 for radical reimagining.
Pick style for the DESIRED OUTPUT, not the input image.
Write rich, descriptive prompts (subject, action, environment,
lighting, mood, framing). Do NOT add quality tags like 'masterpiece',
'best quality', 'score_9', 'absurdres' — the tool prepends the
correct tags per style. Do NOT set sampler, CFG, steps, scheduler —
the tool picks them.
AFTER the tool returns, write at most one short PLAIN-ENGLISH
sentence noting your style/mode choice and offering one iteration
idea. The image is already shown to the user.
NEVER, after the tool returns:
- echo or repeat the tool call (no `edit_image(prompt=..., ...)`,
no `<function=...>`, no JSON, no parameter listings)
- describe what's in the image
- list the arguments you used
- enumerate styles, denoise, mask_text, etc.
Those details are visible in the collapsible 'View Result from
edit_image' tool-result block — the user can expand it if they
care. Your follow-up message is for HUMAN conversation, not
bookkeeping.
```
The first line `/no_think` disables Qwen 3.x's reasoning phase on builds
that honour it; the abliterated builds don't, which is why the preset
also sets `enable_thinking: false` (see the quirk note under Vision
capability). If your base model isn't Qwen 3, leaving it in is a no-op
(other models ignore it). Drop it only if it actually causes problems.
## Set a separate Task Model (required after install)
`tool_choice: required` is what makes Image Studio reliably fire the
tool, but it has a side effect: Open WebUI uses the same model with
the same params for **title generation**, **tag generation**, and
**autocomplete**. With every response forced to be a tool call, those
text-only background tasks can't produce text, so chats stay named
"New Chat" forever and tag suggestions go silent.
Fix: point Open WebUI at a different model for those tasks.
**Admin Settings → Interface → Task Model** → pick any of the
non-Image-Studio models you have pulled. `mistral-nemo:12b`,
`llama3.1:8b`, `qwen3.6:latest`, or `dolphin3:8b` all work. The Task
Model only handles short background calls (titles, tags, autocomplete,
search-query rewriting) — it doesn't need to be vision-capable or
particularly large. Smaller is faster and cheaper.
Save. New Image Studio chats now get descriptive titles, tag
suggestions return, and autocomplete lights up.
## Vision capability
The shipped preset sets `meta.capabilities.vision: true` so Open WebUI
allows users to attach images to chats with this model. Two paths:
### Default — `huihui_ai/qwen3-vl-abliterated:8b`
The shipped preset uses huihui_ai's abliteration of Qwen 3 VL as
the base — 8B params, vision-capable, working native tool calling,
and won't refuse to dispatch the tool when the user's edit prompt
is NSFW. Preseed via `init-models.sh`.
**Why not the Qwen 3.5 abliterated 9B (huihui_ai/qwen3.5-abliterated:9b)?**
Same maintainer, but the abliteration on Qwen 3.5 mangles the
function-call template, causing the model to either refuse to call
tools or emit malformed `<function=...>` XML that Open WebUI's
parser can't recognise. The Qwen 3 VL fine-tune lineage is
different and doesn't take that damage from abliteration.
**Why not standard `qwen3.5:9b`?** The standard (non-abliterated)
Qwen 3.5 calls tools reliably but its safety training refuses on
many image edit prompts even though the LLM's only job is dispatch
(the actual image content is generated by the SDXL checkpoint, which
the LLM never sees). Abliterated VL gets us both reliable tool
calling AND a cooperative dispatcher.
**Qwen 3.x quirk:** thinking mode is on by default and abliterated
builds ignore the system-prompt `/no_think` directive — the model
emits its tool call inside a thinking block that the parser treats
as final response text instead of a real tool invocation. The
shipped preset sets `enable_thinking: false` in `custom_params`,
which Ollama enforces server-side and the model can't ignore. Don't
remove it.
### Alternatives
If the abliterated Qwen 3 VL isn't a fit (size, language preferences,
abliteration caveats), other vision-capable Ollama tags worth trying:
- `qwen2.5vl:7b` — smaller, no thinking mode, very reliable tool-caller
- `llama3.2-vision:11b` — Meta's vision variant, ~7 GB
- `minicpm-v:8b` — fast, capable
To swap, change `base_model_id` in `image_studio.json` (or the Base
Model field if you imported manually) and pull the model via
`init-models.sh` or the Open WebUI model UI.
### Non-vision base model
If you'd rather use a text-only LLM (e.g. `mistral-nemo:12b`),
keep `vision: true` in the preset so Open WebUI still permits image
attachments; the image flows through to `edit_image` via
`__messages__` / `__files__` and ComfyUI does the visual work. The
LLM can't see the image, but for explicit edit instructions ("change
the background to a sunset") that doesn't matter.
## Why this works when a generic chat model didn't
- **The system prompt is unambiguous.** No room for the model to
decide "I'll just describe it in text instead."
- **Only one tool is attached.** No competing tools to choose between.
- **Function Calling: Native** works on the shipped
`huihui_ai/qwen3-vl-abliterated:8b` base once `enable_thinking: false`
is set, and it is what gives the structured "View Result from
edit_image" blocks in the UI. If you swap the base to the Qwen 3.5
abliteration, switch to Default instead: Native expects the parser to
recognise the model's structured tool-call format, and the mangled
Qwen 3.5 template leaks `<function=...><parameter=...>` XML to chat as
plain text on the published Open WebUI / Ollama versions, while
Default uses Open WebUI's own prompt-injection wrapper that
round-trips reliably.
- **Lower temperature.** Tool calling is more reliable with less
sampling randomness.
## Iterating on the system prompt
If users ask for things you didn't anticipate (specific aspect ratios,
multi-image batches, particular checkpoints not in the routing rules),
edit the system prompt above and re-paste into the Workspace → Models
entry. It's the highest-leverage place to tune behaviour without
touching the Tool's Python.

File diff suppressed because it is too large

View File

@@ -0,0 +1,612 @@
"""
title: Smart Image Studio (Pipe)
author: ai-stack
version: 0.1.2
description: Deterministic image-gen / edit / inpaint pipe — no LLM in the
loop for the routing decision. Registers as a model in the chat-model
dropdown ('Image Studio (Pipe)'). Reads the user's message + attached
image (if any), routes via regex, calls ComfyUI directly, returns the
image. Use when LLM-with-Tool tool-calling is leaking the call as text
(the abliterated Qwen 3.5 / Open WebUI parser interop bug).
required_open_webui_version: 0.5.0
"""
import asyncio
import base64
import inspect
import io
import json
import re
import time
import uuid
from typing import Awaitable, Callable, Literal, Optional
import aiohttp
from pydantic import BaseModel, Field
# Open WebUI runtime imports — same defensive guard as the sibling Tool.
try:
from fastapi import UploadFile
from open_webui.models.chats import Chats
from open_webui.models.files import Files
from open_webui.models.users import Users
from open_webui.routers.files import upload_file_handler
_OPENWEBUI_RUNTIME = True
except ImportError:
_OPENWEBUI_RUNTIME = False
# ─────────────────────────────────────────────────────────────────────────────
# Per-style settings — kept in sync with smart_image_gen.py. If you change
# checkpoint filenames in comfyui-init-models.sh, update both files.
# ─────────────────────────────────────────────────────────────────────────────
STYLES = {
"photo": {
"ckpt": "CyberRealisticXLPlay_V8.0_FP16.safetensors",
"sampler": "dpmpp_2m_sde",
"scheduler": "karras",
"cfg": 4.0, "steps": 28, "clip_skip": 1,
"prefix": "",
"negative": (
"cartoon, drawing, illustration, anime, manga, painting, sketch, "
"render, 3d, cgi, plastic skin, oversaturated, "
"lowres, blurry, jpeg artifacts, low quality, worst quality, "
"bad anatomy, deformed, extra fingers, missing fingers, "
"watermark, signature, text, logo"
),
},
"juggernaut": {
"ckpt": "Juggernaut-XL_v9_RunDiffusionPhoto_v2.safetensors",
"sampler": "dpmpp_2m_sde",
"scheduler": "karras",
"cfg": 4.5, "steps": 35, "clip_skip": 1,
"prefix": "",
"negative": (
"cartoon, drawing, illustration, anime, painting, sketch, render, "
"3d, cgi, plastic skin, washed out, "
"lowres, blurry, jpeg artifacts, low quality, worst quality, "
"bad anatomy, deformed, extra fingers, missing fingers, "
"watermark, signature, text, logo"
),
},
"pony": {
"ckpt": "ponyDiffusionV6XL_v6StartWithThisOne.safetensors",
"sampler": "euler_ancestral",
"scheduler": "normal",
"cfg": 7.5, "steps": 25, "clip_skip": 2,
"prefix": "score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up, ",
"negative": (
"score_6, score_5, score_4, "
"worst quality, low quality, lowres, blurry, jpeg artifacts, "
"bad anatomy, bad hands, extra digit, fewer digits, "
"deformed, ugly, censored, monochrome, "
"watermark, signature, text, artist name"
),
},
"general": {
"ckpt": "talmendoxlSDXL_v11Beta.safetensors",
"sampler": "dpmpp_2m",
"scheduler": "karras",
"cfg": 8.0, "steps": 30, "clip_skip": 2,
"prefix": "",
"negative": (
"lowres, blurry, jpeg artifacts, low quality, worst quality, "
"bad anatomy, deformed, ugly, watermark, signature, text"
),
},
"furry-nai": {
"ckpt": "reedFURRYMixSDXL_v23nai.safetensors",
"sampler": "euler_ancestral",
"scheduler": "normal",
"cfg": 5.0, "steps": 30, "clip_skip": 2,
"prefix": (
"masterpiece, best quality, high quality, detailed eyes, "
"highres, absurdres, furry, "
),
"negative": (
"human, realistic, photorealistic, 3d, cgi, "
"worst quality, low quality, lowres, blurry, jpeg artifacts, "
"bad anatomy, extra digit, fewer digits, deformed, ugly, "
"watermark, signature, text"
),
},
"furry-noob": {
"ckpt": "indigoVoidFurryFusedXL_noobaiV32.safetensors",
"sampler": "euler_ancestral",
"scheduler": "normal",
"cfg": 4.5, "steps": 20, "clip_skip": 2,
"prefix": (
"masterpiece, best quality, perfect quality, absurdres, newest, "
"very aesthetic, vibrant colors, "
),
"negative": (
"human, realistic, photorealistic, 3d, cgi, shiny skin, "
"worst quality, low quality, lowres, blurry, jpeg artifacts, "
"bad anatomy, bad hands, mutated hands, "
"watermark, signature, text"
),
},
"furry-il": {
"ckpt": "novaFurryXL_ilV170.safetensors",
"sampler": "euler_ancestral",
"scheduler": "normal",
"cfg": 4.0, "steps": 30, "clip_skip": 2,
"prefix": (
"masterpiece, best quality, amazing quality, very aesthetic, "
"ultra-detailed, absurdres, newest, furry, anthro, "
),
"negative": (
"human, multiple tails, modern, recent, old, oldest, graphic, "
"cartoon, painting, deformed, mutated, ugly, lowres, "
"bad anatomy, bad hands, missing fingers, extra digits, "
"worst quality, bad quality, sketch, jpeg artifacts, "
"signature, watermark, text, simple background"
),
},
}
DEFAULT_STYLE = "furry-il"
ROUTING_RULES = [
(re.compile(r"\bscore_\d", re.I), "pony"),
(re.compile(r"\bpony\b", re.I), "pony"),
(re.compile(r"\b(noobai|noob)\b", re.I), "furry-noob"),
(re.compile(r"\b(illustrious|ilxl)\b", re.I), "furry-il"),
(re.compile(r"\b(furry|anthro|feral|kemono|fursona|species)\b", re.I), "furry-il"),
(re.compile(r"\b(juggernaut)\b", re.I), "juggernaut"),
(re.compile(r"\b(photo|photograph|realistic|portrait|selfie|cinematic)\b", re.I), "photo"),
(re.compile(r"\b(anime|manga|2d|illustration)\b", re.I), "pony"),
]
# Phrases that imply local-only editing → triggers inpaint mode and
# pulls out a noun phrase as the mask text.
INPAINT_PATTERNS = [
re.compile(r"\b(?:change|recolor|edit|modify|replace|remove|delete|add)\s+(?:the|that|her|his|its)\s+([\w\s'-]{2,30}?)(?:\s+(?:to|into|with|so|that|and|,|\.)|$)", re.I),
re.compile(r"\b(?:make|turn)\s+(?:the|that|her|his|its)\s+([\w\s'-]{2,30}?)\s+(?:bigger|smaller|larger|wider|taller|shorter|longer|brighter|darker|red|blue|green|yellow|orange|purple|pink|black|white|gold)", re.I),
re.compile(r"\b(?:only|just)\s+(?:the|change the|edit the)\s+([\w\s'-]{2,30}?)(?:\s+|$)", re.I),
]
def _route_style(prompt: str) -> str:
for pattern, style in ROUTING_RULES:
if pattern.search(prompt):
return style
return DEFAULT_STYLE
def _detect_mask_text(prompt: str) -> Optional[str]:
"""Pull a noun phrase out of edit-style instructions for inpaint."""
for pattern in INPAINT_PATTERNS:
m = pattern.search(prompt)
if m:
obj = m.group(1).strip().rstrip(",.").strip()
if obj:
return f"the {obj}"
return None
def _inherited_style(messages) -> Optional[str]:
"""Best-effort: read prior assistant message metadata for a style hint."""
if not messages:
return None
for msg in reversed(messages):
if not isinstance(msg, dict):
continue
# Look for a "style: X" comment in the assistant's previous text
if msg.get("role") == "assistant":
content = msg.get("content")
if isinstance(content, str):
m = re.search(r"\bstyle[:=]\s*([\w\-]+)", content)
if m and m.group(1) in STYLES:
return m.group(1)
return None
def _seed_value(seed: int) -> int:
return seed if seed > 0 else int(time.time() * 1000) % (2**31)
def _build_txt2img(positive: str, negative: str, settings: dict,
width: int, height: int, seed: int) -> dict:
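"""Minimal SDXL txt2img graph: checkpoint → CLIPSetLastLayer (clip skip) → prompt encodes → KSampler → VAE decode → SaveImage (node "9")."""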
return {
"3": {"class_type": "KSampler", "inputs": {
"seed": _seed_value(seed),
"steps": settings["steps"], "cfg": settings["cfg"],
"sampler_name": settings["sampler"], "scheduler": settings["scheduler"],
"denoise": 1.0,
"model": ["4", 0], "positive": ["6", 0],
"negative": ["7", 0], "latent_image": ["5", 0],
}},
"4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": settings["ckpt"]}},
"5": {"class_type": "EmptyLatentImage",
"inputs": {"width": width, "height": height, "batch_size": 1}},
"6": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["10", 0]}},
"7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
"8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smartpipe", "images": ["8", 0]}},
"10": {"class_type": "CLIPSetLastLayer",
"inputs": {"stop_at_clip_layer": -settings["clip_skip"], "clip": ["4", 1]}},
}
def _build_img2img(positive: str, negative: str, settings: dict,
image_filename: str, denoise: float, seed: int) -> dict:
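"""Same graph as txt2img, except the latent comes from VAE-encoding the uploaded source image (nodes 12 → 11) and the KSampler runs at the requested denoise."""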
return {
"3": {"class_type": "KSampler", "inputs": {
"seed": _seed_value(seed),
"steps": settings["steps"], "cfg": settings["cfg"],
"sampler_name": settings["sampler"], "scheduler": settings["scheduler"],
"denoise": denoise,
"model": ["4", 0], "positive": ["6", 0],
"negative": ["7", 0], "latent_image": ["11", 0],
}},
"4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": settings["ckpt"]}},
"6": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["10", 0]}},
"7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
"8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smartpipe", "images": ["8", 0]}},
"10": {"class_type": "CLIPSetLastLayer",
"inputs": {"stop_at_clip_layer": -settings["clip_skip"], "clip": ["4", 1]}},
"11": {"class_type": "VAEEncode", "inputs": {"pixels": ["12", 0], "vae": ["4", 2]}},
"12": {"class_type": "LoadImage", "inputs": {"image": image_filename}},
}
def _build_inpaint(positive: str, negative: str, settings: dict,
image_filename: str, mask_text: str,
denoise: float, seed: int) -> dict:
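"""img2img graph plus text-driven masking: GroundingDINO + SAM-HQ segment the mask_text region (nodes 14-16), GrowMask pads it, and SetLatentNoiseMask confines sampling to that area."""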
return {
"3": {"class_type": "KSampler", "inputs": {
"seed": _seed_value(seed),
"steps": settings["steps"], "cfg": settings["cfg"],
"sampler_name": settings["sampler"], "scheduler": settings["scheduler"],
"denoise": denoise,
"model": ["4", 0], "positive": ["6", 0],
"negative": ["7", 0], "latent_image": ["13", 0],
}},
"4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": settings["ckpt"]}},
"6": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["10", 0]}},
"7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
"8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smartpipe", "images": ["8", 0]}},
"10": {"class_type": "CLIPSetLastLayer",
"inputs": {"stop_at_clip_layer": -settings["clip_skip"], "clip": ["4", 1]}},
"11": {"class_type": "VAEEncode", "inputs": {"pixels": ["12", 0], "vae": ["4", 2]}},
"12": {"class_type": "LoadImage", "inputs": {"image": image_filename}},
"13": {"class_type": "SetLatentNoiseMask",
"inputs": {"samples": ["11", 0], "mask": ["17", 0]}},
"14": {"class_type": "SAMModelLoader (segment anything)",
"inputs": {"model_name": "sam_hq_vit_h (2.57GB)"}},
"15": {"class_type": "GroundingDinoModelLoader (segment anything)",
"inputs": {"model_name": "GroundingDINO_SwinT_OGC (694MB)"}},
"16": {"class_type": "GroundingDinoSAMSegment (segment anything)",
"inputs": {
"sam_model": ["14", 0], "grounding_dino_model": ["15", 0],
"image": ["12", 0], "prompt": mask_text, "threshold": 0.3,
}},
"17": {"class_type": "GrowMask",
"inputs": {"mask": ["16", 1], "expand": 12, "tapered_corners": True}},
}
_FILE_URL_ID_RE = re.compile(r"/(?:api/v1/)?files/([0-9a-fA-F-]{8,})(?:/content)?")
def _file_dict_is_image(f: dict) -> bool:
ftype = (f.get("type") or "").lower()
fname = (f.get("name") or f.get("filename") or "").lower()
return "image" in ftype or fname.endswith((".png", ".jpg", ".jpeg", ".webp"))
async def _read_file_dict(f: dict) -> Optional[bytes]:
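"""Resolve an Open WebUI file dict to raw bytes: try any filesystem path on the dict first, then look the file up by id (or by id parsed from its URL) via the Files model."""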
for path_key in ("path", "filepath", "file_path"):
path = f.get(path_key)
if path:
try:
with open(path, "rb") as fh:
return fh.read()
except OSError:
pass
candidate_ids = []
if f.get("id"):
candidate_ids.append(f["id"])
url = f.get("url")
if url:
m = _FILE_URL_ID_RE.search(url)
if m:
candidate_ids.append(m.group(1))
if _OPENWEBUI_RUNTIME:
for fid in candidate_ids:
try:
file_model = await Files.get_file_by_id(fid)
if file_model is None:
continue
path = getattr(file_model, "path", None)
if not path:
meta = getattr(file_model, "meta", None) or {}
path = meta.get("path") if isinstance(meta, dict) else getattr(meta, "path", None)
if path:
try:
with open(path, "rb") as fh:
return fh.read()
except OSError:
pass
except Exception:
pass
return None
async def _extract_attached_image(files, messages, metadata, session) -> Optional[bytes]:
# 1. Inline data URIs
for msg in reversed(messages or []):
content = msg.get("content") if isinstance(msg, dict) else None
if isinstance(content, list):
for block in content:
if not isinstance(block, dict) or block.get("type") != "image_url":
continue
url = (block.get("image_url") or {}).get("url", "")
if url.startswith("data:image"):
try:
return base64.b64decode(url.split(",", 1)[1])
except Exception:
pass
# 2. messages[].files
for msg in reversed(messages or []):
if not isinstance(msg, dict):
continue
for f in (msg.get("files") or []):
if isinstance(f, dict) and _file_dict_is_image(f):
data = await _read_file_dict(f)
if data is not None:
return data
# 3. __files__
for f in files or []:
if isinstance(f, dict) and _file_dict_is_image(f):
data = await _read_file_dict(f)
if data is not None:
return data
# 4. DB lookup (assistant-emitted files often only land here)
if _OPENWEBUI_RUNTIME and metadata:
chat_id = metadata.get("chat_id")
if chat_id:
try:
chat = await Chats.get_chat_by_id(chat_id)
chat_data = getattr(chat, "chat", None) if chat else None
chat_messages = (chat_data or {}).get("messages", []) if isinstance(chat_data, dict) else []
for msg in reversed(chat_messages):
for f in (msg.get("files") or []) if isinstance(msg, dict) else []:
if isinstance(f, dict) and _file_dict_is_image(f):
data = await _read_file_dict(f)
if data is not None:
return data
except Exception:
pass
return None
async def _upload_to_comfyui(session, base, raw) -> Optional[str]:
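"""POST the raw image to ComfyUI's /upload/image endpoint; returns the stored filename for use in a LoadImage node, or None on failure."""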
name = f"smartpipe_{uuid.uuid4().hex[:12]}.png"
form = aiohttp.FormData()
form.add_field("image", raw, filename=name, content_type="image/png")
form.add_field("overwrite", "true")
async with session.post(f"{base}/upload/image", data=form) as resp:
if resp.status != 200:
return None
return (await resp.json()).get("name", name)
async def _push_image_to_chat(raw, prefix, request, user_dict, metadata, event_emitter) -> bool:
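"""Store the rendered PNG in Open WebUI's file store, then emit a `files` event pointing at its content URL so it attaches to the assistant message."""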
if not (_OPENWEBUI_RUNTIME and request and user_dict and event_emitter):
return False
try:
user = await Users.get_user_by_id(user_dict.get("id"))
if not user:
return False
upload = UploadFile(
file=io.BytesIO(raw),
filename=f"{prefix}_{uuid.uuid4().hex[:8]}.png",
headers={"content-type": "image/png"},
)
result = upload_file_handler(
request=request, file=upload,
metadata={"chat_id": (metadata or {}).get("chat_id"),
"message_id": (metadata or {}).get("message_id")},
process=False, user=user,
)
file_item = await result if inspect.iscoroutine(result) else result
url = request.app.url_path_for("get_file_content_by_id", id=file_item.id)
await event_emitter({
"type": "files",
"data": {"files": [{"type": "image", "url": url}]},
})
return True
except Exception:
return False
async def _submit_and_fetch(session, base, workflow, timeout_seconds, emit, settings):
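"""Queue the workflow on ComfyUI, poll /history until the SaveImage node (id "9") reports outputs or the timeout passes, then download the first image via /view. Returns (bytes, None) on success or (None, error)."""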
SAVE_NODE_ID = "9"
client_id = str(uuid.uuid4())
async with session.post(
f"{base}/prompt", json={"prompt": workflow, "client_id": client_id}
) as resp:
if resp.status != 200:
return None, f"ComfyUI rejected the prompt: {resp.status} {await resp.text()}"
prompt_id = (await resp.json()).get("prompt_id")
if not prompt_id:
return None, "ComfyUI didn't return a prompt_id."
await emit(
f"Sampling — {settings['sampler']}/{settings['scheduler']}, "
f"CFG {settings['cfg']}, {settings['steps']} steps"
)
deadline = time.time() + timeout_seconds
output_images: list = []
while time.time() < deadline:
await asyncio.sleep(1.5)
async with session.get(f"{base}/history/{prompt_id}") as resp:
if resp.status != 200:
continue
history = await resp.json()
if prompt_id in history:
outputs = history[prompt_id].get("outputs", {}) or {}
save_imgs = (outputs.get(SAVE_NODE_ID) or {}).get("images", [])
if save_imgs:
output_images.extend(save_imgs)
if not output_images:
for node_out in outputs.values():
output_images.extend(node_out.get("images", []))
if output_images:
break
if not output_images:
return None, f"Timed out after {timeout_seconds}s waiting for image."
img = output_images[0]
params = {
"filename": img["filename"],
"subfolder": img.get("subfolder", ""),
"type": img.get("type", "output"),
}
async with session.get(f"{base}/view", params=params) as resp:
if resp.status != 200:
return None, f"Failed to fetch image: {resp.status}"
return await resp.read(), None
def _extract_user_text(body: dict) -> str:
"""Pull the latest user message's text content."""
messages = body.get("messages", [])
for msg in reversed(messages):
if not isinstance(msg, dict) or msg.get("role") != "user":
continue
content = msg.get("content")
if isinstance(content, str):
return content.strip()
if isinstance(content, list):
parts = []
for block in content:
if isinstance(block, dict) and block.get("type") == "text":
parts.append(block.get("text", ""))
return " ".join(parts).strip()
return ""
class Pipe:
class Valves(BaseModel):
COMFYUI_BASE_URL: str = Field(
default="http://comfyui:8188",
description="ComfyUI server URL reachable from the open-webui container.",
)
TIMEOUT_SECONDS: int = Field(default=600)
DEFAULT_WIDTH: int = Field(default=1024)
DEFAULT_HEIGHT: int = Field(default=1024)
DEFAULT_DENOISE_IMG2IMG: float = Field(default=0.7)
DEFAULT_DENOISE_INPAINT: float = Field(default=1.0)
FORCE_STYLE: str = Field(
default="",
description="Override style routing. Empty = auto-route. Set to "
"one of: photo, juggernaut, pony, general, "
"furry-nai, furry-noob, furry-il.",
)
def __init__(self):
self.valves = self.Valves()
self.id = "image-studio-pipe"
self.name = "Image Studio (Pipe)"
async def pipe(
self,
body: dict,
__user__: Optional[dict] = None,
__request__=None,
__metadata__: Optional[dict] = None,
__event_emitter__: Optional[Callable[[dict], Awaitable[None]]] = None,
) -> str:
user_text = _extract_user_text(body)
if not user_text:
return "Type a message describing the image you want."
async def emit(msg: str, done: bool = False):
if __event_emitter__:
await __event_emitter__({
"type": "status",
"data": {"description": msg, "done": done},
})
# Style: explicit valve override > inherited from prior assistant
# message > keyword detection on user text > default.
chosen = (
self.valves.FORCE_STYLE.strip()
or _inherited_style(body.get("messages"))
or _route_style(user_text)
)
if chosen not in STYLES:
chosen = DEFAULT_STYLE
settings = STYLES[chosen]
base = self.valves.COMFYUI_BASE_URL.rstrip("/")
positive = f"{settings['prefix']}{user_text}"
negative = settings["negative"]
async with aiohttp.ClientSession() as session:
await emit("Looking for attached image…")
source_bytes = await _extract_attached_image(
None, body.get("messages"), __metadata__, session,
)
if source_bytes is None:
# No image → txt2img
await emit(f"Generating ({chosen})")
workflow = _build_txt2img(
positive, negative, settings,
self.valves.DEFAULT_WIDTH, self.valves.DEFAULT_HEIGHT, 0,
)
tag = "gen"
else:
# Image present → upload, then inpaint or img2img
uploaded = await _upload_to_comfyui(session, base, source_bytes)
if not uploaded:
return "Failed to upload source image to ComfyUI."
mask_text = _detect_mask_text(user_text)
if mask_text:
await emit(
f"Inpainting ({chosen}, mask='{mask_text}', "
f"denoise={self.valves.DEFAULT_DENOISE_INPAINT})"
)
workflow = _build_inpaint(
positive, negative, settings, uploaded, mask_text,
self.valves.DEFAULT_DENOISE_INPAINT, 0,
)
tag = f"edit (inpaint: {mask_text})"
else:
await emit(
f"Editing ({chosen}, "
f"denoise={self.valves.DEFAULT_DENOISE_IMG2IMG})"
)
workflow = _build_img2img(
positive, negative, settings, uploaded,
self.valves.DEFAULT_DENOISE_IMG2IMG, 0,
)
tag = "edit (img2img)"
raw, err = await _submit_and_fetch(
session, base, workflow, self.valves.TIMEOUT_SECONDS, emit, settings,
)
if err:
return err
await _push_image_to_chat(
raw, "smartpipe", __request__, __user__, __metadata__, __event_emitter__,
)
await emit(f"Done — {chosen}", done=True)
# Single-line plain-English follow-up. Emit the style as
# "style: <name>" so the inheritance helper can find it next turn.
return f"Done — style: {chosen}, {tag}."

View File

@@ -0,0 +1,28 @@
#!/bin/sh
# Entrypoint wrapper. Pip-installs requirements.txt for any custom_node
# present in /opt/comfyui/custom_nodes/, then exec's the CMD.
#
# This makes the container self-healing for custom nodes that get added
# at runtime — either via ComfyUI-Manager from the web UI, or by
# git-cloning directly into the comfyui-custom-nodes volume. Pip skips
# already-satisfied requirements quickly, so the boot-time cost on
# subsequent restarts is negligible.
set -e
if [ -d /opt/comfyui/custom_nodes ]; then
for req in /opt/comfyui/custom_nodes/*/requirements.txt; do
[ -f "$req" ] || continue
echo "[entrypoint] installing $req"
pip install -q -r "$req" || echo " (install failed — continuing)"
done
fi
# Force-pin known-incompatible packages back into a working range. Some
# custom nodes bring transformers >=5 transitively, which removes
# BertModel.get_head_mask and breaks comfyui_segment_anything's
# GroundingDINO. Run last so it wins over anything the loop above
# installed.
pip install -q "transformers>=4.40,<5" || echo "[entrypoint] transformers pin failed — continuing"
exec "$@"

View File

@@ -3,10 +3,10 @@
"class_type": "KSampler",
"inputs": {
"seed": 0,
"steps": 20,
"cfg": 7,
"sampler_name": "euler",
"scheduler": "normal",
"steps": 28,
"cfg": 4.0,
"sampler_name": "dpmpp_2m_sde",
"scheduler": "karras",
"denoise": 0.75,
"model": ["4", 0],
"positive": ["6", 0],
@@ -17,7 +17,7 @@
"4": {
"class_type": "CheckpointLoaderSimple",
"inputs": {
"ckpt_name": "v1-5-pruned-emaonly.safetensors"
"ckpt_name": "CyberRealisticXLPlay_V8.0_FP16.safetensors"
}
},
"6": {
@@ -30,7 +30,7 @@
"7": {
"class_type": "CLIPTextEncode",
"inputs": {
"text": "",
"text": "lowres, blurry, jpeg artifacts, watermark, text, signature, bad anatomy, extra limbs, missing fingers, deformed, ugly, low quality, worst quality",
"clip": ["4", 1]
}
},

View File

@@ -3,10 +3,10 @@
"class_type": "KSampler",
"inputs": {
"seed": 0,
"steps": 20,
"cfg": 7,
"sampler_name": "euler",
"scheduler": "normal",
"steps": 28,
"cfg": 4.0,
"sampler_name": "dpmpp_2m_sde",
"scheduler": "karras",
"denoise": 1,
"model": ["4", 0],
"positive": ["6", 0],
@@ -17,7 +17,7 @@
"4": {
"class_type": "CheckpointLoaderSimple",
"inputs": {
"ckpt_name": "v1-5-pruned-emaonly.safetensors"
"ckpt_name": "CyberRealisticXLPlay_V8.0_FP16.safetensors"
}
},
"5": {
@@ -38,7 +38,7 @@
"7": {
"class_type": "CLIPTextEncode",
"inputs": {
"text": "",
"text": "lowres, blurry, jpeg artifacts, watermark, text, signature, bad anatomy, extra limbs, missing fingers, deformed, ugly, low quality, worst quality",
"clip": ["4", 1]
}
},