comfyui-nvidia

Author	SHA1	Message	Date
William Gill	e77666ea0f	Image Studio docs: require setting a separate Task Model after install tool_choice: required (the thing that makes Image Studio reliably fire its tools) also blocks Open WebUI's background text-only calls — title generation, tag suggestions, autocomplete — because the model is forced to produce a tool call instead of text. Result: chats stay named 'New Chat' and tag suggestions go silent. Documented the fix in two places: - image_studio.md: dedicated 'Set a separate Task Model (required after install)' section explaining the cause and the fix path. - deployment README §9: short follow-up note pointing at it so operators don't miss it during initial setup. The fix is purely Open WebUI configuration — no code change. Pick any non-Image-Studio model already pulled (mistral-nemo:12b is the obvious default) for the Task Model slot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 15:37:21 -05:00
William Gill	d935e24624	Add text-targeted inpainting via GroundingDINO+SAM (mask_text param) Five pieces: 1. Dockerfile installs storyicon/comfyui_segment_anything (GroundingDINO + SAM-HQ in one bundle) into custom_nodes and pip-installs its requirements at build time. Model weights auto-download to the comfyui-models volume on first inpaint (~3 GB one-time cost). 2. install-custom-node-deps.sh — entrypoint wrapper that pip-installs requirements.txt for any custom_node present at startup. Lets users add custom nodes via ComfyUI-Manager (or by git-cloning into the volume) and have the deps picked up on the next restart, without editing the Dockerfile. 3. smart_image_gen v0.6: edit_image gains a `mask_text` param. When set, builds an inpainting workflow (LoadImage → GroundingDinoSAM Segment → SetLatentNoiseMask → KSampler) so only the named region is repainted. When unset, falls through to the existing img2img path. Denoise default switches: 1.0 with mask_text (full repaint within mask), 0.7 without. 4. Image Studio system prompt teaches the LLM the LOCAL vs GLOBAL distinction — set mask_text whenever the user names a specific object/region ('the ball', 'the dog', 'the sky'); leave it unset only for whole-image style/lighting transformations. 5. Deployment README documents the new mode + the first-inpaint weight-download caveat. Image rebuild required — bump tag to pick up the Dockerfile change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 14:43:52 -05:00
William Gill	5a34ced8f1	Add S3 mirror path for Ollama models + mirror-ollama-model.sh helper Three pieces: 1. mirror-ollama-model.sh — run on any machine that has the model pulled. Parses the manifest at ~/.ollama/models/manifests/registry.ollama.ai/<ns>/<name>/<tag>, greps every sha256:* digest, tars manifest + referenced blobs into one .tgz. Output is portable — extract over any other Ollama data dir and the model is immediately visible. 2. init-models.sh gains an s3_pull function that curls a tarball from $S3_OLLAMA_BASE and extracts into /root/.ollama/models/. Falls back to ollama pull when S3_OLLAMA_BASE is unset, so s3_pull lines are safe to commit before the bucket is ready. huihui_ai/qwen3.5-abliterated:9b promoted to s3_pull as the example. 3. docker-compose.yml model-init service propagates S3_OLLAMA_BASE from .env. Curl auto-installs at script start because ollama/ollama doesn't always ship it. README documents the mirror workflow under "Mirroring models to S3". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 13:43:26 -05:00
William Gill	f77f5993fb	Image Studio: enable vision capability + document upgrade path Open WebUI was blocking image attachments to the Image Studio model because mistral-nemo:12b isn't vision-capable. Two changes: - capabilities.vision flipped to true in the preset JSON. The Tool only needs the image to make it through __messages__ / __files__ to call edit_image; the actual visual processing happens in ComfyUI's img2img, not in the LLM. Setting the flag unlocks the attach-image UI without lying about what mistral-nemo can do. - System prompt now tells the LLM explicitly: "you may not be able to visually inspect the attached image — that is fine. Trust the user's description and call edit_image." Prevents the LLM from refusing or hedging when it gets an image it can't see. Documented the upgrade path in image_studio.md for users who want real vision (qwen2.5vl:7b, llama3.2-vision:11b, minicpm-v:8b — pick one, add to init-models.sh, swap base_model_id in the preset). The vision LLM can then write smarter edit_image calls from the image content rather than the user's description alone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 13:31:17 -05:00
William Gill	6adf133558	Ship Image Studio as importable JSON in addition to markdown walkthrough Open WebUI accepts a JSON file at Workspace → Models → Import that seeds a new model preset in one click instead of the manual table-driven setup. The new image_studio.json mirrors the Open WebUI bulk-export schema (array wrapper around the model object with id, name, base_model_id, params, meta) and pre-fills system prompt, native function calling, temperature 0.5, top_p 0.9, smart_image_gen tool attachment, suggestion prompts. The markdown walkthrough stays as the source of truth for the system prompt content and as the fallback when import fails (e.g. tool ID mismatch, unfamiliar field, schema drift across Open WebUI versions). README points at both paths. Caveat doc'd in the markdown: if the imported preset doesn't actually have smart_image_gen attached, the tool ID in the JSON didn't match what Open WebUI assigned — re-attach manually in the model edit screen. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 13:04:49 -05:00
William Gill	d4e2058859	smart_image_gen v0.3: add edit_image (img2img) method The Tool now exposes two methods the LLM picks between based on whether the user attached an image: generate_image — txt2img (existing, unchanged behavior) edit_image — img2img on the most recently attached image edit_image extracts the source image from __messages__ (base64 data URIs in image_url content blocks) or __files__ (local path or URL), uploads to ComfyUI's /upload/image, runs an img2img workflow at the caller-specified denoise (default 0.7), and returns the edited result. Same per-style routing / sampler / CFG / prefix logic as generation. Refactored the submit-and-poll loop into _submit_and_fetch shared by both methods. Image extraction is defensive — tries messages first, then files (path then URL), returns a clear "no image attached" message rather than silently generating from scratch. Image Studio system prompt rewritten to teach the LLM when to call edit_image vs generate_image and how to pick denoise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 12:59:13 -05:00
William Gill	41d571d8d1	Add Image Studio model preset — forces smart_image_gen tool use A documented Open WebUI custom-model preset wrapping mistral-nemo:12b with: aggressive system prompt that mandates calling generate_image, only the smart_image_gen tool attached, native function calling, lower temperature for tool-call reliability. Users pick "Image Studio" from the chat-model dropdown when they want images. Solves the common case where general-purpose chat models describe an image in text instead of firing the tool — usually on conversational phrasings like "can you draw me…". The preset removes the ambiguity by giving the LLM exactly one job and one tool. Setup walkthrough in openwebui-models/image_studio.md; deployment README §9 points users at it as the recommended path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 12:54:13 -05:00
William Gill	b815cd6a5f	Tune static workflows to CyberRealisticXL recommended settings The static workflow JSONs default to CyberRealisticXLPlay (set in an earlier commit), but the KSampler still had euler/normal/CFG7/20 — the generic settings I scaffolded with. Updated to the creator-published defaults: dpmpp_2m_sde / karras / CFG 4 / 28 steps. CLIP skip 1 already correct (no node needed; default behavior). Added a section to the deployment README spelling out the trade-off: static workflows are locked to one checkpoint family at a time because Open WebUI's nodes mapping doesn't expose sampler/CFG/scheduler/CLIP skip/prefix. For multi-checkpoint use, the smart_image_gen Tool path is the only one that gets these right per-prompt. Re-paste workflows into Open WebUI Settings → Images to pick up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 12:47:46 -05:00
William Gill	392b26167f	Add smart_image_gen Tool for per-prompt checkpoint routing Open WebUI Tool the LLM invokes instead of the built-in image action. Auto-routes among the seven SDXL checkpoints (photo / juggernaut / pony / general / furry-{nai,noob,il}) based on either an explicit `style` arg or first-match-wins regex over the prompt. Constructs the ComfyUI workflow inline, submits via /prompt, polls /history, returns the result as a base64 data-URI markdown image so no extra hosting is needed. Per-style default negatives. ComfyUI URL / steps / CFG / timeout are admin-tunable Valves. Filters can't see image-gen requests in Open WebUI (the routers skip the filter chain), so the LLM-driven Tool is the only path that gives intent-aware routing without changing the chat UX. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 12:17:02 -05:00
William Gill	704bcfdf13	Default workflows to SDXL CyberRealistic; ship empty model preseed Drops the SD 1.5 placeholder. The shipped txt2img/img2img workflows now reference CyberRealisticXLPlay_V8.0_FP16.safetensors (the checkpoint figment used in production), and comfyui-init-models.sh ships with no active fetches — operators uncomment examples or add their own URLs. The script + workflow filenames have to line up; README explains. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 12:02:26 -05:00
William Gill	0ad99b6199	Add comfyui-model-init sidecar for ComfyUI model preseeding Mirrors the Ollama model-init pattern: a one-shot Alpine container that mounts the comfyui-models volume and runs comfyui-init-models.sh, which curls direct download URLs (HuggingFace by default) into the right subdirectories. Idempotent — already-present files are skipped. HF_TOKEN is plumbed through for gated repos (Flux-dev, SD3, etc.) and is opt-in via .env. The default list ships SD 1.5 only, matching the placeholder filename in workflows/*.json. Examples for SDXL, Flux, and upscalers are commented in the script. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 11:57:24 -05:00
William Gill	21e976e275	Externalise WEBUI_URL / LLM_URL to .env So changing the deployment's hostnames is a one-file edit (.env) instead of touching docker-compose.yml. WEBUI_URL is the full URL with scheme (Open WebUI uses it for auth redirects); LLM_URL is the bare hostname (Anubis wants it for COOKIE_DOMAIN). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 10:49:29 -05:00
William Gill	97547c783c	Make ai-stack the only deployment shape Drops the duplicate standalone compose / .env.example / SETUP.md at the repo root. Bring-up content folded into deployments/ai-stack/README.md so there's exactly one set of deployment instructions, sitting next to the files it describes. Root README is now just the repo overview and a pointer at the deployment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 10:45:23 -05:00
William Gill	5b61caa197	Add deployments/ai-stack — combined production-shape example Sanitized snapshot of the live srvno.de stack: Caddy + Ollama (with preseed) + ComfyUI + Open WebUI + Anubis stub. Real hostnames, secrets, and bcrypt hash replaced with placeholders so the dir is safe to commit. Caddyfile updated to point at comfyui:8188 (the source file pointed at the now-removed forge service). Dropped FIGMENT_/FORGE_/SEGMENT_IMAGE_TAG from the env example. Harmonised the init-models.sh mount path between ollama and model-init services. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 10:40:41 -05:00

14 Commits