William Gill f77f5993fb Image Studio: enable vision capability + document upgrade path
Open WebUI was blocking image attachments to the Image Studio model
because mistral-nemo:12b isn't vision-capable. Two changes:

  - capabilities.vision flipped to true in the preset JSON. The Tool
    only needs the image to make it through __messages__ / __files__
    to call edit_image; the actual visual processing happens in
    ComfyUI's img2img, not in the LLM. Setting the flag unlocks the
    attach-image UI without lying about what mistral-nemo can do.

  - System prompt now tells the LLM explicitly: "you may not be able
    to visually inspect the attached image — that is fine. Trust the
    user's description and call edit_image." Prevents the LLM from
    refusing or hedging when it gets an image it can't see.
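Both changes are edits to the preset JSON. On the Tool side, the only requirement is that the image can be found in the chat payload. The sketch below makes two assumptions that this repo does not confirm:

- The preset follows Open WebUI's exported model layout, with capabilities under `meta.capabilities` and the system prompt under `params.system`.
- Images arrive as OpenAI-style multimodal message parts.

`enable_attachments` and `latest_image_url` are hypothetical names. The note text paraphrases the wording quoted in the commit.

```python
from __future__ import annotations

import copy

# Paraphrase of the system-prompt note described in the commit message.
VISION_NOTE = (
    "You may not be able to visually inspect the attached image. That is fine. "
    "Trust the user's description and call edit_image."
)


def enable_attachments(preset: dict) -> dict:
    """Return a copy of the preset with the vision flag on and the note appended.

    Assumed layout: preset["meta"]["capabilities"]["vision"] and
    preset["params"]["system"], as in Open WebUI model exports.
    """
    patched = copy.deepcopy(preset)
    capabilities = patched.setdefault("meta", {}).setdefault("capabilities", {})
    capabilities["vision"] = True  # only unlocks the attach-image UI
    params = patched.setdefault("params", {})
    system = params.get("system") or ""
    if VISION_NOTE not in system:  # keep the patch idempotent
        params["system"] = (system + "\n\n" + VISION_NOTE).strip()
    return patched


def latest_image_url(messages: list[dict]) -> str | None:
    """Return the newest image URL in OpenAI-style chat messages, if any.

    Assumes image parts look like
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}.
    Reading the payload does not require the LLM itself to see the image.
    """
    for message in reversed(messages):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text message, no attachments
        for part in reversed(content):
            if isinstance(part, dict) and part.get("type") == "image_url":
                return part["image_url"]["url"]
    return None
```

This split mirrors the commit's reasoning. The flag and the note change only what the UI allows and what the LLM is told. The image itself travels in the payload to edit_image and on to ComfyUI's img2img.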

Documented the upgrade path in image_studio.md for users who want
real vision (qwen2.5vl:7b, llama3.2-vision:11b, minicpm-v:8b — pick
one, add to init-models.sh, swap base_model_id in the preset). The
vision LLM can then write smarter edit_image calls from the image
content rather than the user's description alone.
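The upgrade path amounts to two file edits: add the model to init-models.sh and swap base_model_id in the preset. This minimal sketch assumes two things:

- base_model_id is a top-level key of the preset JSON. The commit names the key but not its nesting.
- init-models.sh lists each Ollama tag literally.

`swap_base_model` and `listed_in_init_script` are hypothetical helpers, not part of the repo.

```python
from __future__ import annotations

import copy

# The vision-capable Ollama tags the commit suggests for the upgrade path.
VISION_MODELS = {"qwen2.5vl:7b", "llama3.2-vision:11b", "minicpm-v:8b"}


def listed_in_init_script(script_text: str, model_id: str) -> bool:
    """Naive check, assuming init-models.sh names each model tag literally."""
    return model_id in script_text


def swap_base_model(preset: dict, model_id: str) -> dict:
    """Return a copy of the preset pointed at a vision-capable base model.

    Assumes base_model_id is a top-level key of the preset JSON.
    """
    if model_id not in VISION_MODELS:
        raise ValueError(f"{model_id!r} is not one of the suggested vision models")
    patched = copy.deepcopy(preset)
    patched["base_model_id"] = model_id
    return patched
```

Checking init-models.sh first avoids pointing the preset at a model that Ollama never pulled.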

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 13:31:17 -05:00

comfyui-nvidia

ComfyUI image-generation backend, NVIDIA-accelerated, fronted by Open WebUI for multi-user chat and image generation/editing.

Built from the official ComfyUI manual install for NVIDIA — no third-party base image. CI publishes the image to git.anomalous.dev/alphacentri/comfyui-nvidia on every v* tag (see .gitea/workflows/release.yml).
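This small sketch shows which tags would trigger a publish and what image reference results. It rests on two assumptions, and .gitea/workflows/release.yml is the authority on both:

- The v* filter behaves as a plain, case-sensitive glob on the tag name.
- The pushed image tag equals the git tag verbatim.

`image_ref_for_tag` is a hypothetical helper.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

REGISTRY_IMAGE = "git.anomalous.dev/alphacentri/comfyui-nvidia"


def image_ref_for_tag(git_tag: str) -> str | None:
    """Return the image ref a git tag would publish, or None if CI skips it.

    Assumption: the image tag equals the git tag (e.g. v1.2.0); check
    release.yml for the real tagging scheme.
    """
    if not fnmatchcase(git_tag, "v*"):  # only v* tags trigger the pipeline
        return None
    return f"{REGISTRY_IMAGE}:{git_tag}"
```

Under these assumptions, the returned reference is what you would pass to `docker pull`.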

Repository layout

Path                   What
Dockerfile             ComfyUI on NVIDIA, manual-install pattern
workflows/             txt2img + img2img workflow JSONs and node mappings
deployments/ai-stack/  The deployment: compose, Caddyfile, env, model preseed
.gitea/workflows/      Release pipeline (build & push image on tag)
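The workflow JSONs in workflows/ are paired with node mappings that say which node input receives the prompt, the seed, and so on. Below is a minimal sketch of applying such a mapping to an API-format workflow and queueing the result through ComfyUI's `POST /prompt` endpoint. None of these values are read from this repo; all are assumptions:

- the node ids
- the `NODE_MAPPING` shape
- the `http://comfyui:8188` base URL

```python
from __future__ import annotations

import json
import urllib.request

# Hypothetical mapping: logical input -> (node id, input name) in an
# API-format workflow. The real ids live alongside the JSONs in workflows/.
NODE_MAPPING = {
    "prompt": ("6", "text"),
    "seed": ("3", "seed"),
}


def apply_inputs(workflow: dict, mapping: dict, **values) -> dict:
    """Return a copy of the workflow with mapped node inputs overridden."""
    patched = json.loads(json.dumps(workflow))  # deep copy of plain JSON data
    for key, value in values.items():
        node_id, input_name = mapping[key]
        patched[node_id]["inputs"][input_name] = value
    return patched


def queue_prompt(workflow: dict, base_url: str = "http://comfyui:8188") -> str:
    """POST the workflow to ComfyUI's /prompt endpoint and return its prompt_id.

    The default base_url assumes a compose service named "comfyui".
    """
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["prompt_id"]
```

Open WebUI does its own version of this substitution once a workflow and its node ids are configured. A sketch like this is mainly useful for exercising a workflow by hand.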

Deploy

The full stack — Caddy + Ollama + ComfyUI + Open WebUI (+ optional Anubis) — lives under deployments/ai-stack/. Bring-up steps, host prerequisites, Open WebUI workflow wiring, and gotchas are in deployments/ai-stack/README.md.
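After bring-up, a quick probe of each backend can confirm the services are reachable before you debug the Open WebUI wiring. Each project exposes an endpoint for this:

- Ollama: `/api/tags`
- ComfyUI: `/system_stats`
- Open WebUI: `/health`

The sketch assumes compose service names `ollama`, `comfyui`, and `open-webui` on their default ports. The hostnames especially are guesses, so align them with the compose file. Caddy and Anubis are left out because their routes depend on the Caddyfile.

```python
from __future__ import annotations

import urllib.error
import urllib.request

# Assumed in-network hostnames and default ports; adjust to the compose file
# in deployments/ai-stack/.
HEALTH_URLS = {
    "ollama": "http://ollama:11434/api/tags",
    "comfyui": "http://comfyui:8188/system_stats",
    "open-webui": "http://open-webui:8080/health",
}


def is_up(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):  # refused, DNS failure, HTTP error, timeout
        return False


def check_stack(urls: dict[str, str] = HEALTH_URLS) -> dict[str, bool]:
    """Probe every service and report which ones responded."""
    return {name: is_up(url) for name, url in urls.items()}
```

Call `check_stack()` from a container attached to the stack's network. Any `False` entry points at the service to inspect first.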

Replaces

This repo supersedes the previous figment + segment + Forge stack. ComfyUI's node graph covers everything those services provided (txt2img, img2img, inpaint, mask generation via SAM/GroundingDINO custom nodes), and Open WebUI talks to it natively.
