Compare commits

5 Commits

| Author | SHA1 | Date |
|---|---|---|
| | 2cecf77981 | |
| | f26dfbee02 | |
| | 06433d3815 | |
| | 780ce42711 | |
| | f6f5690fcd | |
```diff
@@ -57,9 +57,15 @@ RUN git clone --depth 1 https://github.com/ltdrdata/ComfyUI-Manager.git \
 # mask_text parameter). Model weights auto-download on first use into
 # /opt/comfyui/models/{sams,grounding-dino}/ — first inpaint takes ~3 GB of
 # downloads, subsequent runs are instant.
+#
+# Transformers must stay <5: GroundingDINO inside this node calls
+# BertModel.get_head_mask, which transformers 5.0 silently removed. The pin
+# is applied AFTER the requirements install so it overrides anything the
+# upstream requirements.txt would have pulled.
 RUN git clone --depth 1 https://github.com/storyicon/comfyui_segment_anything.git \
     ${COMFYUI_HOME}/custom_nodes/comfyui_segment_anything && \
-    pip install -q -r ${COMFYUI_HOME}/custom_nodes/comfyui_segment_anything/requirements.txt
+    pip install -q -r ${COMFYUI_HOME}/custom_nodes/comfyui_segment_anything/requirements.txt && \
+    pip install -q "transformers>=4.40,<5"
 
 # Entrypoint wrapper — auto-installs requirements.txt for any custom_node
 # present at startup (covers Manager-installed nodes and nodes cloned
```
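The `>=4.40,<5` constraint is the load-bearing detail here. A minimal sketch of the version window it enforces (pure string parsing, no transformers import; the helper name `pin_ok` is illustrative, not part of the Dockerfile):

```python
# Sketch: the window enforced by `pip install "transformers>=4.40,<5"`.
# A 5.x install drops BertModel.get_head_mask and breaks GroundingDINO,
# per the Dockerfile comment above.
def pin_ok(installed: str) -> bool:
    major, minor = (int(x) for x in installed.split(".")[:2])
    return (major, minor) >= (4, 40) and major < 5

assert pin_ok("4.46.1")      # inside the window
assert not pin_ok("4.39.3")  # older than the floor
assert not pin_ok("5.0.0")   # get_head_mask removed here
```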
```diff
@@ -4,11 +4,13 @@
     "base_model_id": "huihui_ai/qwen3.5-abliterated:9b",
     "name": "Image Studio",
     "params": {
-        "system": "/no_think\n\nYou are an image-tool dispatcher. You do not respond in prose. Every user message MUST result in exactly one tool call.\n\nROUTING:\n- If the user attached an image → call edit_image\n- Otherwise → call generate_image\n\nFire the tool on the FIRST message, with no preamble. Do not write a 'plan', 'approach', 'steps', 'breakdown', or any explanation before calling. Do not ask clarifying questions. Do not say what you are about to do. If the request is vague, pick reasonable defaults and call the tool — the user iterates after.\n\nSTYLES (pick one):\n photo photorealistic photo / portrait / cinematic\n juggernaut alternate photoreal — sharper, more saturated\n pony anime, cartoon, manga, stylised illustration\n general catch-all when nothing else fits\n furry-nai anthropomorphic, NAI-trained mix\n furry-noob anthropomorphic, NoobAI base\n furry-il anthropomorphic, Illustrious base (default for any furry/anthro request)\n\nedit_image has TWO MODES — pick based on whether the change is local or global:\n- LOCAL change (\"change the ball to a basketball\", \"add a hat to the dog\", \"remove the bird\", \"recolor the car red\") → set `mask_text` to a brief noun phrase naming the region (\"the ball\", \"the dog\", \"the bird\", \"the car\"). Only that region is repainted; rest stays pixel-perfect.\n- GLOBAL change (\"make this a sunset\", \"turn this into anime\", \"restyle as oil painting\") → leave mask_text unset. The whole image is reimagined.\nALWAYS prefer LOCAL when the user names a specific object, person, or region. GLOBAL is only for whole-image style/lighting transformations.\n\nDenoise:\n- LOCAL (mask_text set): default 1.0. Drop to 0.6–0.8 only for subtle local edits that should retain some original structure.\n- GLOBAL (no mask_text): default 0.7. Use 0.3–0.5 for subtle restyle, 0.85–1.0 for radical reimagining.\n\nPick style for the DESIRED OUTPUT, not the input image.\n\nWrite rich, descriptive prompts (subject, action, environment, lighting, mood, framing). Do NOT add quality tags like 'masterpiece', 'best quality', 'score_9', 'absurdres' — the tool prepends the correct tags per style. Do NOT set sampler, CFG, steps, scheduler — the tool picks them.\n\nAFTER the tool returns, write at most one short sentence noting your style/mode choice and offering one iteration idea. The image is already shown to the user; do not describe it.",
+        "system": "/no_think\n\nYou are an image-tool dispatcher. You do not respond in prose. Every user message MUST result in exactly one tool call.\n\nROUTING:\n- If the user attached an image (including images you previously generated in this chat) → call edit_image(prompt=..., ...)\n- Otherwise → call generate_image(prompt=..., ...)\nBoth tools take `prompt` as the first argument — same name on both. Do NOT invent `edit_instruction`.\n\nFire the tool on the FIRST message, with no preamble. Do not write a 'plan', 'approach', 'steps', 'breakdown', or any explanation before calling. Do not ask clarifying questions. Do not say what you are about to do. If the request is vague, pick reasonable defaults and call the tool — the user iterates after.\n\nSTYLES (pick one):\n photo photorealistic photo / portrait / cinematic\n juggernaut alternate photoreal — sharper, more saturated\n pony anime, cartoon, manga, stylised illustration\n general catch-all when nothing else fits\n furry-nai anthropomorphic, NAI-trained mix\n furry-noob anthropomorphic, NoobAI base\n furry-il anthropomorphic, Illustrious base (default for any furry/anthro request)\n\nedit_image has TWO MODES — pick based on whether the change is local or global:\n- LOCAL change (\"change the ball to a basketball\", \"add a hat to the dog\", \"remove the bird\", \"recolor the car red\") → set `mask_text` to a brief noun phrase naming the region (\"the ball\", \"the dog\", \"the bird\", \"the car\"). Only that region is repainted; rest stays pixel-perfect.\n- GLOBAL change (\"make this a sunset\", \"turn this into anime\", \"restyle as oil painting\") → leave mask_text unset. The whole image is reimagined.\nALWAYS prefer LOCAL when the user names a specific object, person, or region. GLOBAL is only for whole-image style/lighting transformations.\n\nDenoise:\n- LOCAL (mask_text set): default 1.0. Drop to 0.6–0.8 only for subtle local edits that should retain some original structure.\n- GLOBAL (no mask_text): default 0.7. Use 0.3–0.5 for subtle restyle, 0.85–1.0 for radical reimagining.\n\nPick style for the DESIRED OUTPUT, not the input image.\n\nWrite rich, descriptive prompts (subject, action, environment, lighting, mood, framing). Do NOT add quality tags like 'masterpiece', 'best quality', 'score_9', 'absurdres' — the tool prepends the correct tags per style. Do NOT set sampler, CFG, steps, scheduler — the tool picks them.\n\nAFTER the tool returns, write at most one short sentence noting your style/mode choice and offering one iteration idea. The image is already shown to the user; do not describe it.",
         "temperature": 0.5,
         "top_p": 0.9,
         "function_calling": "native",
-        "tool_choice": "required"
+        "custom_params": {
+            "tool_choice": "required"
+        }
     },
     "meta": {
         "profile_image_url": "/static/favicon.png",
```
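Moving `tool_choice` under `custom_params` matters because some Open WebUI builds only forward extra OpenAI-style fields from there. A hedged sketch of the chat-completions payload the config is aiming to produce (field names follow the OpenAI API; the message content and truncated tool schema are placeholders, and exact forwarding behavior varies by Open WebUI version):

```python
# Illustrative payload only — the tool schema is a stub.
payload = {
    "model": "huihui_ai/qwen3.5-abliterated:9b",
    "messages": [{"role": "user", "content": "a red fox at dawn"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "generate_image",
                "parameters": {
                    "type": "object",
                    "properties": {"prompt": {"type": "string"}},
                },
            },
        },
    ],
    # "required" forces a tool call on every turn, matching the system
    # prompt's "every message MUST result in exactly one tool call".
    "tool_choice": "required",
}

assert payload["tool_choice"] == "required"
```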
```diff
@@ -65,8 +65,11 @@ You are an image-tool dispatcher. You do not respond in prose. Every
 user message MUST result in exactly one tool call.
 
 ROUTING:
-- If the user attached an image → call edit_image
-- Otherwise → call generate_image
+- If the user attached an image (including images you previously
+  generated in this chat) → call edit_image(prompt=..., ...)
+- Otherwise → call generate_image(prompt=..., ...)
+Both tools take `prompt` as the first argument — same name on both.
+Do NOT invent `edit_instruction`.
 
 Fire the tool on the FIRST message, with no preamble. Do not write a
 'plan', 'approach', 'steps', 'breakdown', or any explanation before
```
```diff
@@ -1,7 +1,7 @@
 """
 title: Smart Image Generator & Editor (ComfyUI)
 author: ai-stack
-version: 0.6.0
+version: 0.7.2
 description: Generate or edit images via ComfyUI with automatic SDXL
   checkpoint routing. Two methods — generate_image (txt2img) and
   edit_image (img2img on the user's most recently attached image). The
```
```diff
@@ -34,6 +34,8 @@ from pydantic import BaseModel, Field
 # falls back to emitting a markdown data-URI message.
 try:
     from fastapi import UploadFile
+    from open_webui.models.chats import Chats
     from open_webui.models.files import Files
+    from open_webui.models.users import Users
     from open_webui.routers.files import upload_file_handler
```
```diff
@@ -338,20 +340,93 @@ def _build_img2img(positive: str, negative: str, settings: dict,
     }
 
 
+def _file_dict_is_image(f: dict) -> bool:
+    ftype = (f.get("type") or "").lower()
+    fname = (f.get("name") or f.get("filename") or "").lower()
+    return "image" in ftype or fname.endswith((".png", ".jpg", ".jpeg", ".webp"))
+
+
+_FILE_URL_ID_RE = re.compile(r"/(?:api/v1/)?files/([0-9a-fA-F-]{8,})(?:/content)?")
+
+
+def _read_file_dict(f: dict) -> Optional[bytes]:
+    """
+    Try to read raw bytes for one file dict. Tries in order:
+    1. Local filesystem path keys (covers user uploads with `path`).
+    2. Open WebUI's Files.get_file_by_id with f["id"] (covers files
+       the user uploaded via the file API).
+    3. Same lookup with the id parsed out of f["url"] (covers
+       assistant-emitted files where the message attachment is just
+       {"type":"image","url":"/api/v1/files/<uuid>/content"} —
+       no id field, no path field, but the URL has the id).
+    """
+    for path_key in ("path", "filepath", "file_path"):
+        path = f.get(path_key)
+        if path:
+            try:
+                with open(path, "rb") as fh:
+                    return fh.read()
+            except OSError:
+                pass
+
+    candidate_ids = []
+    if f.get("id"):
+        candidate_ids.append(f["id"])
+    url = f.get("url")
+    if url:
+        m = _FILE_URL_ID_RE.search(url)
+        if m:
+            candidate_ids.append(m.group(1))
+
+    if _OPENWEBUI_RUNTIME:
+        for fid in candidate_ids:
+            try:
+                file_model = Files.get_file_by_id(fid)
+                if file_model is None:
+                    continue
+                path = getattr(file_model, "path", None)
+                if not path:
+                    meta = getattr(file_model, "meta", None) or {}
+                    if isinstance(meta, dict):
+                        path = meta.get("path")
+                    else:
+                        path = getattr(meta, "path", None)
+                if path:
+                    try:
+                        with open(path, "rb") as fh:
+                            return fh.read()
+                    except OSError:
+                        pass
+            except Exception:
+                pass
+
+    return None
+
+
 async def _extract_attached_image(
     files: Optional[list],
     messages: Optional[list],
+    metadata: Optional[dict],
     session: aiohttp.ClientSession,
 ) -> Optional[bytes]:
     """
-    Find the most recent image the user attached to the chat. Tries three
-    sources in order: (1) base64 data URIs in `image_url` content blocks
-    of the recent messages (works for vision-capable models), (2) a local
-    filesystem path on the file dict (open-webui stores uploads under
-    /app/backend/data/uploads/), (3) the file's url field, fetched over
-    HTTP. Returns raw image bytes, or None if nothing matched.
+    Find the most recent image in the chat — including images previously
+    emitted by this tool itself. Search order (most recent first):
+
+    1. Inline base64 data URIs in `image_url` content blocks of recent
+       messages (vision-model uploads, paste-from-clipboard).
+    2. Files attached to messages in the conversation, scanned in
+       REVERSE so the newest image wins. This covers two cases:
+       a. Files the user just attached (current user message).
+       b. Files the assistant emitted via prior `generate_image` /
+          `edit_image` calls (attached to assistant messages by the
+          `files` event in _push_image_to_chat).
+    3. The __files__ tool param as a final fallback (some Open WebUI
+       versions pass user uploads here instead of on the message).
+    4. Best-effort URL fetch on any leftover file dict (likely fails
+       on auth-protected endpoints — last resort).
     """
-    # Messages: standard OpenAI image_url content blocks.
+    # 1. Inline data URIs on recent messages.
     for msg in reversed(messages or []):
         content = msg.get("content") if isinstance(msg, dict) else None
         if isinstance(content, list):
```
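The `_FILE_URL_ID_RE` pattern does the heavy lifting for assistant-emitted attachments, so it is worth exercising standalone. The regex below is copied from the hunk; the sample URLs are made up:

```python
import re

FILE_URL_ID_RE = re.compile(r"/(?:api/v1/)?files/([0-9a-fA-F-]{8,})(?:/content)?")

# Assistant-emitted attachment: only a URL, no id/path fields.
m = FILE_URL_ID_RE.search("/api/v1/files/3fa85f64-5717-4562-b3fc-2c963f66afa6/content")
assert m is not None
assert m.group(1) == "3fa85f64-5717-4562-b3fc-2c963f66afa6"

# Short form without the API prefix or /content suffix also matches.
m = FILE_URL_ID_RE.search("/files/deadbeef01")
assert m is not None and m.group(1) == "deadbeef01"

# Non-file URLs do not match.
assert FILE_URL_ID_RE.search("/api/v1/chats/abc") is None
```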
```diff
@@ -365,27 +440,62 @@ async def _extract_attached_image(
             except Exception:
                 pass
 
-    # Files: try local path, then URL.
+    # 2. Files on messages, newest first.
+    for msg in reversed(messages or []):
+        if not isinstance(msg, dict):
+            continue
+        msg_files = msg.get("files")
+        if not isinstance(msg_files, list):
+            continue
+        for f in msg_files:
+            if not isinstance(f, dict) or not _file_dict_is_image(f):
+                continue
+            data = _read_file_dict(f)
+            if data is not None:
+                return data
+
+    # 3. __files__ param (current user upload, sometimes only here).
     for f in files or []:
-        if not isinstance(f, dict):
-            continue
-        ftype = (f.get("type") or "").lower()
-        fname = (f.get("name") or f.get("filename") or "").lower()
-        is_image = "image" in ftype or fname.endswith((".png", ".jpg", ".jpeg", ".webp"))
-        if not is_image:
+        if not isinstance(f, dict) or not _file_dict_is_image(f):
             continue
+        data = _read_file_dict(f)
+        if data is not None:
+            return data
 
-        for path_key in ("path", "filepath", "file_path"):
-            path = f.get(path_key)
-            if path:
-                try:
-                    with open(path, "rb") as fh:
-                        return fh.read()
-                except OSError:
-                    pass
+    # 4. Pull the chat from the database directly. Open WebUI persists
+    # `files` on every message via the upsert in socket/main.py — so even
+    # if __messages__ doesn't hydrate the assistant-emitted attachments,
+    # the chat record does. This is the strongest fallback.
+    if _OPENWEBUI_RUNTIME and metadata:
+        chat_id = metadata.get("chat_id")
+        if chat_id:
+            try:
+                chat = Chats.get_chat_by_id(chat_id)
+                chat_data = getattr(chat, "chat", None) if chat else None
+                chat_messages = (chat_data or {}).get("messages", []) if isinstance(chat_data, dict) else []
+                for msg in reversed(chat_messages):
+                    if not isinstance(msg, dict):
+                        continue
+                    msg_files = msg.get("files") or []
+                    for f in msg_files:
+                        if not isinstance(f, dict) or not _file_dict_is_image(f):
+                            continue
+                        data = _read_file_dict(f)
+                        if data is not None:
+                            return data
+            except Exception:
+                pass
 
-        url = f.get("url")
-        if url:
+    # 5. Last-resort URL fetch (no auth — only works for public endpoints).
+    for source in [files or []] + [
+        (msg.get("files") or []) for msg in reversed(messages or []) if isinstance(msg, dict)
+    ]:
+        for f in source:
+            if not isinstance(f, dict) or not _file_dict_is_image(f):
+                continue
+            url = f.get("url")
+            if not url:
+                continue
             full = url if url.startswith("http") else f"http://localhost:8080{url}"
             try:
                 async with session.get(full) as resp:
```
```diff
@@ -645,7 +755,7 @@ class Tools:
 
     async def edit_image(
         self,
-        edit_instruction: str,
+        prompt: str,
         style: Optional[StyleName] = None,
         mask_text: Optional[str] = None,
         denoise: Optional[float] = None,
```
```diff
@@ -689,7 +799,7 @@ class Tools:
 
         Pick `style` for the DESIRED OUTPUT, not the input image.
 
-        :param edit_instruction: What the changed area should look like.
+        :param prompt: What the changed area should look like.
             Tool auto-prepends quality tags — don't include those.
         :param style: One of the StyleName values. Omit to auto-detect.
         :param mask_text: Noun phrase describing the region to edit. Set
```
```diff
@@ -700,7 +810,7 @@ class Tools:
         :param seed: 0 to randomize, otherwise specific.
         :return: Markdown image of the result, or an error if no image is attached.
         """
-        chosen = style or _route_style(edit_instruction)
+        chosen = style or _route_style(prompt)
         settings = STYLES.get(chosen)
         if not settings:
             return f"Unknown style '{chosen}'. Available: {', '.join(STYLES.keys())}"
```
```diff
@@ -722,12 +832,24 @@ class Tools:
 
         async with aiohttp.ClientSession() as session:
             await emit("Looking for attached image…")
-            raw_in = await _extract_attached_image(__files__, __messages__, session)
+            raw_in = await _extract_attached_image(
+                __files__, __messages__, __metadata__, session,
+            )
             if raw_in is None:
+                msgs_with_files = sum(
+                    1 for m in (__messages__ or [])
+                    if isinstance(m, dict) and m.get("files")
+                )
+                chat_id_present = bool((__metadata__ or {}).get("chat_id"))
                 return (
-                    "No image found in the chat. Ask the user to attach the "
-                    "image they want edited (paperclip / drag-drop), or call "
-                    "generate_image instead if they want a new image."
+                    "No image found in the chat. Diagnostics: "
+                    f"__files__={len(__files__ or [])}, "
+                    f"__messages__={len(__messages__ or [])} "
+                    f"(of which {msgs_with_files} had a files field), "
+                    f"chat_id_present={chat_id_present}, "
+                    f"openwebui_runtime={_OPENWEBUI_RUNTIME}. "
+                    "Ask the user to attach the image they want edited "
+                    "(paperclip / drag-drop), or call generate_image instead."
                 )
 
             await emit("Uploading source to ComfyUI…")
```
```diff
@@ -741,7 +863,7 @@ class Tools:
             + (f", mask='{mask_text}'" if mask_text else "")
         )
 
-        positive = f"{settings['prefix']}{edit_instruction}"
+        positive = f"{settings['prefix']}{prompt}"
         negative = settings["negative"]
         if negative_prompt:
             negative = f"{negative}, {negative_prompt}"
```
```diff
@@ -18,4 +18,11 @@ if [ -d /opt/comfyui/custom_nodes ]; then
     done
 fi
 
+# Force-pin known-incompatible packages back into a working range. Some
+# custom nodes bring transformers >=5 transitively, which removes
+# BertModel.get_head_mask and breaks comfyui_segment_anything's
+# GroundingDINO. Run last so it wins over anything the loop above
+# installed.
+pip install -q "transformers>=4.40,<5" || echo "[entrypoint] transformers pin failed — continuing"
+
 exec "$@"
```