Open WebUI was blocking image attachments to the Image Studio model
because its base model, mistral-nemo:12b, isn't vision-capable. Two
changes:
- capabilities.vision flipped to true in the preset JSON. The Tool
only needs the image to arrive via __messages__ / __files__ so it
can call edit_image; the actual visual processing happens in
ComfyUI's img2img, not in the LLM. Setting the flag unlocks the
attach-image UI without lying about what mistral-nemo can do.
- System prompt now tells the LLM explicitly: "you may not be able
to visually inspect the attached image — that is fine. Trust the
user's description and call edit_image." Prevents the LLM from
refusing or hedging when it gets an image it can't see.
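
The hand-off the first bullet relies on can be sketched roughly as
follows. This assumes the attachment shows up in __messages__ as an
OpenAI-style content part of type image_url; that message shape and the
helper name find_latest_image_url are assumptions for illustration, not
taken from the Tool's actual code.

```python
from typing import Optional


def find_latest_image_url(messages: list[dict]) -> Optional[str]:
    """Return the URL (or data URI) of the most recently attached image.

    Assumes OpenAI-style messages, where multimodal content is a list of
    parts such as {"type": "image_url", "image_url": {"url": "..."}}.
    """
    for message in reversed(messages):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text message, nothing attached
        for part in reversed(content):
            if isinstance(part, dict) and part.get("type") == "image_url":
                url = part.get("image_url", {}).get("url")
                if url:
                    return url
    return None
```

The returned string would be handed to edit_image as-is; the LLM never
has to decode or inspect it, which is why a text-only base model is
enough for this path.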

Documented the upgrade path in image_studio.md for users who want
real vision: pick one of qwen2.5vl:7b, llama3.2-vision:11b, or
minicpm-v:8b, add it to init-models.sh, and swap base_model_id in the
preset. A vision LLM can then base its edit_image calls on the image
content itself rather than on the user's description alone.
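
The preset side of that upgrade might look like the sketch below. The
key layout (base_model_id at the top level, capabilities nested under
meta) is an assumption about the preset JSON, and switch_base_model is
a hypothetical helper; adjust the paths to match the real file.

```python
import copy
import json


def switch_base_model(preset: dict, new_model_id: str) -> dict:
    """Return a copy of the preset pointing at a different base model.

    Assumed layout: {"base_model_id": ..., "meta": {"capabilities": {...}}}.
    """
    updated = copy.deepcopy(preset)  # leave the caller's dict untouched
    updated["base_model_id"] = new_model_id
    # Keep the attach-image UI enabled for the new base model.
    updated.setdefault("meta", {}).setdefault("capabilities", {})["vision"] = True
    return updated


if __name__ == "__main__":
    preset = {
        "base_model_id": "mistral-nemo:12b",
        "meta": {"capabilities": {"vision": True}},
    }
    print(json.dumps(switch_base_model(preset, "qwen2.5vl:7b"), indent=2))
```

The chosen model still has to be pulled separately (for example via
init-models.sh) before the preset can use it.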

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>