OLLAMA_KEEP_ALIVE was 30m, which evicts a model after 30 minutes of
inactivity and forces a reload penalty on the next request. Setting -1
holds models in VRAM indefinitely; MAX_LOADED_MODELS=3 caps how many can
stay resident simultaneously (previously 2). Raise MAX_LOADED_MODELS if
you're rotating between more than three models AND your GPU has the VRAM
for it; a comment in the compose file explains the trade-off.
For the live srvno.de stack: OLLAMA_KEEP_ALIVE=-1 takes effect on
the next `docker compose up -d ollama`. Loaded models survive the
restart only if they're re-requested before swap-out anyway.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Floating tags (`latest`, `main`) made deploys non-deterministic: a
container recreate could pull a newer Open WebUI, Ollama, or Anubis at
any time. Wrapped every `image:` value in a ${VAR:-default}
substitution, surfaced the full set in .env.example with a header
explaining where to find current versions, and bumped the
COMFYUI_IMAGE_TAG default to 0.2.1 (the just-tagged version with the
transformers pin).
Vars added: CADDY_TAG, OLLAMA_TAG, OPEN_WEBUI_TAG, ALPINE_TAG,
ANUBIS_TAG (COMFYUI_IMAGE_TAG already existed). Where I'm not confident
which specific version to pin (Ollama, Open WebUI, Anubis), the defaults
keep the previous floating-tag behaviour; operators should update those
to verified versions for production deploys.
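The ${VAR:-default} form used in the compose file follows the same rule as shell parameter expansion: the default applies when the variable is unset or empty. A minimal sketch of that behaviour, where `image_ref` is a hypothetical helper for illustration and not part of the repo:

```shell
# Hypothetical helper mirroring ${VAR:-default}: use the pinned tag when it
# is set and non-empty, otherwise fall back to the default tag.
image_ref() {
  repo=$1      # image repository, e.g. ollama/ollama
  pinned=$2    # value taken from .env; may be empty
  default=$3   # fallback tag used when nothing is pinned
  printf '%s:%s\n' "$repo" "${pinned:-$default}"
}

# Resolve the Ollama image the way the compose substitution would.
image_ref ollama/ollama "${OLLAMA_TAG:-}" latest
```

With OLLAMA_TAG unset or empty, the last line prints ollama/ollama:latest; once a version is set in .env, that version is used instead.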
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three pieces:
1. mirror-ollama-model.sh — run on any machine that has the model
pulled. Parses the manifest at
~/.ollama/models/manifests/registry.ollama.ai/<ns>/<name>/<tag>,
greps every sha256:* digest, tars manifest + referenced blobs into
one .tgz. Output is portable — extract over any other Ollama
data dir and the model is immediately visible.
2. init-models.sh gains an s3_pull function that curls a tarball from
$S3_OLLAMA_BASE and extracts it into /root/.ollama/models/. It falls
back to ollama pull when S3_OLLAMA_BASE is unset, so s3_pull lines are
safe to commit before the bucket is ready.
huihui_ai/qwen3.5-abliterated:9b is promoted to s3_pull as the example.
3. docker-compose.yml model-init service propagates S3_OLLAMA_BASE
from .env. The script installs curl at start because the ollama/ollama
image doesn't always ship it.
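The mirroring step in item 1 could look roughly like the sketch below. This is not the actual mirror-ollama-model.sh: the function name `mirror_model` and its arguments are hypothetical, and it assumes blobs are stored as blobs/sha256-<hex> next to the manifests directory.

```shell
# Sketch only: bundle one model's manifest plus the blobs it references.
# Assumes blobs live at <models_dir>/blobs/sha256-<hex>.
mirror_model() {
  models_dir=$1; ns=$2; name=$3; tag=$4
  out=$5   # must be an absolute path, because we cd before writing
  manifest="manifests/registry.ollama.ai/$ns/$name/$tag"
  (
    cd "$models_dir" || exit 1
    [ -f "$manifest" ] || { echo "no manifest: $manifest" >&2; exit 1; }
    # Collect every sha256:<hex> digest and map it to its blob path.
    blobs=$(grep -o 'sha256:[0-9a-f]\{64\}' "$manifest" \
      | sort -u | sed 's|^sha256:|blobs/sha256-|')
    # $blobs is left unquoted on purpose so each path is its own argument.
    tar -czf "$out" "$manifest" $blobs
  )
}
```

Because the archive stores paths relative to the models directory, extracting it with `tar -xzf` inside another models directory puts the manifest and blobs back in the same places.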
README documents the mirror workflow under "Mirroring models to S3".
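The fallback described for s3_pull can be sketched as follows. The real function in init-models.sh may differ; the second argument (tarball file name) and the OLLAMA_MODELS_DIR override are assumptions made for this illustration.

```shell
# Sketch only: fetch a mirrored model tarball, or fall back to ollama pull.
s3_pull() {
  model=$1     # model reference passed to ollama pull on fallback
  tarball=$2   # hypothetical: file name of the tarball under $S3_OLLAMA_BASE
  dest=${OLLAMA_MODELS_DIR:-/root/.ollama/models}   # override is an assumption

  if [ -z "${S3_OLLAMA_BASE:-}" ]; then
    # No mirror configured: behave exactly like a plain pull.
    ollama pull "$model"
    return
  fi

  tmpfile=$(mktemp) || return 1
  # -f makes curl fail on HTTP errors instead of saving an error page.
  if curl -fsSL -o "$tmpfile" "$S3_OLLAMA_BASE/$tarball"; then
    mkdir -p "$dest" && tar -xzf "$tmpfile" -C "$dest"
    status=$?
  else
    status=1
  fi
  rm -f "$tmpfile"
  return "$status"
}
```

Downloading to a temporary file before extracting means a failed transfer never leaves a half-written model in the data directory.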
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the Ollama model-init pattern: a one-shot Alpine container that
mounts the comfyui-models volume and runs comfyui-init-models.sh, which
curls direct download URLs (HuggingFace by default) into the right
subdirectories. Idempotent — already-present files are skipped.
HF_TOKEN is plumbed through for gated repos (Flux-dev, SD3, etc.) and is
opt-in via .env. The default list ships SD 1.5 only, matching the
placeholder filename in workflows/*.json. Examples for SDXL, Flux, and
upscalers are commented in the script.
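The idempotent download with an optional token might be sketched like this. It is an illustration rather than the contents of comfyui-init-models.sh; the function name `fetch_model` and the `.part` temporary suffix are hypothetical.

```shell
# Sketch only: download a model file unless it is already present.
fetch_model() {
  url=$1    # direct download URL
  dest=$2   # target path inside the models volume

  # Idempotence: a non-empty file at the destination is left alone.
  if [ -s "$dest" ]; then
    echo "skip: $dest"
    return 0
  fi

  mkdir -p "$(dirname "$dest")" || return 1
  if [ -n "${HF_TOKEN:-}" ]; then
    # Gated repos need a bearer token; only sent when HF_TOKEN is set.
    curl -fL -H "Authorization: Bearer $HF_TOKEN" -o "$dest.part" "$url"
  else
    curl -fL -o "$dest.part" "$url"
  fi || { rm -f "$dest.part"; return 1; }

  # Rename only after a complete download so partial files are never
  # mistaken for finished ones on the next run.
  mv "$dest.part" "$dest"
}
```

Writing to a temporary name first keeps the skip check honest: an interrupted download does not leave a truncated file at the final path.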
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
So changing the deployment's hostnames is a one-file edit (.env) instead
of touching docker-compose.yml. WEBUI_URL is the full URL with scheme
(Open WebUI uses it for auth redirects); LLM_URL is the bare hostname
(Anubis wants it for COOKIE_DOMAIN).
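Since the two variables take different shapes, a quick sanity check on .env values can catch a swapped format early. The helper below is hypothetical and not part of the stack; the hostnames in the usage line are placeholders.

```shell
# Hypothetical check: WEBUI_URL needs a scheme, LLM_URL must be a bare host.
check_hostnames() {
  webui_url=$1
  llm_url=$2

  case "$webui_url" in
    http://*|https://*) ;;
    *) echo "WEBUI_URL must include a scheme: $webui_url" >&2; return 1 ;;
  esac

  case "$llm_url" in
    *://*|*/*) echo "LLM_URL must be a bare hostname: $llm_url" >&2; return 1 ;;
  esac

  return 0
}

check_hostnames "https://chat.example.com" "llm.example.com" && echo "ok"
```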
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sanitized snapshot of the live srvno.de stack: Caddy + Ollama (with
preseed) + ComfyUI + Open WebUI + Anubis stub. Real hostnames,
secrets, and bcrypt hash replaced with placeholders so the dir is safe
to commit.
Caddyfile updated to point at comfyui:8188 (the source file pointed at
the now-removed forge service). Dropped FIGMENT_/FORGE_/SEGMENT_IMAGE_TAG
from the env example. Harmonised the init-models.sh mount path between
ollama and model-init services.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>