Files
comfyui-nvidia/deployments/ai-stack/docker-compose.yml
William Gill 02a4bece5d Ollama: keep loaded models resident until evicted (KEEP_ALIVE=-1)
Was 30m, which evicts after 30 minutes of inactivity and forces a
reload penalty on the next request. Setting -1 holds models in VRAM
indefinitely; MAX_LOADED_MODELS=3 caps how many can stay resident
simultaneously (vs the previous 2). Tune MAX higher if you're
rotating between more than three models AND your GPU has the VRAM
for it — comment in the compose explains the trade-off.

For the live srvno.de stack: OLLAMA_KEEP_ALIVE=-1 takes effect on
the next `docker compose up -d ollama`. Loaded models survive the
restart only if they're re-requested before swap-out anyway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 19:25:42 -05:00

7.4 KiB