open source · free · GPU-native
Save per-model settings as profiles.
Pick one in the TUI and hit Enter to spin it up or down.
the problem
01
Different models need different GPU settings, memory limits, and serving params. You end up copy-pasting docker commands every time.
$ docker run --gpus ... -e ... vllm/vllm ...
02
Testing Qwen, then Llama, then DeepSeek? Each switch means stopping, reconfiguring, and restarting manually.
$ docker stop && docker rm && docker run ...
03
Which container is running? How much GPU memory is left? You're constantly running docker ps and nvidia-smi.
$ docker ps && nvidia-smi
04
Running multiple models simultaneously means juggling ports, GPU assignments, and compose files by hand.
$ vim docker-compose.yaml # again...
how it works
Set your HuggingFace token and cache path. One-time setup.
Quick Setup auto-generates a profile from just a model name. Or create one manually with full control.
Pick a profile, press Enter, choose Start. Your model is serving on the OpenAI-compatible API.
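Once a profile is up, any OpenAI-style client can talk to it. A minimal smoke test with curl, assuming vLLM's default port 8000 and a hypothetical model name — substitute whatever your profile actually serves:

```shell
# Assumes the profile exposes vLLM's default port 8000 and serves
# Qwen/Qwen2.5-7B-Instruct -- both are placeholders, adjust to your profile.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one word."}]
      }'
```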
demo
terminal
before & after
features
Start, stop, view logs, edit configs — all from one terminal screen with keyboard shortcuts.
Save per-model settings independently. Switch between Qwen, Llama, DeepSeek instantly.
Real-time GPU usage bars on the dashboard. Auto-refresh every 5 seconds. No more nvidia-smi.
Estimate GPU memory before deploying. Know if your model fits before wasting time on OOM errors.
Auto-detect GPU architecture. Fast Build in 10-30 min. Support for forks and custom versions.
Multi-adapter loading with automatic path mapping. Serve fine-tuned models alongside base models.
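The VRAM estimate above can be sanity-checked by hand: weight memory is roughly parameter count times bytes per parameter, before vLLM's KV-cache pre-allocation. A back-of-the-envelope sketch, not the tool's exact formula:

```shell
# Rough weight-memory estimate: params (billions) x bytes per parameter.
# fp16/bf16 weights take 2 bytes each; vLLM reserves extra VRAM for the
# KV cache on top of this, so treat the result as a lower bound.
params_b=7   # e.g. a 7B model
bytes=2      # fp16/bf16
awk -v p="$params_b" -v b="$bytes" \
    'BEGIN { printf "weights: ~%.1f GB\n", p * b }'
# -> weights: ~14.0 GB
```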
faq
Set your HuggingFace token and cache path in .env.common, and run uv run vllm-compose.
Quick Setup auto-generates a profile from just a model name — you'll be serving in 30 seconds.
The project uses uv for Python package management, but pip works too.
To build from a fork, run ./run.sh build main --repo <your-fork-url>.
It auto-detects your GPU architecture for optimized builds.