A typical agent turn isn't one decision — it's a dozen. Most are cheap (intent labels, distillation, safety checks). Liminal routes those to a fast model and keeps the main model for the work that actually needs depth.
The split
| Role | Env default | Purpose |
|---|---|---|
| Main | deepseek/deepseek-v4-pro | Full ReAct turn — tools, reasoning, final answer |
| Fast | deepseek/deepseek-v4-flash | Intent, distill, query rewrite, critic, safety judge, auto-dream |
Every fast-model call is a full completion — but at a fraction of the price. On long sessions, supporting work is most of the bill.
Prompt caching on top
On providers that support ephemeral cache markers, the static harness prefix (~14k tokens) bills at cache-read rates on rounds 2+ — often ~13× cheaper on prefix tokens for the rest of the loop.
[HARNESS] prompt_cache: cached=12k/14k (86%)
Cache hits are logged per turn in .agent_sessions/*.jsonl. We disable provider fallbacks by default so cache behavior stays predictable.
What it costs in practice
A typical multi-tool refactor (10–15 tool calls, two-tier routing, cache on rounds 2+):
| Component | Approx. cost |
|---|---|
| Main — round 1 (cold prefix) | ~$0.022 |
| Main — rounds 2–12 (cached) | ~$0.011 |
| Fast — intent + distill + safety | ~$0.001 |
| Total | ~$0.034 |
Without routing and caching, the same turn is closer to $0.18. The discount is structural — not a coupon.
Inside the harness →ReAct loop, tools, compression, and recovery — the full architecture post.See routing on your workload
Install Liminal, run a real task, and read cache + intent lines in the session JSONL.
— The Vireon Dynamics team