A typical agent turn isn't one decision — it's a dozen. Most are cheap (intent labels, distillation, safety checks). Liminal routes those to a fast model and keeps the main model for the work that actually needs depth.

The split

RoleEnv defaultPurpose
Maindeepseek/deepseek-v4-proFull ReAct turn — tools, reasoning, final answer
Fastdeepseek/deepseek-v4-flashIntent, distill, query rewrite, critic, safety judge, auto-dream

Every fast-model call is a full completion — but at a fraction of the price. On long sessions, supporting work is most of the bill.

Prompt caching on top

On providers that support ephemeral cache markers, the static harness prefix (~14k tokens) bills at cache-read rates on rounds 2+ — often ~13× cheaper on prefix tokens for the rest of the loop.

[HARNESS] prompt_cache: cached=12k/14k (86%)

Cache hits are logged per turn in .agent_sessions/*.jsonl. We disable provider fallbacks by default so cache behavior stays predictable.

What it costs in practice

A typical multi-tool refactor (10–15 tool calls, two-tier routing, cache on rounds 2+):

ComponentApprox. cost
Main — round 1 (cold prefix)~$0.022
Main — rounds 2–12 (cached)~$0.011
Fast — intent + distill + safety~$0.001
Total~$0.034

Without routing and caching, the same turn is closer to $0.18. The discount is structural — not a coupon.

Inside the harnessReAct loop, tools, compression, and recovery — the full architecture post.

See routing on your workload

Install Liminal, run a real task, and read cache + intent lines in the session JSONL.

— The Vireon Dynamics team