obleth can serve identical requests from an exact-match response cache instead of hitting the upstream. A cache hit skips admission, the token budget, and the upstream entirely — it returns immediately and costs nothing against fairshare or quota. This is the cleanest possible offload: real requests removed from the backend, not just smoothed.
The cache is opt-in per model and off by default.
auth → resolve model → cache check ─ hit ─→ return cached response (no permit, no budget)
                                   └ miss ─→ fairshare admit → reserve budget → upstream → store on success
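As a rough illustration of the flow above, here is a minimal Python sketch of an exact-match cache sitting in front of an upstream call. It is not obleth's implementation: `Gateway`, `cache_key`, and `fake_upstream` are hypothetical names, and the UTF-8 concatenation and hex encoding of the key are assumptions. Admission and budget reservation are reduced to a placeholder.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


def cache_key(model_name: str, request_body: bytes) -> str:
    """sha256(model_name + request_body); UTF-8 and hex encoding are assumptions."""
    return hashlib.sha256(model_name.encode("utf-8") + request_body).hexdigest()


@dataclass
class Gateway:
    # Hypothetical stand-in for the upstream: returns (status_code, response_body).
    upstream: Callable[[str, bytes], Tuple[int, bytes]]
    cache_enabled: bool = True
    store: Dict[str, bytes] = field(default_factory=dict)

    def _forward(self, model_name: str, body: bytes) -> Tuple[int, bytes]:
        # Placeholder for: fairshare admit -> reserve budget -> upstream.
        return self.upstream(model_name, body)

    def handle(self, model_name: str, body: bytes) -> Tuple[str, bytes]:
        """Return (cache_status, response_body) for one request."""
        if not self.cache_enabled:
            return "off", self._forward(model_name, body)[1]
        key = cache_key(model_name, body)
        if key in self.store:
            # Hit: no permit, no budget reservation, no upstream call.
            return "hit", self.store[key]
        status, response = self._forward(model_name, body)
        if status == 200:
            self.store[key] = response  # store on success only
        return "miss", response


calls: List[bytes] = []


def fake_upstream(model_name: str, body: bytes) -> Tuple[int, bytes]:
    calls.append(body)
    return 200, b'{"ok": true}'


gw = Gateway(upstream=fake_upstream)
first = gw.handle("m", b'{"prompt":"hi"}')
second = gw.handle("m", b'{"prompt":"hi"}')
third = gw.handle("m", b'{"prompt": "hi"}')  # one extra space: different bytes
print(first[0], second[0], third[0], len(calls))
```

The third request differs from the first by a single space, so it hashes to a different key and goes to the upstream. Clients that want hits need to serialize identical requests identically.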
The cache key is sha256(model_name + request_body). Two requests collide (a hit) only when the client-facing model and the full request body are byte-identical.
200 OK responses are stored.
Enable the cache per model, from the dashboard (Models → Cache toggle) or the Management API:
curl -X PUT "${OBLETH_ADMIN_BASE_URL}/api/v1/models/$MODEL_ID/cache" \
-H "Authorization: Bearer $OBLETH_ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"cache_enabled": true, "cache_ttl_secs": 300}'
cache_enabled: turn the cache on/off for this model.
cache_ttl_secs: entry lifetime in seconds (0 = no expiry).
Changes propagate to every gateway pod immediately via the Redis invalidation channel.
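To make the cache_ttl_secs semantics concrete, the sketch below shows one way an entry's freshness could be evaluated. It assumes the lifetime is measured from the time the entry was stored and that an entry expires exactly when its age reaches the TTL. Both are assumptions, and `Entry` and `is_fresh` are hypothetical names rather than obleth internals.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    body: bytes
    stored_at: float  # seconds since some epoch (assumed to be the store time)


def is_fresh(entry: Entry, cache_ttl_secs: int, now: float) -> bool:
    """Return True if the entry may still be served."""
    if cache_ttl_secs == 0:
        return True  # 0 = no expiry
    # Assumption: the entry expires once its age reaches the TTL.
    return (now - entry.stored_at) < cache_ttl_secs


entry = Entry(body=b"cached", stored_at=100.0)
print(is_fresh(entry, 300, now=399.0))  # age 299 s, TTL 300 s
print(is_fresh(entry, 300, now=400.0))  # age 300 s, TTL 300 s
print(is_fresh(entry, 0, now=1e9))      # TTL 0
```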
Each request records a cache_status of hit, miss, or off in the
ClickHouse usage ledger.
Dashboard: the Models page shows a cache panel with 24h hit rate, hits, misses, and tokens saved.
Management API:
curl "${OBLETH_ADMIN_BASE_URL}/api/v1/usage/cache?since_ms=..." \
-H "Authorization: Bearer $OBLETH_ADMIN_TOKEN"
# { "hits": 1234, "misses": 5678, "tokens_saved": 987654 }
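The response carries raw counts, so a hit rate has to be derived from them. The sketch below assumes the hit rate is hits / (hits + misses), which is the natural reading but is not confirmed by this page. `hit_rate` is a hypothetical helper, and the sample payload is the one shown above.

```python
import json


def hit_rate(summary: dict) -> float:
    """hits / (hits + misses); returns 0.0 when there were no lookups."""
    lookups = summary["hits"] + summary["misses"]
    return summary["hits"] / lookups if lookups else 0.0


sample = json.loads('{ "hits": 1234, "misses": 5678, "tokens_saved": 987654 }')
rate = hit_rate(sample)
print(f"hit rate: {rate:.1%}, tokens saved: {sample['tokens_saved']}")
```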
Prometheus:
obleth_cache_lookups_total{result="hit|miss"}
obleth_cache_tokens_saved_total
The mock backend exposes GET /stats with a true request counter. Run a load test twice (cache off, then on) and compare requests against the number of requests you issued; the delta is exactly the requests the cache absorbed.
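The comparison is a simple subtraction. The sketch below assumes /stats returns JSON with a `requests` field. That shape is a guess, and `absorbed_requests` and the numbers are made up purely for illustration.

```python
import json


def absorbed_requests(issued: int, stats_body: str) -> int:
    """Requests issued by the load test minus requests the backend actually saw."""
    backend_requests = json.loads(stats_body)["requests"]  # assumed field name
    return issued - backend_requests


# Illustrative numbers only: 1000 requests issued in each run.
cache_off = absorbed_requests(1000, '{"requests": 1000}')
cache_on = absorbed_requests(1000, '{"requests": 250}')
print(cache_off, cache_on)
```

With the cache off the delta should be zero. Any non-zero value in that run would point at client-side failures or at requests rejected before reaching the backend, rather than at cache hits.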