How obleth is structured: the data plane, management API, three datastores, and how they connect.
obleth is a single Rust binary that boots three listeners and connects four services at startup.
┌─────────────────────────────────────────────────────────────┐
│  Clients                                                    │
└──────────────┬──────────────────────────────────────────────┘
               │ HTTPS
┌──────────────▼──────────────────────────────────────────────┐
│  HAProxy :80 (TLS termination + round-robin across pods)    │
└──────────────┬──────────────────────────────────────────────┘
               │ HTTP
┌──────────────▼──────────────────────────────────────────────┐
│  obleth (Rust, single binary)                               │
│                                                             │
│  :8080 Data plane            :9090 Management API           │
│                              :9091 Prometheus metrics       │
└──────┬──────────┬──────────────────────────────────────────┘
       │          │
       │    ┌─────▼──────────────────────────────────┐
       │    │  Control plane (Next.js dashboard)     │
       │    │  reads/writes Management API only      │
       │    └────────────────────────────────────────┘
       │
┌──────▼──────────────────────────────────────────────────────┐
│  Aibrix gateway router → vLLM replicas                      │
└─────────────────────────────────────────────────────────────┘
The data plane on :8080 handles all inference traffic. Every request runs the full pipeline:
auth → cache check → fairshare admit → budget reserve → upstream proxy → reconcile → telemetry
This is the OpenAI-compatible surface. Client keys (sk_...) authenticate here.
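The stage order above can be sketched as an ordered list that a request walks through. This is a minimal illustration, not obleth's actual code: `Stage`, `PIPELINE`, and `run_pipeline` are hypothetical names, and the rule that a rejected stage ends the request is an assumption the text does not state.

```rust
/// Pipeline stages, in the order the text lists them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Auth,
    CacheCheck,
    FairshareAdmit,
    BudgetReserve,
    UpstreamProxy,
    Reconcile,
    Telemetry,
}

pub const PIPELINE: [Stage; 7] = [
    Stage::Auth,
    Stage::CacheCheck,
    Stage::FairshareAdmit,
    Stage::BudgetReserve,
    Stage::UpstreamProxy,
    Stage::Reconcile,
    Stage::Telemetry,
];

/// Calls `step` for each stage in order.
/// Assumption: the first stage that returns `false` ends the request.
/// Returns the completed stages, or the stage that rejected.
pub fn run_pipeline<F>(mut step: F) -> Result<Vec<Stage>, Stage>
where
    F: FnMut(Stage) -> bool,
{
    let mut done = Vec::with_capacity(PIPELINE.len());
    for &stage in PIPELINE.iter() {
        if !step(stage) {
            return Err(stage);
        }
        done.push(stage);
    }
    Ok(done)
}
```

Calling `run_pipeline(|_| true)` returns all seven stages in order; a closure that rejects at `Stage::BudgetReserve` returns `Err(Stage::BudgetReserve)` without reaching the upstream proxy.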
The Management API on :9090 is the versioned config and analytics surface. It is authenticated separately with an admin bearer token. The Next.js dashboard and any CLI or Terraform tooling consume this API exclusively; nothing touches the datastores directly.
All config writes follow a single path: Postgres → Redis → pub/sub invalidate. This means every gateway pod's in-process cache is invalidated immediately when a key is created, deleted, or weight-changed.
The Prometheus listener on :9091 serves an always-on /metrics endpoint. It exposes low-cardinality labels only (admission class, status class). Per-tenant breakdowns live in ClickHouse, not Prometheus, to avoid label explosion.
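One way to keep a status label low-cardinality is to collapse each HTTP status code into a fixed set of classes before it reaches the metrics registry. The sketch below assumes label values such as `2xx` and `other`; the text does not specify the values obleth uses, and `status_class` is a hypothetical helper.

```rust
/// Collapses an HTTP status code into one of six fixed label values,
/// so the label's cardinality never grows with traffic.
/// The label strings are assumptions made for illustration.
pub fn status_class(status: u16) -> &'static str {
    match status {
        100..=199 => "1xx",
        200..=299 => "2xx",
        300..=399 => "3xx",
        400..=499 => "4xx",
        500..=599 => "5xx",
        _ => "other",
    }
}
```

With this shape, `429` and `404` share the `4xx` series, while a tenant ID never appears as a label at all.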
Postgres holds tenants, API keys, model routes, MCP server registrations, fairshare groups, quotas, and the full audit log. This is the durable, queryable, relational backbone: an OLTP workload that is never on the request hot path.
The data plane reads only Redis:
obleth:key:{sha256_hex} → ResolvedKey JSON (tenant, weight, quota, group)
obleth:model:{name} → ResolvedModel JSON
obleth:budget:{tenant_id} → token-bucket state (Lua-atomic)
obleth:cache:{sha256_hex} → cached response
obleth:invalidate channel triggers moka cache eviction on every pod
ClickHouse is append-only. Every completed request inserts one row asynchronously via a bounded channel and a 1-second flush loop, so telemetry never blocks a request. If ClickHouse is unavailable, rows spill to a local WAL file and replay once it recovers.
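The non-blocking telemetry path can be sketched with a bounded channel and a flush step. This is an illustration under stated assumptions, not obleth's implementation: it uses `std::sync::mpsc::sync_channel` rather than an async channel, an in-memory `Vec` stands in for the WAL file, a closure stands in for the ClickHouse insert, and dropping a row when the channel is full is an assumption. `RequestRow`, `Telemetry`, and `flush_once` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

#[derive(Debug, Clone, PartialEq)]
pub struct RequestRow {
    pub tenant_id: String,
    pub tokens: u64,
}

pub struct Telemetry {
    tx: SyncSender<RequestRow>,
}

impl Telemetry {
    /// Creates a bounded queue; `capacity` should be at least 1.
    pub fn new(capacity: usize) -> (Telemetry, Receiver<RequestRow>) {
        let (tx, rx) = sync_channel(capacity);
        (Telemetry { tx }, rx)
    }

    /// Never blocks the request path. Returns false when the queue is
    /// full or closed (assumption: the row is dropped in that case).
    pub fn record(&self, row: RequestRow) -> bool {
        self.tx.try_send(row).is_ok()
    }
}

/// One tick of the flush loop: replay spilled rows first, then drain the
/// queue, then hand the batch to `sink`. If the sink fails, the whole
/// batch goes back to `spill` (the stand-in for the WAL file).
/// Returns the number of rows the sink accepted.
pub fn flush_once<F>(
    rx: &Receiver<RequestRow>,
    spill: &mut Vec<RequestRow>,
    mut sink: F,
) -> usize
where
    F: FnMut(&[RequestRow]) -> bool,
{
    let mut batch: Vec<RequestRow> = std::mem::take(spill);
    batch.extend(rx.try_iter());
    if batch.is_empty() {
        return 0;
    }
    if sink(&batch) {
        batch.len()
    } else {
        *spill = batch;
        0
    }
}
```

A real loop would call `flush_once` on a one-second timer. Replaying spilled rows ahead of new ones keeps the batch in arrival order after an outage.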
Key resolution on a busy gateway walks up to three tiers, fastest first: two caches, then Postgres as a last resort:
Request with Bearer token
  → moka (in-process, TTL=5min, cap=100k keys)      ← fastest, no network
  → Redis (sub-ms, shared across pods)              ← fast, network
  → (Postgres only on cold miss or explicit sync)   ← durable, off hot path
See Request Lifecycle for a step-by-step walkthrough.
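The tiered lookup can be sketched as a read-through chain. In this illustration plain `HashMap`s stand in for moka, Redis, and Postgres; TTL and capacity limits are omitted; and back-filling the faster tiers on a miss is an assumption the text does not spell out. `KeyResolver` and `Tier` are hypothetical names.

```rust
use std::collections::HashMap;

/// Which tier answered the lookup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Moka,
    Redis,
    Postgres,
}

#[derive(Default)]
pub struct KeyResolver {
    pub local: HashMap<String, String>,   // stand-in for the in-process moka cache
    pub shared: HashMap<String, String>,  // stand-in for Redis
    pub durable: HashMap<String, String>, // stand-in for Postgres
}

impl KeyResolver {
    /// Resolves a key hash to its ResolvedKey JSON, checking the fastest
    /// tier first and back-filling faster tiers on the way out.
    pub fn resolve(&mut self, key_hash: &str) -> Option<(String, Tier)> {
        if let Some(v) = self.local.get(key_hash) {
            return Some((v.clone(), Tier::Moka));
        }
        if let Some(v) = self.shared.get(key_hash).cloned() {
            self.local.insert(key_hash.to_string(), v.clone());
            return Some((v, Tier::Redis));
        }
        // Cold miss: fall back to the durable store.
        let v = self.durable.get(key_hash).cloned()?;
        self.shared.insert(key_hash.to_string(), v.clone());
        self.local.insert(key_hash.to_string(), v.clone());
        Some((v, Tier::Postgres))
    }
}
```

After one cold miss, the same key is served from the in-process tier on the next request, which is why the durable store stays off the hot path in steady state.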
A single Tokio async task owns all admission state — no lock races, deterministic order. See Fairshare Engine for how the scheduler works.
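The single-owner idea can be sketched as an actor: one task holds the state and every caller talks to it over a channel, so no lock is needed and commands apply in arrival order. To stay dependency-free, this sketch uses a `std::thread` and `std::sync::mpsc` as a stand-in for a Tokio task and async channel. The per-tenant in-flight limit is a placeholder policy, not the fairshare algorithm, and `Command`, `spawn_admission`, and `admit` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

pub enum Command {
    Admit { tenant: String, reply: Sender<bool> },
    Release { tenant: String },
}

/// Spawns the single owner of admission state and returns its mailbox.
/// The state lives only inside the spawned closure, so nothing else can
/// touch it and commands are handled strictly one at a time.
pub fn spawn_admission(max_in_flight: u32) -> Sender<Command> {
    let (tx, rx) = channel::<Command>();
    thread::spawn(move || {
        let mut in_flight: HashMap<String, u32> = HashMap::new();
        // The loop ends when every Sender has been dropped.
        for cmd in rx {
            match cmd {
                Command::Admit { tenant, reply } => {
                    let n = in_flight.entry(tenant).or_insert(0);
                    let admitted = *n < max_in_flight;
                    if admitted {
                        *n += 1;
                    }
                    // The caller may have gone away; ignore send errors.
                    let _ = reply.send(admitted);
                }
                Command::Release { tenant } => {
                    if let Some(n) = in_flight.get_mut(&tenant) {
                        *n = n.saturating_sub(1);
                    }
                }
            }
        }
    });
    tx
}

/// Asks the owner for admission and waits for its answer.
pub fn admit(tx: &Sender<Command>, tenant: &str) -> bool {
    let (reply, rx) = channel();
    tx.send(Command::Admit { tenant: tenant.to_string(), reply })
        .is_ok()
        && rx.recv().unwrap_or(false)
}
```

Because all mutations pass through one mailbox, two concurrent `admit` calls for the same tenant can never both observe the same counter value.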
All configuration mutations follow one path:
Management API handler
  → write to Postgres (durable, audited)
  → sync to Redis (cache)
  → publish to obleth:invalidate (invalidate moka on all pods)
This ensures there is exactly one validated code path mutating configuration.
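The three-step write path can be sketched as one function that is the only place configuration changes. Here in-memory collections stand in for Postgres, Redis, and the pub/sub channel; the validation rule, the audit entry format, and the invalidation payload are all assumptions made for illustration; and `Stores` and `apply_config_write` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct Stores {
    pub postgres: HashMap<String, String>, // durable rows, keyed by key hash
    pub audit_log: Vec<String>,            // stand-in for the audit table
    pub redis: HashMap<String, String>,    // stand-in for Redis
    pub invalidations: Vec<String>,        // messages sent on obleth:invalidate
}

/// The single mutation path: validate, write durably, sync the cache,
/// then tell every pod to evict its in-process copy.
pub fn apply_config_write(
    stores: &mut Stores,
    key_hash: &str,
    resolved_json: &str,
) -> Result<(), String> {
    // Validation happens once, here (the rule itself is an assumption).
    if key_hash.is_empty() {
        return Err("empty key hash".to_string());
    }

    // 1. Write to Postgres (durable, audited).
    stores
        .postgres
        .insert(key_hash.to_string(), resolved_json.to_string());
    stores.audit_log.push(format!("upsert {}", key_hash));

    // 2. Sync to Redis under the obleth:key:{sha256_hex} name.
    let redis_key = format!("obleth:key:{}", key_hash);
    stores
        .redis
        .insert(redis_key.clone(), resolved_json.to_string());

    // 3. Publish on obleth:invalidate so pods evict their moka entry.
    stores.invalidations.push(redis_key);
    Ok(())
}
```

Ordering matters in this sketch: the durable write comes first, so a crash between steps leaves Redis stale rather than ahead of the source of truth.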