Architecture

How obleth is structured: the data plane, management API, three datastores, and how they connect.

obleth is a single Rust binary that boots three listeners and connects four services at startup.

System diagram

┌─────────────────────────────────────────────────────────────┐
│  Clients                                                    │
└──────────────┬──────────────────────────────────────────────┘
               │ HTTPS
┌──────────────▼──────────────────────────────────────────────┐
│  HAProxy  :80  (TLS termination + round-robin across pods)  │
└──────────────┬──────────────────────────────────────────────┘
               │ HTTP
┌──────────────▼──────────────────────────────────────────────┐
│  obleth  (Rust, single binary)                              │
│                                                             │
│  :8080  Data plane        :9090  Management API             │
│  :9091  Prometheus metrics                                  │
└──────┬──────────┬──────────────────────────────────────────┘
       │           │
       │     ┌─────▼──────────────────────────────────┐
       │     │  Control plane  (Next.js dashboard)    │
       │     │  reads/writes Management API only       │
       │     └────────────────────────────────────────┘
┌──────▼──────────────────────────────────────────────────────┐
│  Aibrix gateway router  →  vLLM replicas                    │
└─────────────────────────────────────────────────────────────┘

Three listeners

Data plane :8080

Handles all inference traffic. Every request runs the full pipeline:

auth → cache check → fairshare admit → budget reserve → upstream proxy → reconcile → telemetry

This is the OpenAI-compatible surface. Client keys (sk_...) authenticate here.
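
The stage order above can be sketched as a small Rust driver. This is an illustrative sketch, not obleth's actual code: the `Stage` enum, `PIPELINE`, and `run_pipeline` are hypothetical names, and stopping at the first failing stage is an assumption (the text does not say how failures are handled).

```rust
/// Stages of the data-plane pipeline, in the order the text lists them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Auth,
    CacheCheck,
    FairshareAdmit,
    BudgetReserve,
    UpstreamProxy,
    Reconcile,
    Telemetry,
}

/// The fixed order in which stages run.
pub const PIPELINE: [Stage; 7] = [
    Stage::Auth,
    Stage::CacheCheck,
    Stage::FairshareAdmit,
    Stage::BudgetReserve,
    Stage::UpstreamProxy,
    Stage::Reconcile,
    Stage::Telemetry,
];

/// Runs `step` for each stage in order.
/// Assumption: the first failing stage stops the pipeline.
/// Returns the completed stages plus the failing stage and its error, if any.
pub fn run_pipeline<F>(mut step: F) -> (Vec<Stage>, Option<(Stage, String)>)
where
    F: FnMut(Stage) -> Result<(), String>,
{
    let mut done = Vec::new();
    for stage in PIPELINE.iter().copied() {
        match step(stage) {
            Ok(()) => done.push(stage),
            Err(e) => return (done, Some((stage, e))),
        }
    }
    (done, None)
}
```

A caller would pass a closure that dispatches on `Stage`; a request rejected at `BudgetReserve`, for example, would never reach `UpstreamProxy` under this assumption.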

Management API :9090

The versioned config and analytics surface. Authenticated separately with an admin bearer token. The Next.js dashboard and any CLI or Terraform consume this API exclusively — nothing touches the datastores directly.

All config writes follow a single path: Postgres → Redis → pub/sub invalidate. As a result, every gateway pod evicts the affected entries from its in-process cache as soon as it receives the invalidation message, whether a key was created or deleted or its weight was changed.

Prometheus metrics :9091

Always-on /metrics endpoint. Low-cardinality labels only (admission class, status class). Per-tenant breakdowns live in ClickHouse, not Prometheus, to avoid label explosion.
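
To illustrate what "low-cardinality" means in practice, the sketch below collapses raw HTTP status codes into a handful of status classes before they become label values. The metric name `obleth_requests_total`, the label names, and the `interactive` admission class in the usage note are assumptions for illustration, not names taken from obleth.

```rust
/// Collapses an HTTP status code into one of six label values,
/// so the label's cardinality stays bounded regardless of traffic.
pub fn status_class(status: u16) -> &'static str {
    match status {
        100..=199 => "1xx",
        200..=299 => "2xx",
        300..=399 => "3xx",
        400..=499 => "4xx",
        500..=599 => "5xx",
        _ => "other",
    }
}

/// Renders one sample in the Prometheus text exposition format.
/// The metric and label names here are hypothetical.
pub fn metric_line(admission_class: &str, status: u16, count: u64) -> String {
    format!(
        "obleth_requests_total{{admission_class=\"{}\",status_class=\"{}\"}} {}",
        admission_class,
        status_class(status),
        count
    )
}
```

For example, `metric_line("interactive", 503, 3)` yields a single `5xx` series no matter which tenant caused the errors; the per-tenant detail would go to ClickHouse instead.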

Three datastores

Postgres — configuration source of truth

Holds tenants, API keys, model routes, MCP server registrations, fairshare groups, quotas, and the full audit log. This is the durable, queryable, relational backbone. OLTP workload. Never on the request hot path.

Redis — hot cache + atomic budgets

The data plane reads only Redis:

  • Key resolution: obleth:key:{sha256_hex} → ResolvedKey JSON (tenant, weight, quota, group)
  • Model resolution: obleth:model:{name} → ResolvedModel JSON
  • Token budgets: obleth:budget:{tenant_id} → token-bucket state (Lua-atomic)
  • Response cache: obleth:cache:{sha256_hex} → cached response
  • Pub/sub: obleth:invalidate channel triggers moka cache eviction on every pod
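
The token-bucket entry in the list above can be illustrated with a plain Rust sketch of the refill-then-reserve logic. In obleth this runs atomically inside Redis via Lua; the field names, the parameters (`capacity`, `refill_per_sec`), and the exact arithmetic below are assumptions, since the page does not describe the stored state.

```rust
/// Token-bucket state, as it might be stored under obleth:budget:{tenant_id}.
/// The field layout is an assumption for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Bucket {
    pub tokens: f64,
    pub last_refill_ms: u64,
}

/// Builds the Redis key for a tenant's budget, following the key pattern above.
pub fn budget_key(tenant_id: &str) -> String {
    format!("obleth:budget:{}", tenant_id)
}

/// Refills the bucket for the elapsed time, then tries to reserve `cost` tokens.
/// Returns true if the reservation succeeded. In Redis this whole function
/// would need to run as one atomic step, which is what a Lua script provides.
pub fn try_reserve(
    bucket: &mut Bucket,
    capacity: f64,
    refill_per_sec: f64,
    now_ms: u64,
    cost: f64,
) -> bool {
    let elapsed_ms = now_ms.saturating_sub(bucket.last_refill_ms);
    let refilled = bucket.tokens + refill_per_sec * (elapsed_ms as f64) / 1000.0;
    bucket.tokens = refilled.min(capacity);
    // Never move the refill timestamp backwards if clocks disagree.
    bucket.last_refill_ms = bucket.last_refill_ms.max(now_ms);
    if bucket.tokens >= cost {
        bucket.tokens -= cost;
        true
    } else {
        false
    }
}
```

Keeping the refill and the decrement in one step matters: if two pods refilled and reserved in separate round trips, both could spend the same tokens.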

ClickHouse — usage and cost ledger

Append-only. Every completed request inserts one row asynchronously via a bounded channel and a 1-second flush loop. Never blocks a request. If ClickHouse is unavailable, rows spill to a local WAL file and replay once it recovers.
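
A minimal sketch of this pattern, using only the standard library rather than Tokio: a bounded channel with a non-blocking send on the request side, and a flush step that batches rows and spills them when the insert fails. `UsageRow`, `Ledger`, and `flush_once` are hypothetical names; the in-memory `Vec` stands in for the WAL file; and dropping a row when the channel is full is an assumption, since the page does not say what happens in that case.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// One ledger row; the fields are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct UsageRow {
    pub tenant_id: String,
    pub total_tokens: u64,
}

/// Request-side handle to the bounded channel.
pub struct Ledger {
    tx: SyncSender<UsageRow>,
}

impl Ledger {
    /// Creates the handle and the receiving end for the flush loop.
    /// `capacity` should be greater than zero.
    pub fn new(capacity: usize) -> (Ledger, Receiver<UsageRow>) {
        let (tx, rx) = sync_channel(capacity);
        (Ledger { tx }, rx)
    }

    /// Enqueues a row without ever blocking the caller.
    /// Returns false if the channel is full or closed (assumed policy: drop).
    pub fn record(&self, row: UsageRow) -> bool {
        self.tx.try_send(row).is_ok()
    }
}

/// One tick of the flush loop: replays spilled rows first, then drains the
/// channel, and calls `insert` with the batch. On failure the whole batch is
/// kept in `wal` for the next tick. Returns the number of rows inserted.
pub fn flush_once<F>(rx: &Receiver<UsageRow>, wal: &mut Vec<UsageRow>, mut insert: F) -> usize
where
    F: FnMut(&[UsageRow]) -> Result<(), String>,
{
    let mut batch: Vec<UsageRow> = wal.drain(..).collect();
    batch.extend(rx.try_iter());
    if batch.is_empty() {
        return 0;
    }
    match insert(&batch) {
        Ok(()) => batch.len(),
        Err(_) => {
            wal.extend(batch);
            0
        }
    }
}
```

In a real deployment `flush_once` would be called from a loop that sleeps for one second between ticks, and `insert` would issue the ClickHouse batch insert.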

Caching layers (hot path)

Key resolution on a busy gateway passes through two caches before falling back to Postgres:

Request with Bearer token
  → moka (in-process, TTL=5min, cap=100k keys)  ← fastest, no network
  → Redis  (sub-ms, shared across pods)          ← fast, network
  → (Postgres only on cold miss or explicit sync) ← durable, off hot path
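
The lookup order above can be sketched as a tiered read with backfill. This is a simplified model, not obleth's code: `LocalCache` is a hand-rolled stand-in for moka, plain `HashMap`s stand in for Redis and Postgres, and folding the Postgres fallback into the same function is an assumption made to keep the example short.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Which tier served a lookup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Local,
    Shared,
    Durable,
}

/// In-process cache with per-entry expiry (stand-in for moka).
pub struct LocalCache {
    ttl: Duration,
    entries: HashMap<String, (String, Instant)>,
}

impl LocalCache {
    pub fn new(ttl: Duration) -> Self {
        LocalCache { ttl, entries: HashMap::new() }
    }

    /// Returns the value only if it has not expired at `now`.
    fn get(&self, key: &str, now: Instant) -> Option<String> {
        match self.entries.get(key) {
            Some((value, expires)) if *expires > now => Some(value.clone()),
            _ => None,
        }
    }

    fn put(&mut self, key: &str, value: &str, now: Instant) {
        self.entries
            .insert(key.to_string(), (value.to_string(), now + self.ttl));
    }

    /// What a pod would do on receiving an invalidation message.
    pub fn invalidate(&mut self, key: &str) {
        self.entries.remove(key);
    }
}

/// Looks up `key` tier by tier, backfilling the faster tiers on a miss.
pub fn resolve(
    local: &mut LocalCache,
    shared: &mut HashMap<String, String>,
    durable: &HashMap<String, String>,
    key: &str,
    now: Instant,
) -> Option<(String, Tier)> {
    if let Some(value) = local.get(key, now) {
        return Some((value, Tier::Local));
    }
    if let Some(value) = shared.get(key).cloned() {
        local.put(key, &value, now);
        return Some((value, Tier::Shared));
    }
    let value = durable.get(key)?.clone();
    shared.insert(key.to_string(), value.clone());
    local.put(key, &value, now);
    Some((value, Tier::Durable))
}
```

With a 5-minute TTL, a key that was resolved once is served from the local tier until it expires or is invalidated, after which the next lookup falls through to the shared tier.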

The data plane pipeline

See Request Lifecycle for a step-by-step walkthrough.

The fairshare scheduler

A single Tokio async task owns all admission state — no lock races, deterministic order. See Fairshare Engine for how the scheduler works.
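
The single-owner idea can be sketched with the standard library: one thread owns the admission state and everything else talks to it over a channel, so no lock is needed and messages are handled in arrival order. This is not the fairshare algorithm. The std thread stands in for the Tokio task, and the per-tenant in-flight cap, `Msg`, `spawn_scheduler`, and `admit` are all hypothetical, chosen only to give the owner some state to manage.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Messages understood by the scheduler (hypothetical protocol).
pub enum Msg {
    Admit { tenant: String, reply: Sender<bool> },
    Release { tenant: String },
}

/// Spawns the owner thread and returns a handle for sending it messages.
/// The thread exits once every sender has been dropped.
pub fn spawn_scheduler(max_in_flight: usize) -> Sender<Msg> {
    let (tx, rx) = channel::<Msg>();
    thread::spawn(move || {
        // All admission state lives here and is touched by this thread only.
        let mut in_flight: HashMap<String, usize> = HashMap::new();
        for msg in rx {
            match msg {
                Msg::Admit { tenant, reply } => {
                    let count = in_flight.entry(tenant).or_insert(0);
                    let admitted = *count < max_in_flight;
                    if admitted {
                        *count += 1;
                    }
                    // The requester may have gone away; ignore send errors.
                    let _ = reply.send(admitted);
                }
                Msg::Release { tenant } => {
                    if let Some(count) = in_flight.get_mut(&tenant) {
                        *count = count.saturating_sub(1);
                    }
                }
            }
        }
    });
    tx
}

/// Asks the scheduler for admission and waits for its answer.
/// Returns false if the scheduler is no longer running.
pub fn admit(scheduler: &Sender<Msg>, tenant: &str) -> bool {
    let (reply, answer) = channel();
    let msg = Msg::Admit { tenant: tenant.to_string(), reply };
    if scheduler.send(msg).is_err() {
        return false;
    }
    answer.recv().unwrap_or(false)
}
```

Because every decision is made by one owner reading one queue, two requests for the same tenant can never both observe the same free slot.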

Write path discipline

All configuration mutations follow one path:

Management API handler
  → write to Postgres (durable, audited)
  → sync to Redis (cache)
  → publish to obleth:invalidate (invalidate moka on all pods)

This ensures there is exactly one validated code path mutating configuration.
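
As a sketch of this discipline, the three steps can sit behind one function so that no handler can reorder or skip them. The `Backend` trait, `apply_config_change`, and the `Recorder` test double are hypothetical names; aborting on the first error (so a failed Redis sync publishes nothing) is an assumption about error handling that the page does not spell out.

```rust
/// The three side effects of a configuration write (hypothetical interface).
pub trait Backend {
    fn write_postgres(&mut self, key: &str, value: &str) -> Result<(), String>;
    fn sync_redis(&mut self, key: &str, value: &str) -> Result<(), String>;
    fn publish_invalidate(&mut self, key: &str) -> Result<(), String>;
}

/// The single write path: durable write, then cache sync, then invalidation.
/// Assumption: the first error aborts the remaining steps.
pub fn apply_config_change<B: Backend>(
    backend: &mut B,
    key: &str,
    value: &str,
) -> Result<(), String> {
    backend.write_postgres(key, value)?;
    backend.sync_redis(key, value)?;
    backend.publish_invalidate(key)
}

/// Test double that records which steps ran, in order.
#[derive(Default)]
pub struct Recorder {
    pub calls: Vec<&'static str>,
    pub fail_redis: bool,
}

impl Backend for Recorder {
    fn write_postgres(&mut self, _key: &str, _value: &str) -> Result<(), String> {
        self.calls.push("postgres");
        Ok(())
    }

    fn sync_redis(&mut self, _key: &str, _value: &str) -> Result<(), String> {
        if self.fail_redis {
            return Err("redis unavailable".to_string());
        }
        self.calls.push("redis");
        Ok(())
    }

    fn publish_invalidate(&mut self, _key: &str) -> Result<(), String> {
        self.calls.push("publish");
        Ok(())
    }
}
```

Routing every Management API handler through one function like this is what makes the ordering guarantee easy to test: the recorder shows the exact sequence of side effects for any mutation.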