Architecture

The Rust core

obleth splits cleanly into a control plane (configuration, keys, dashboard) and a data plane (the hot path). A reverse proxy (HAProxy, your Ingress controller, or any load balancer) fans out to horizontally scaled gateway pods; each pod enforces tenant policy before traffic reaches vLLM or Aibrix.

  • Rust · Data plane: OpenAI-compatible proxy with streaming completions
  • 3 · Listeners: Proxy · Management API · Prometheus metrics
  • WFQ · Fairshare: Group or tenant-weighted admission under saturation
  • Fail-open · Resilience: Dependencies down? Requests still flow
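
As a rough illustration of the fail-open behaviour, the sketch below maps the result of a dependency lookup to an admission decision. The `Decision` and `DependencyDown` types and the `fail_open` function are hypothetical names chosen for this example, and which dependencies obleth treats this way is not stated here.

```rust
/// Outcome of a policy check for one request (illustrative type).
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Allow,
    Deny,
    /// The dependency could not be reached, so the request was let through.
    AllowFailOpen,
}

/// Hypothetical error for an unreachable dependency (for example a cache or store).
#[derive(Debug)]
struct DependencyDown;

/// Turn a dependency lookup into a decision.
/// A definite answer is honoured; an outage never blocks the request.
fn fail_open(check: Result<bool, DependencyDown>) -> Decision {
    match check {
        Ok(true) => Decision::Allow,
        Ok(false) => Decision::Deny,
        Err(DependencyDown) => Decision::AllowFailOpen,
    }
}
```

Keeping `AllowFailOpen` distinct from `Allow` is a design choice in this sketch: it lets a caller count or log degraded admissions separately.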

System map

Control plane & data plane

Control plane
  • Management API — tenants, keys, routes
  • Control-plane dashboard
  • Postgres as configuration source of truth
Data plane
  • OpenAI-compatible proxy (in-cluster)
  • Fairshare admission + token budgets
  • Prometheus /metrics on each pod
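
To make weighted fairshare admission concrete, here is a minimal sketch of weighted fair queueing using virtual finish tags, in its self-clocked approximation (virtual time advances to the tag of the request just admitted). It is not obleth's scheduler: `FairQueue`, the `SCALE` constant, the per-request `cost`, and the default weight of 1 are all assumptions made for illustration.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Fixed-point scale so that cost / weight keeps precision with integers.
const SCALE: u64 = 1_000;

/// Illustrative weighted fair queue keyed by tenant name.
struct FairQueue {
    weights: HashMap<String, u64>,
    last_finish: HashMap<String, u64>,
    virtual_time: u64,
    seq: u64, // tie-breaker: preserves arrival order for equal tags
    heap: BinaryHeap<Reverse<(u64, u64, String)>>,
}

impl FairQueue {
    fn new(weights: &[(&str, u64)]) -> Self {
        Self {
            weights: weights.iter().map(|(t, w)| (t.to_string(), *w)).collect(),
            last_finish: HashMap::new(),
            virtual_time: 0,
            seq: 0,
            heap: BinaryHeap::new(),
        }
    }

    /// Queue a request. A higher weight yields a smaller finish tag per unit of cost.
    fn enqueue(&mut self, tenant: &str, cost: u64) {
        // Unknown tenants get weight 1 (an assumption of this sketch).
        let weight = self.weights.get(tenant).copied().unwrap_or(1).max(1);
        let previous = self.last_finish.get(tenant).copied().unwrap_or(0);
        let start = self.virtual_time.max(previous);
        let finish = start + cost.saturating_mul(SCALE) / weight;
        self.last_finish.insert(tenant.to_string(), finish);
        self.heap.push(Reverse((finish, self.seq, tenant.to_string())));
        self.seq += 1;
    }

    /// Admit the request with the smallest finish tag and advance virtual time.
    fn dequeue(&mut self) -> Option<String> {
        let Reverse((finish, _, tenant)) = self.heap.pop()?;
        self.virtual_time = finish;
        Some(tenant)
    }
}
```

With weights 3 and 1 and equal-cost requests queued from both tenants, the heavier tenant is admitted three times for each admission of the lighter one while both have work waiting.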

Three datastores

  • Postgres: Config & audit
  • Redis: Hot path (keys, cache, budgets)
  • ClickHouse: Usage ledger (optional OLAP)

Streaming proxy

Streaming data path

Completions stream over SSE. Token budgets are reserved at admission and reconciled when the stream completes. Cacheable responses are buffered only up to the configured safety cap.
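
The reserve-then-reconcile flow and the buffering cap can be sketched as follows. This is a minimal in-memory model under stated assumptions, not obleth's implementation: `TokenBudget`, `Reservation`, and `CappedBuffer` are hypothetical names, the estimate passed to `reserve` is assumed to come from the request, and a response that outgrows the cap is assumed to be streamed normally but not cached.

```rust
/// Per-tenant token budget with in-flight reservations (illustrative).
#[derive(Debug)]
struct TokenBudget {
    limit: u64,
    used: u64,
    reserved: u64,
}

/// Handle returned at admission and consumed when the stream completes.
#[derive(Debug)]
struct Reservation {
    tokens: u64,
}

impl TokenBudget {
    fn new(limit: u64) -> Self {
        Self { limit, used: 0, reserved: 0 }
    }

    /// Tokens neither consumed nor held by an in-flight request.
    fn available(&self) -> u64 {
        self.limit.saturating_sub(self.used.saturating_add(self.reserved))
    }

    /// Admission: hold `estimate` tokens, or refuse if they do not fit.
    fn reserve(&mut self, estimate: u64) -> Option<Reservation> {
        if estimate > self.available() {
            return None;
        }
        self.reserved += estimate;
        Some(Reservation { tokens: estimate })
    }

    /// Stream end: release the hold and charge what was actually consumed.
    fn reconcile(&mut self, reservation: Reservation, actual: u64) {
        self.reserved = self.reserved.saturating_sub(reservation.tokens);
        self.used = self.used.saturating_add(actual);
    }
}

/// Collects streamed chunks for caching and gives up once the cap is exceeded.
struct CappedBuffer {
    cap: usize,
    data: Option<Vec<u8>>,
}

impl CappedBuffer {
    fn new(cap: usize) -> Self {
        Self { cap, data: Some(Vec::new()) }
    }

    /// Record a chunk if it still fits; otherwise stop buffering for good.
    fn push(&mut self, chunk: &[u8]) {
        let fits = match &self.data {
            Some(buf) => buf.len() + chunk.len() <= self.cap,
            None => return, // already over the cap
        };
        if fits {
            if let Some(buf) = self.data.as_mut() {
                buf.extend_from_slice(chunk);
            }
        } else {
            self.data = None; // too large to cache; streaming is unaffected
        }
    }

    /// The full body if it stayed within the cap, otherwise `None`.
    fn into_cacheable(self) -> Option<Vec<u8>> {
        self.data
    }
}
```

Holding the estimate up front keeps concurrent streams from jointly overshooting the limit; settling on the actual count afterwards returns any unused headroom to the tenant.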

Clients (OpenAI-compatible API) → Load balancer (TLS · round-robin) → obleth gateway (data plane) → Backends (vLLM · Aibrix)

Inside the data plane

01  resolve_key: API key → tenant (moka · Redis)
02  cache_lookup: response cache check
03  reserve_budget: token budget · fairshare admit
04  upstream_request: stream SSE to backend
05  reconcile: usage · telemetry flush
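
The five stages above can be sketched as one synchronous function over in-memory stand-ins. Everything here is an assumption made for illustration: `Gateway`, `Outcome`, and `upstream` are hypothetical, plain `HashMap`s stand in for moka and Redis, the cache is keyed by prompt text alone, and the upstream call is a stub that echoes the prompt and counts its words as tokens.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Unauthorized,
    CacheHit(String),
    OverBudget,
    Streamed(String),
}

#[derive(Default)]
struct Gateway {
    local_keys: HashMap<String, String>,     // stand-in for the in-process cache
    shared_keys: HashMap<String, String>,    // stand-in for the shared store
    response_cache: HashMap<String, String>, // prompt -> cached body
    budget_left: HashMap<String, u64>,       // tenant -> remaining tokens
    usage: Vec<(String, u64)>,               // stand-in for the usage ledger
}

/// Stubbed backend: echoes the prompt and reports its word count as tokens.
fn upstream(prompt: &str) -> (String, u64) {
    (format!("echo: {}", prompt), prompt.split_whitespace().count() as u64)
}

impl Gateway {
    /// 01 resolve_key: local tier first, then shared tier, filling the local tier on a hit.
    fn resolve_key(&mut self, api_key: &str) -> Option<String> {
        if let Some(tenant) = self.local_keys.get(api_key) {
            return Some(tenant.clone());
        }
        let tenant = self.shared_keys.get(api_key)?.clone();
        self.local_keys.insert(api_key.to_string(), tenant.clone());
        Some(tenant)
    }

    fn handle(&mut self, api_key: &str, prompt: &str, est_tokens: u64) -> Outcome {
        let tenant = match self.resolve_key(api_key) {
            Some(t) => t,
            None => return Outcome::Unauthorized,
        };
        // 02 cache_lookup: a hit skips budget and upstream entirely.
        if let Some(hit) = self.response_cache.get(prompt) {
            return Outcome::CacheHit(hit.clone());
        }
        // 03 reserve_budget: hold the estimate or reject.
        let left = self.budget_left.entry(tenant.clone()).or_insert(0);
        if *left < est_tokens {
            return Outcome::OverBudget;
        }
        *left -= est_tokens;
        // 04 upstream_request (stubbed; a real gateway would stream chunks).
        let (body, actual) = upstream(prompt);
        // 05 reconcile: swap the estimate for the actual count and record usage.
        let left = self.budget_left.entry(tenant.clone()).or_insert(0);
        *left = (*left + est_tokens).saturating_sub(actual);
        self.usage.push((tenant, actual));
        Outcome::Streamed(body)
    }
}
```

Checking the local tier before the shared one keeps the common case off the network, which is presumably the point of pairing an in-process cache with Redis on the hot path.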

Next · Performance

Measure what matters

Histograms, queue gauges, cache counters, and optional OTLP traces — all documented with scrape paths and span names.
