Architecture
The Rust core
obleth splits cleanly into a control plane (configuration, keys, dashboard) and a data plane (the hot path). A reverse proxy (HAProxy, your Ingress controller, or any load balancer) fans out to horizontally scaled gateway pods; each pod enforces tenant policy before traffic reaches vLLM or Aibrix.
- Data plane (Rust): OpenAI-compatible proxy with streaming completions
- Listeners (3): proxy, Management API, Prometheus metrics
- Fairshare (WFQ): group- or tenant-weighted admission under saturation
- Resilience (fail-open): requests still flow when dependencies are down
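The fairshare row names WFQ. As a rough illustration of weighted fair queueing under saturation, here is a minimal Rust sketch of a self-clocked variant (SCFQ): each request gets a virtual finish tag of max(virtual time, key's last finish) + cost / weight, and the smallest tag is admitted first. The `FairQueue` and `Pending` types, the token-count cost, the fixed-point `SCALE`, and the default weight of 1 for unknown keys are assumptions for illustration, not obleth's implementation; the weight key could equally be a group or a tenant.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Fixed-point scale so integer division by the weight keeps precision.
const SCALE: u64 = 1_000;

/// A request waiting for admission; `cost` is an estimated token count (assumption).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Pending {
    pub tenant: String,
    pub cost: u64,
}

/// Self-clocked fair queue (SCFQ), a simple approximation of WFQ.
pub struct FairQueue {
    weights: HashMap<String, u64>,     // tenant or group -> weight
    last_finish: HashMap<String, u64>, // last finish tag per key
    virtual_time: u64,                 // finish tag of the last admitted request
    seq: u64,                          // arrival counter, breaks ties FIFO
    heap: BinaryHeap<Reverse<(u64, u64, String, u64)>>, // (finish, seq, tenant, cost)
}

impl FairQueue {
    pub fn new(weights: HashMap<String, u64>) -> Self {
        FairQueue {
            weights,
            last_finish: HashMap::new(),
            virtual_time: 0,
            seq: 0,
            heap: BinaryHeap::new(),
        }
    }

    /// Tag the request with a virtual finish time and queue it.
    pub fn enqueue(&mut self, req: Pending) {
        // Unknown keys get weight 1; zero weights are clamped to 1 (assumptions).
        let weight = self.weights.get(&req.tenant).copied().unwrap_or(1).max(1);
        let prev = self.last_finish.get(&req.tenant).copied().unwrap_or(0);
        let start = prev.max(self.virtual_time);
        let finish = start + req.cost * SCALE / weight;
        self.last_finish.insert(req.tenant.clone(), finish);
        self.heap.push(Reverse((finish, self.seq, req.tenant, req.cost)));
        self.seq += 1;
    }

    /// Admit the queued request with the smallest finish tag, if any.
    pub fn admit(&mut self) -> Option<Pending> {
        let Reverse((finish, _seq, tenant, cost)) = self.heap.pop()?;
        self.virtual_time = finish; // self-clocking: virtual time follows service
        Some(Pending { tenant, cost })
    }
}
```

With weights 3:1 and equal-cost requests arriving alternately, the heavier key is admitted roughly three times as often while both stay backlogged; an idle key does not bank credit, because its start tag is pulled up to the current virtual time.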
System map
Control plane & data plane
Control plane:
- Management API: tenants, keys, routes
- Control-plane dashboard
- Postgres as the configuration source of truth
Data plane:
- OpenAI-compatible proxy (in-cluster)
- Fairshare admission and token budgets
- Prometheus /metrics on each pod
Three datastores
- Postgres: configuration and audit
- Redis: hot path (keys, cache, budgets)
- ClickHouse: usage ledger (optional OLAP)
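Redis sits on the hot path, which is where fail-open behavior matters most. One way to express that policy is to treat an unreachable budget store as "admit and flag" instead of "reject". This sketch assumes a store outage surfaces as an error (for example, after a client timeout) rather than a hang; `BudgetStore`, `Decision`, `InMemory`, and `Down` are hypothetical stand-ins, not obleth or Redis client APIs.

```rust
use std::collections::HashMap;

/// Error returned when the hot-path store cannot be reached.
#[derive(Debug)]
pub struct StoreUnavailable;

/// Hypothetical view of the budget data kept in Redis.
pub trait BudgetStore {
    fn remaining(&self, tenant: &str) -> Result<u64, StoreUnavailable>;
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Admit,
    Reject,
    /// Store was down: the request flows, and the caller should record a metric.
    AdmitFailOpen,
}

/// Enforce the budget when the store answers; fail open when it does not.
pub fn check_budget(store: &dyn BudgetStore, tenant: &str, cost: u64) -> Decision {
    match store.remaining(tenant) {
        Ok(left) if left >= cost => Decision::Admit,
        Ok(_) => Decision::Reject,
        Err(StoreUnavailable) => Decision::AdmitFailOpen,
    }
}

/// Healthy store backed by a map; missing tenants have no budget.
pub struct InMemory(pub HashMap<String, u64>);

impl BudgetStore for InMemory {
    fn remaining(&self, tenant: &str) -> Result<u64, StoreUnavailable> {
        Ok(self.0.get(tenant).copied().unwrap_or(0))
    }
}

/// Store that is always unreachable, to exercise the fail-open path.
pub struct Down;

impl BudgetStore for Down {
    fn remaining(&self, _tenant: &str) -> Result<u64, StoreUnavailable> {
        Err(StoreUnavailable)
    }
}
```

Keeping a distinct `AdmitFailOpen` variant, rather than folding it into `Admit`, lets the degraded path be counted and alerted on separately.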
Streaming proxy
Streaming data path
Completions stream over SSE. Token budgets are reserved at admission and reconciled when the stream completes. Cacheable responses are buffered only up to the configured safety cap.
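The reserve-then-reconcile flow and the capped cache buffer can be sketched in Rust as follows, assuming an in-process counter in place of the Redis-backed budget. The estimate held at admission is refunded or topped up once the stream's actual token count is known, and a separate buffer copies chunks for the cache only while the response stays under the cap. `Budget`, `Reservation`, `CacheBuffer`, and the choice to charge overage after the fact (rather than cut the stream) are illustrative assumptions.

```rust
/// In-memory stand-in for a per-tenant token budget.
pub struct Budget {
    pub available: u64,
}

/// Proof that tokens were reserved; consumed by `reconcile`, so it settles once.
pub struct Reservation {
    reserved: u64,
}

impl Budget {
    /// At admission: hold back an estimate, or refuse if it does not fit.
    pub fn reserve(&mut self, estimate: u64) -> Option<Reservation> {
        if estimate > self.available {
            return None;
        }
        self.available -= estimate;
        Some(Reservation { reserved: estimate })
    }

    /// When the stream completes: refund unused tokens or charge any overage.
    pub fn reconcile(&mut self, r: Reservation, actual: u64) {
        if actual <= r.reserved {
            self.available += r.reserved - actual;
        } else {
            // Overage is charged after the fact, floored at zero (assumption).
            self.available = self.available.saturating_sub(actual - r.reserved);
        }
    }
}

/// Copies streamed bytes for the response cache, up to a safety cap.
pub struct CacheBuffer {
    cap: usize,
    buf: Vec<u8>,
    overflowed: bool,
}

impl CacheBuffer {
    pub fn new(cap: usize) -> Self {
        CacheBuffer { cap, buf: Vec::new(), overflowed: false }
    }

    /// Record a chunk that was also forwarded to the client.
    pub fn push(&mut self, chunk: &[u8]) {
        if self.overflowed {
            return;
        }
        if self.buf.len() + chunk.len() > self.cap {
            // Too large to cache: drop the copy, keep streaming to the client.
            self.overflowed = true;
            self.buf = Vec::new();
            return;
        }
        self.buf.extend_from_slice(chunk);
    }

    /// The full body if it fit under the cap, otherwise nothing to cache.
    pub fn finish(self) -> Option<Vec<u8>> {
        if self.overflowed { None } else { Some(self.buf) }
    }
}
```

Taking `Reservation` by value means the compiler rejects a second `reconcile` for the same reservation, and dropping the cache copy on overflow bounds memory per stream without affecting what the client receives.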
Request flow: Clients (OpenAI-compatible API) → Load balancer (TLS, round-robin) → obleth gateway (data plane) → Backends (vLLM, Aibrix)
Inside the data plane
Request pipeline
Auth, cache, budget reservation, upstream streaming, and telemetry run as distinct phases, and each phase is observable separately.
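A minimal way to make each phase observable on its own is to wrap every phase in a timer and record a (phase, duration) pair. The sketch below assumes a synchronous pipeline and stubs each phase with a closure; `Phase`, `PhaseTimings`, and `handle` are hypothetical names, and a real gateway would more likely emit histogram observations or trace spans than push into a `Vec`.

```rust
use std::time::{Duration, Instant};

/// The pipeline phases named above, in request order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Auth,
    Cache,
    BudgetReserve,
    Upstream,
    Telemetry,
}

/// Per-request record of how long each phase took.
pub struct PhaseTimings {
    pub entries: Vec<(Phase, Duration)>,
}

impl PhaseTimings {
    pub fn new() -> Self {
        PhaseTimings { entries: Vec::new() }
    }

    /// Run one phase and record its wall-clock duration.
    pub fn time<T>(&mut self, phase: Phase, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.entries.push((phase, start.elapsed()));
        out
    }
}

/// Hypothetical pipeline; each closure stands in for the real phase.
pub fn handle(api_key: &str, t: &mut PhaseTimings) -> Result<&'static str, &'static str> {
    let authed = t.time(Phase::Auth, || !api_key.is_empty());
    if !authed {
        return Err("401 unauthorized"); // later phases never run, so never appear
    }
    let cached: Option<&'static str> = t.time(Phase::Cache, || None);
    if let Some(body) = cached {
        return Ok(body); // cache hit short-circuits budget and upstream
    }
    t.time(Phase::BudgetReserve, || ());
    let body = t.time(Phase::Upstream, || "streamed completion");
    t.time(Phase::Telemetry, || ());
    Ok(body)
}
```

Because early exits skip the remaining timers, a rejected request shows only the phases it actually reached, which keeps per-phase latency figures honest.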
Modular deploy
Run the bundled Postgres and Redis, or point obleth at external instances by URL. Enable the ClickHouse and observability profiles when you need them.
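The bundled-or-external choice amounts to a small resolution rule per datastore. This sketch assumes an explicit, non-blank URL always wins over the bundled service and that ClickHouse is planned only when its profile is enabled; the `Source`, `Deployment`, `resolve`, and `plan` names and the default in-cluster URLs are hypothetical, not obleth's configuration keys.

```rust
/// Where a datastore comes from: the bundled service or an operator-supplied URL.
#[derive(Debug, PartialEq, Eq)]
pub enum Source {
    Bundled(String),
    External(String),
}

/// An explicit, non-blank URL wins; otherwise fall back to the bundled default.
pub fn resolve(external_url: Option<&str>, bundled_default: &str) -> Source {
    match external_url.map(str::trim).filter(|u| !u.is_empty()) {
        Some(url) => Source::External(url.to_string()),
        None => Source::Bundled(bundled_default.to_string()),
    }
}

/// Resolved plan; ClickHouse is absent unless its profile is enabled.
#[derive(Debug)]
pub struct Deployment {
    pub postgres: Source,
    pub redis: Source,
    pub clickhouse: Option<Source>,
}

pub fn plan(
    pg: Option<&str>,
    redis: Option<&str>,
    clickhouse_enabled: bool,
    ch: Option<&str>,
) -> Deployment {
    Deployment {
        // Hypothetical in-cluster defaults for the bundled services.
        postgres: resolve(pg, "postgres://postgres:5432/obleth"),
        redis: resolve(redis, "redis://redis:6379"),
        clickhouse: clickhouse_enabled.then(|| resolve(ch, "http://clickhouse:8123")),
    }
}
```

Treating a blank URL as "not set" avoids accidentally pointing the gateway at an empty endpoint when a variable is defined but left empty.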
Response cache & MCP
Add an optional Redis-backed response cache and MCP gateway routes without forking your inference stack.
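A response cache needs a key that is identical for identical requests and never shared across tenants. The sketch below hashes a hypothetical subset of request fields and namespaces the result by tenant; the field selection, the `obleth:cache:` prefix, and `CacheableRequest` are assumptions, and `DefaultHasher` stands in for the stable digest a shared Redis cache would need.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Fields assumed to determine a cacheable completion (hypothetical selection).
#[derive(Hash)]
pub struct CacheableRequest<'a> {
    pub model: &'a str,
    pub messages: &'a [(&'a str, &'a str)], // (role, content)
    pub max_tokens: u32,
}

/// Build a Redis key namespaced by tenant so tenants never share cache entries.
pub fn cache_key(tenant: &str, req: &CacheableRequest<'_>) -> String {
    let mut h = DefaultHasher::new();
    req.hash(&mut h);
    // DefaultHasher output is not guaranteed stable across Rust releases;
    // a cache shared between builds would use a stable digest instead.
    format!("obleth:cache:{}:{:016x}", tenant, h.finish())
}
```

Putting the tenant in the key prefix, rather than only in the hashed payload, also makes per-tenant invalidation a simple prefix scan.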
Next · Performance
Measure what matters
Histograms, queue gauges, cache counters, and optional OTLP traces, all documented with their scrape paths and span names.