The fairshare layer
for multi-tenant AI.
obleth sits between your edge load balancer and inference backends, admitting tenants fairly under contention before requests reach vLLM, AIBrix, or another OpenAI-compatible upstream.
Section 01 // Admission
Fairness before routing
obleth puts a tenant-aware admission decision in front of model routing. The shared pool stays fair when demand spikes, and the backend only sees work that has already earned a slot.
Caller
Resolve the tenant
The incoming key becomes tenant context before the request can reach a model backend.
Admission
Choose the next slot
Under contention, obleth selects work by fairshare so one workload cannot run away with the pool.
Backend
Stream with context
The admitted request streams upstream while the same tenant context follows the result.
The practical effect: production traffic keeps breathing room, batch work still advances, and noisy tenants stop dominating simply because they arrived first.
Fairshare docs
Section 02 // Core architecture
Hot-path responsibilities
obleth keeps policy decisions in front of the inference layer: identify the caller, decide admission, route the request, and record the outcome.
Fairshare
A single scheduler bounds global in-flight work and grants queued requests by fairshare policy when the cluster is saturated.
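To make the idea concrete, here is a minimal sketch of a scheduler that caps global in-flight work and, once the cap is reached, grants freed slots to the tenant using the least of its weighted share. It is an illustration under stated assumptions, not obleth's actual implementation: the names `FairshareScheduler`, `TenantState`, `submit`, and `release` are hypothetical, and the selection rule (lowest in-flight count divided by weight) is one simple fairshare policy among several.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TenantState:
    weight: float                                  # relative share, assumed > 0
    in_flight: int = 0                             # requests currently admitted
    queue: deque = field(default_factory=deque)    # waiting request ids, FIFO


class FairshareScheduler:
    """Hypothetical scheduler: bounds global in-flight work, grants by weight."""

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.tenants: dict[str, TenantState] = {}

    def add_tenant(self, name: str, weight: float) -> None:
        self.tenants[name] = TenantState(weight=weight)

    def submit(self, tenant: str, request_id: str) -> list[tuple[str, str]]:
        """Queue a request, then grant whatever the global cap allows."""
        self.tenants[tenant].queue.append(request_id)
        return self._grant()

    def release(self, tenant: str) -> list[tuple[str, str]]:
        """Mark one of the tenant's requests finished and refill the slot."""
        self.tenants[tenant].in_flight -= 1
        self.in_flight -= 1
        return self._grant()

    def _grant(self) -> list[tuple[str, str]]:
        granted = []
        while self.in_flight < self.max_in_flight:
            waiting = [(n, t) for n, t in self.tenants.items() if t.queue]
            if not waiting:
                break
            # Choose the tenant with the lowest in-flight/weight ratio;
            # the tenant name breaks ties deterministically.
            name, state = min(
                waiting, key=lambda nt: (nt[1].in_flight / nt[1].weight, nt[0])
            )
            request_id = state.queue.popleft()
            state.in_flight += 1
            self.in_flight += 1
            granted.append((name, request_id))
        return granted
```

In this sketch a batch tenant that arrives first can fill an idle pool, but as soon as a slot frees up under saturation, the next grant goes to whichever queued tenant is furthest below its weighted share rather than to whoever has waited longest.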
Policy context
Every API key resolves to tenant weight, quota, fairshare group, and disabled state before upstream work begins.
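A rough sketch of what that resolution step could look like follows. The record fields mirror the four attributes named above; everything else is an assumption for illustration, including the `TenantPolicy` and `resolve_policy` names, the in-memory key table, the example keys, and the unit of `quota`, which is left abstract here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantPolicy:
    tenant: str
    weight: float            # relative fairshare weight
    quota: int               # unit is hypothetical; the source does not specify it
    fairshare_group: str
    disabled: bool = False


class AdmissionDenied(Exception):
    """Raised when a key cannot be turned into usable tenant context."""


def resolve_policy(api_key: str, table: dict[str, TenantPolicy]) -> TenantPolicy:
    """Look up tenant context for a key before any upstream work begins."""
    policy = table.get(api_key)
    if policy is None:
        raise AdmissionDenied("unknown API key")
    if policy.disabled:
        raise AdmissionDenied(f"tenant {policy.tenant!r} is disabled")
    return policy


# Hypothetical key table; a real deployment would load this from its own store.
KEYS = {
    "sk-prod": TenantPolicy("prod", weight=2.0, quota=1000, fairshare_group="online"),
    "sk-batch": TenantPolicy("batch", weight=1.0, quota=200, fairshare_group="offline"),
    "sk-old": TenantPolicy("legacy", weight=1.0, quota=0, fairshare_group="offline", disabled=True),
}
```

Rejecting unknown and disabled keys at this point means the scheduler and the backend only ever see requests that already carry complete tenant context.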
Routing
OpenAI-compatible requests stream through to model backends, with optional exact-match response caching and MCP routes.
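As an illustration of exact-match caching in general, the sketch below derives a cache key by hashing a canonical JSON form of the request body, so two requests hit the same entry only when every field is identical. How obleth actually builds its keys is not described here; `cache_key` and `ExactMatchCache` are hypothetical names, and whether entries should be scoped per tenant or per model is left open.

```python
import hashlib
import json
from typing import Optional


def cache_key(body: dict) -> str:
    """Hash a canonical JSON encoding so key order does not affect the result."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ExactMatchCache:
    """Hypothetical in-memory cache keyed by the full request body."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get(self, body: dict) -> Optional[str]:
        return self._store.get(cache_key(body))

    def put(self, body: dict, response: str) -> None:
        self._store[cache_key(body)] = response
```

Because the key covers the whole body, changing any sampling parameter such as `temperature` produces a miss, which is the conservative behavior one would expect from an exact-match cache.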
Section 03 // Operations
Built for shared clusters
Fair under load
Admission stays proportional to tenant weight when tenants compete for the same GPU pool.
Tenant-aware
Keys, weights, quotas, and usage records stay tied to the tenant on every request.
Operator-visible
Metrics, dashboard views, and ledger rows show what happened under contention.
Pricing
Run it yourself. Bring us in when it matters.
obleth is source-available with no license fee for your own workloads. Voxel provides optional paid help for production rollout, capacity planning, and platform integration.
Software
$0
Run the gateway for your own workloads. No metered product fee from obleth.
Services
Optional
Architecture review, hardening, load testing, and custom integration support from Voxel.