About
A gateway for shared GPU systems
obleth is built by Voxel LLC for teams operating multi-tenant AI infrastructure. It adds the layer raw inference backends usually leave out: tenant identity, fairshare admission, token-aware budget enforcement, and usage accounting.
Project shape
Rust data plane
ELv2 source-available license
3 listeners: proxy, admin, metrics
OpenAI-compatible API surface
Principles
Small, explicit, operational
Fairness is an admission problem
When a shared GPU pool saturates, obleth decides who gets the next slot before work reaches the model backend.
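As a rough illustration of admission-time fairness, here is a minimal sketch of one possible rule: when a slot frees up, give it to the waiting tenant holding the fewest in-flight slots relative to its weight. The `TenantState` type, its fields, and `next_tenant` are hypothetical names invented for this example; this is not obleth's actual scheduler or configuration model.

```rust
use std::collections::HashMap;

/// Hypothetical per-tenant admission state (illustrative, not obleth's types).
#[derive(Debug, Clone)]
pub struct TenantState {
    /// Configured fairshare weight; assumed to be greater than zero.
    pub weight: f64,
    /// Backend slots this tenant currently holds.
    pub in_flight: u32,
    /// Requests queued at the gateway, not yet admitted.
    pub waiting: u32,
}

/// Choose which tenant gets the next free slot.
///
/// Among tenants with queued requests, pick the one with the smallest
/// `in_flight / weight` ratio. Ties are broken by tenant name so the
/// result is deterministic. Returns `None` when nothing is waiting.
pub fn next_tenant(tenants: &HashMap<String, TenantState>) -> Option<String> {
    tenants
        .iter()
        .filter(|(_, t)| t.waiting > 0 && t.weight > 0.0)
        .min_by(|(a_name, a), (b_name, b)| {
            let a_ratio = a.in_flight as f64 / a.weight;
            let b_ratio = b.in_flight as f64 / b.weight;
            a_ratio
                .total_cmp(&b_ratio)
                .then_with(|| a_name.cmp(b_name))
        })
        .map(|(name, _)| name.clone())
}
```

Because the decision runs before a request is forwarded, a tenant that already holds more than its weighted share simply waits in the gateway queue, and the backend never sees the excess load.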
Identity belongs on every request
Tenant, key, quota, and fairshare context stay explicit through the data plane and usage ledger.
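To make the idea concrete, the sketch below shows one way an identity context could travel with each request and key a usage ledger. `RequestContext`, `Usage`, and `UsageLedger` are hypothetical types made up for illustration, and quota and fairshare fields are left out for brevity; obleth's real data structures may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical identity attached to every request (illustrative only).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct RequestContext {
    pub tenant_id: String,
    pub api_key_id: String,
}

/// Hypothetical accumulated usage for one (tenant, key) pair.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Usage {
    pub requests: u64,
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
}

/// In-memory ledger keyed by the full identity, so usage is never
/// recorded without knowing which tenant and key produced it.
#[derive(Debug, Default)]
pub struct UsageLedger {
    by_identity: HashMap<RequestContext, Usage>,
}

impl UsageLedger {
    /// Record one completed request under its identity.
    pub fn record(&mut self, ctx: &RequestContext, prompt_tokens: u64, completion_tokens: u64) {
        let usage = self.by_identity.entry(ctx.clone()).or_default();
        usage.requests += 1;
        usage.prompt_tokens += prompt_tokens;
        usage.completion_tokens += completion_tokens;
    }

    /// Total tokens (prompt plus completion) across all keys of a tenant.
    pub fn tenant_total_tokens(&self, tenant_id: &str) -> u64 {
        self.by_identity
            .iter()
            .filter(|(ctx, _)| ctx.tenant_id == tenant_id)
            .map(|(_, u)| u.prompt_tokens + u.completion_tokens)
            .sum()
    }
}
```

Keying the ledger by the whole context rather than by tenant alone keeps per-key breakdowns available while still allowing tenant-level totals for budget checks.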
Compose with the inference stack
obleth handles tenant policy; vLLM, AIBrix, or another OpenAI-compatible backend handles model execution and replica routing.
Next
Run it, inspect it, tune it.
Start locally, read the architecture, or reach out for production deployment support.