About

A gateway for shared GPU systems

obleth is built by Voxel LLC for teams operating multi-tenant AI infrastructure. It adds the layer raw inference backends usually leave out: tenant identity, fairshare admission, token-aware budget enforcement, and usage accounting.

Project shape

Rust data plane

ELv2 source-available license

3 listeners: proxy, admin, metrics

OpenAI-compatible API surface

Principles

Small, explicit, operational

Fairness is an admission problem

When a shared GPU pool saturates, obleth decides who gets the next slot before work reaches the model backend.
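To make the admission idea concrete, here is a minimal sketch of one possible fair-share rule: when a slot frees up, give it to the tenant with queued work whose in-flight count is smallest relative to its weight. The `TenantState` struct, its fields, and the `next_tenant` function are hypothetical names chosen for illustration; they are not taken from obleth's source, and the real admission policy may differ.

```rust
use std::collections::HashMap;

/// Hypothetical per-tenant state; field names are illustrative, not obleth's.
#[derive(Debug, Clone)]
pub struct TenantState {
    /// Configured share of the pool (assumed to be positive).
    pub weight: f64,
    /// Slots the tenant currently holds on the backend.
    pub in_flight: u32,
    /// Requests waiting for admission.
    pub queued: u32,
}

/// Pick the tenant whose queued request should take the next free slot.
/// Among tenants with queued work, choose the smallest in_flight / weight
/// ratio; ties are broken by tenant id so the result is deterministic.
pub fn next_tenant(tenants: &HashMap<String, TenantState>) -> Option<String> {
    tenants
        .iter()
        .filter(|(_, t)| t.queued > 0 && t.weight > 0.0)
        .min_by(|(a_id, a), (b_id, b)| {
            let ra = a.in_flight as f64 / a.weight;
            let rb = b.in_flight as f64 / b.weight;
            ra.total_cmp(&rb).then_with(|| a_id.cmp(b_id))
        })
        .map(|(id, _)| id.clone())
}
```

With this rule, a tenant with weight 2.0 and two requests in flight is admitted ahead of a tenant with weight 1.0 and two in flight, because it is using a smaller fraction of its share. The decision happens before any work is forwarded to the model backend.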

Identity belongs on every request

Tenant, key, quota, and fairshare context stay explicit through the data plane and usage ledger.
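One way to picture this is a context value that travels with each request and is written into the ledger when the request completes. The sketch below is an assumption-laden illustration: `RequestContext`, `UsageRecord`, `AdmitError`, `check_budget`, and `record_usage` are hypothetical names, and the quota is modeled as a simple remaining-token counter rather than whatever obleth actually stores.

```rust
/// Hypothetical identity context attached to a request; names are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct RequestContext {
    pub tenant_id: String,
    pub key_id: String,
    /// Tokens the tenant may still spend in the current budget window.
    pub token_quota_remaining: u64,
    pub fairshare_weight: f64,
}

/// Hypothetical ledger entry written once a request completes.
#[derive(Debug, Clone, PartialEq)]
pub struct UsageRecord {
    pub tenant_id: String,
    pub key_id: String,
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
}

#[derive(Debug, PartialEq)]
pub enum AdmitError {
    QuotaExceeded { needed: u64, remaining: u64 },
}

/// Reject the request up front if its estimated token cost exceeds the budget.
pub fn check_budget(ctx: &RequestContext, estimated_tokens: u64) -> Result<(), AdmitError> {
    if estimated_tokens > ctx.token_quota_remaining {
        return Err(AdmitError::QuotaExceeded {
            needed: estimated_tokens,
            remaining: ctx.token_quota_remaining,
        });
    }
    Ok(())
}

/// Charge actual usage against the quota and append a ledger record
/// that carries the same tenant and key identity as the request.
pub fn record_usage(
    ctx: &mut RequestContext,
    ledger: &mut Vec<UsageRecord>,
    prompt_tokens: u64,
    completion_tokens: u64,
) {
    let total = prompt_tokens.saturating_add(completion_tokens);
    ctx.token_quota_remaining = ctx.token_quota_remaining.saturating_sub(total);
    ledger.push(UsageRecord {
        tenant_id: ctx.tenant_id.clone(),
        key_id: ctx.key_id.clone(),
        prompt_tokens,
        completion_tokens,
    });
}
```

Keeping the tenant and key on both the context and the ledger record means usage can be attributed without looking identity up again after the fact.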

Compose with the inference stack

obleth handles tenant policy; vLLM, AIBrix, or another OpenAI-compatible backend handles model execution and replica routing.

Next

Run it, inspect it, tune it.

Start locally, read the architecture, or reach out for production deployment support.