obleth

About

A gateway for shared GPU systems

obleth is a community project for teams running self-hosted, multi-tenant inference: research labs, internal platform teams, and anyone migrating off managed APIs onto their own hardware. If multiple workloads share one GPU pool, fairshare admission matters more than another routing layer.

Stack position

Client → HAProxy → obleth (:8080)
                    → Aibrix / vLLM

Shared GPUs

Built for research labs and platform teams running multi-tenant inference on their own hardware.

Composable

Not a router: obleth sits in front of Aibrix or vLLM and adds identity and fairness, not replica selection.

Observable

Every request's usage, admission class, and token counts show up in the dashboard and the usage ledger.
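To make the ledger concrete, here is a minimal Python sketch of a usage row and the kind of per-tenant token rollup a dashboard might show. The `UsageRow` fields, the class labels, and `tokens_by_tenant` are hypothetical; in obleth the rows live in ClickHouse, so treat this in-memory version as illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical shape of one usage-ledger row. Field names and class labels
# are illustrative, not obleth's actual ClickHouse schema.
@dataclass(frozen=True)
class UsageRow:
    tenant: str
    admission_class: str  # placeholder labels such as "interactive" / "batch"
    prompt_tokens: int
    completion_tokens: int


def tokens_by_tenant(rows):
    """Sum prompt + completion tokens per tenant, as a dashboard panel might."""
    totals = defaultdict(int)
    for row in rows:
        totals[row.tenant] += row.prompt_tokens + row.completion_tokens
    return dict(totals)


rows = [
    UsageRow("lab-a", "interactive", 120, 380),
    UsageRow("lab-b", "batch", 50, 150),
    UsageRow("lab-a", "batch", 10, 90),
]
print(tokens_by_tenant(rows))  # {'lab-a': 600, 'lab-b': 200}
```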

3 stores

State model: Postgres for config, Redis for hot-path state, ClickHouse for the usage ledger.

Principles

Small, explicit, operational

Fairness is an admission problem

When a shared GPU pool saturates, obleth decides who gets the next slot before work reaches the model backend. A rate limit simply rejects excess requests; fairshare queues them and admits them by weight.
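One common way to "queue and weight" is to admit, whenever a slot frees up, the waiting tenant that has received the least service per unit of weight. The Python sketch below illustrates that idea; `FairshareQueue`, its methods, and the least-normalized-service policy are assumptions for illustration, not obleth's actual scheduler.

```python
from collections import deque


class FairshareQueue:
    """Toy weighted fairshare queue (hypothetical, not obleth's scheduler).

    When a slot frees up, the head request of the waiting tenant with the
    smallest admissions-per-weight ratio is admitted next.
    """

    def __init__(self, weights):
        self.weights = dict(weights)                  # tenant -> weight (> 0)
        self.queues = {t: deque() for t in self.weights}
        self.served = {t: 0 for t in self.weights}    # admissions so far

    def enqueue(self, tenant, request):
        # Raises KeyError for tenants without a configured weight.
        self.queues[tenant].append(request)

    def next_admission(self):
        """Return (tenant, request) for the next free slot, or None if idle."""
        waiting = [t for t, q in self.queues.items() if q]
        if not waiting:
            return None
        # Least service per unit weight wins; the name breaks ties deterministically.
        tenant = min(waiting, key=lambda t: (self.served[t] / self.weights[t], t))
        self.served[tenant] += 1
        return tenant, self.queues[tenant].popleft()


fq = FairshareQueue({"lab-a": 2, "lab-b": 1})
for i in range(4):
    fq.enqueue("lab-a", f"a{i}")
    fq.enqueue("lab-b", f"b{i}")

order = [fq.next_admission()[0] for _ in range(6)]
print(order)  # ['lab-a', 'lab-b', 'lab-a', 'lab-a', 'lab-b', 'lab-a']
```

With weights 2:1 and both tenants backlogged, lab-a receives twice as many slots as lab-b. Because `served` never decays here, a tenant that was idle for a long time would briefly dominate when it returns; production fair-share schedulers usually reset or age these counters.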

Identity on every request

API keys are hashed, then resolved to tenant context through the in-process moka cache first and Redis second. Raw secrets are never stored. Every usage row in ClickHouse carries tenant, admission class, and token counts.
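The lookup path can be sketched in Python with two plain dicts: one standing in for the in-process moka cache, one for Redis. SHA-256 as the hash, the `TenantResolver` class, and the shape of the tenant context are all assumptions made for illustration; the point is that only the digest is ever used as a key, so the raw secret is never stored.

```python
import hashlib


def key_digest(api_key: str) -> str:
    """Hash the raw key (SHA-256 is an assumption); only the digest is stored."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


class TenantResolver:
    """Hypothetical two-tier lookup.

    `local` stands in for the in-process moka cache and `remote` for Redis;
    both map key digests to tenant context, never raw keys.
    """

    def __init__(self, remote):
        self.local = {}       # digest -> tenant context (in-process tier)
        self.remote = remote  # digest -> tenant context (shared tier)

    def resolve(self, api_key):
        digest = key_digest(api_key)
        if digest in self.local:
            return self.local[digest]
        ctx = self.remote.get(digest)
        if ctx is not None:
            self.local[digest] = ctx  # warm the in-process tier for next time
        return ctx


remote = {key_digest("sk-demo-123"): {"tenant": "lab-a"}}
r = TenantResolver(remote)
print(r.resolve("sk-demo-123"))  # {'tenant': 'lab-a'}
print(r.resolve("sk-unknown"))   # None
```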

Compose, don't compete

obleth handles tenant policy; vLLM, Aibrix, or any OpenAI-compatible backend handles model execution and replica routing.

Queue before reject

Under saturation, requests wait in a weighted fairshare queue instead of failing at the door. The hard limit is token budget exhaustion, not admission contention.
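The rule above reduces to a three-way decision per request. This Python sketch assumes a simple slot count and a remaining-token counter per tenant; `decide`, `Decision`, and the parameter names are hypothetical, and obleth's real checks may differ.

```python
from enum import Enum


class Decision(Enum):
    ADMIT = "admit"
    QUEUE = "queue"
    REJECT = "reject"


def decide(tokens_remaining: int, busy_slots: int, total_slots: int) -> Decision:
    """Hypothetical admission rule: only an exhausted token budget rejects.

    A saturated pool never rejects; the request waits in the fairshare queue.
    """
    if tokens_remaining <= 0:
        return Decision.REJECT  # hard limit: the tenant's budget is spent
    if busy_slots < total_slots:
        return Decision.ADMIT   # a slot is free right now
    return Decision.QUEUE       # contention: wait for a fairshare slot


print(decide(tokens_remaining=1000, busy_slots=8, total_slots=8).value)  # queue
```

Checking the budget before slot availability means a tenant with no budget left is rejected immediately rather than occupying a place in the queue it could never leave.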

Run it, inspect it, tune it.

Follow the five-minute quick start, then open the dashboard to watch slots, queues, and usage under load. If something doesn't match the docs, open a GitHub issue.
