Fairshare admission & slot control

Module 02 - Fairshare admission

Slots under load

When inference saturates, obleth does not immediately reject requests with 429s. A global in-flight cap bounds the pool, and fairshare admission then decides which queued request receives the next slot.

Hierarchical mode protects groups with reserved slot caps. Within an eligible group, obleth admits the tenant that has received the least work so peers keep making progress.

The diagram below keeps one visual model: the scheduler loop, an example 8-slot split, and the per-group tenant queue.

Admission model

Every open slot is assigned deliberately.

A single scheduler owns queue order and in-flight counts. This example shows two tenants competing for an eight-slot pool: chatbot has weight 500, while api-batch has weight 50.

Admit

obleth identifies the tenant and policy before routing upstream.

Queue

When the pool is full, requests wait by tenant instead of racing.

Pick

The next open slot goes to the tenant most entitled to it.

Release

The slot returns when the model response finishes streaming.
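The four steps above can be sketched as a small in-memory scheduler. This is an illustration, not obleth's implementation: `SlotScheduler`, `admit`, and `release` are hypothetical names, and "most entitled" is assumed here to mean the waiting tenant with the lowest served-to-weight ratio.

```python
from collections import defaultdict, deque


class SlotScheduler:
    """Toy single-owner scheduler: one global in-flight cap, per-tenant FIFO queues."""

    def __init__(self, max_in_flight, weights):
        self.max_in_flight = max_in_flight
        self.weights = weights              # tenant -> weight (hypothetical policy input)
        self.in_flight = 0
        self.queues = defaultdict(deque)    # tenant -> waiting request ids
        self.served = defaultdict(int)      # tenant -> requests admitted so far

    def admit(self, tenant, request_id):
        """Admit: start now if a slot is free and nobody is waiting, else queue."""
        if self.in_flight < self.max_in_flight and not self._has_backlog():
            self._start(tenant)
            return "admitted"
        self.queues[tenant].append(request_id)
        return "queued"

    def release(self):
        """Release: free a slot, then pick the next waiting request, if any."""
        self.in_flight -= 1
        return self._pick()

    def _has_backlog(self):
        return any(self.queues.values())

    def _start(self, tenant):
        self.in_flight += 1
        self.served[tenant] += 1

    def _pick(self):
        waiting = [t for t, q in self.queues.items() if q]
        if not waiting or self.in_flight >= self.max_in_flight:
            return None
        # Assumption: entitlement = lowest served/weight ratio, name as tie-break.
        tenant = min(waiting, key=lambda t: (self.served[t] / self.weights[t], t))
        request_id = self.queues[tenant].popleft()
        self._start(tenant)
        return tenant, request_id
```

With a two-slot pool and both slots held by chatbot, a queued api-batch request is picked ahead of a queued chatbot request on the next release, because api-batch has received nothing yet.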

Hierarchical mode - current runtime default

Group caps keep api-batch alive under a 10x weight gap.

With an eight-slot pool and active groups weighted 500:50, the gateway allocates seven slots to chatbot and one reserved slot to the api group. api-batch still queues, but it does not vanish.
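One way to arrive at the 7/1 split: the proportional shares of eight slots at 500:50 are about 7.27 and 0.73, so a purely proportional rounding could leave the api group with nothing. The sketch below reserves a floor of one slot per active group and then hands out the remainder by weight. The function name `split_slots` and the `min_slots` parameter are assumptions for illustration; the source does not state the exact rounding rule obleth uses.

```python
def split_slots(pool_size, weights, min_slots=1):
    """Split pool_size slots across active groups by weight,
    reserving at least min_slots for each group."""
    if pool_size < min_slots * len(weights):
        raise ValueError("pool too small to reserve a slot for every group")

    total = sum(weights.values())
    # Start every active group at its reserved floor.
    caps = {group: min_slots for group in weights}
    remaining = pool_size - sum(caps.values())

    def deficit(group):
        # How far the group sits below its ideal proportional share.
        return pool_size * weights[group] / total - caps[group]

    # Give each leftover slot to the group furthest below its share.
    for _ in range(remaining):
        caps[max(weights, key=deficit)] += 1
    return caps


print(split_slots(8, {"chatbot": 500, "api": 50}))
```

Under these assumptions the example weights produce a cap of 7 for chatbot and 1 for api, matching the split described above.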

chatbot: group chatbot, weight 500, cap 7, queued 57

api-batch: group api, weight 50, cap 1, queued 31

8-slot pool, example split: chatbot 7 slots, api-batch 1 slot

Inside a group

Each group gets its own fairshare queue.

After group caps reserve capacity, tenants inside the eligible group are balanced by how much work they have already received. A busy tenant cannot permanently crowd out its peers.
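A minimal sketch of that two-level pick, assuming "work received" is a simple count of admitted requests and that groups are checked in list order. `Tenant`, `Group`, and `pick_next` are hypothetical names, and the tenants in the usage note are made up for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    name: str
    served: int = 0   # work already received (a request count here; an assumption)
    queued: int = 0   # requests currently waiting


@dataclass
class Group:
    name: str
    cap: int                      # reserved slot cap for this group
    in_flight: int = 0
    tenants: list = field(default_factory=list)


def pick_next(groups):
    """Return (group name, tenant name) for the next open slot, or None."""
    for group in groups:
        if group.in_flight >= group.cap:
            continue              # group is at its cap, so it is not eligible
        waiting = [t for t in group.tenants if t.queued > 0]
        if not waiting:
            continue
        # Least-served tenant wins; the name breaks ties deterministically.
        tenant = min(waiting, key=lambda t: (t.served, t.name))
        tenant.queued -= 1
        tenant.served += 1
        group.in_flight += 1
        return group.name, tenant.name
    return None
```

For a group with cap 3 holding a tenant `busy` with 100 queued requests and a tenant `quiet` with 2, three successive picks go to `busy`, `quiet`, `busy`. A fourth pick returns `None` until a slot is released. The long queue does not let `busy` take every slot.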

chatbot: served 8.6k, queued 28 (ahead)

chatbot-2: served 8.4k, queued 29

batch-job: served 9.1k, queued 12 (waits)

Fast

If capacity is open and no backlog exists, the request is admitted immediately.

Queued

If the pool is full, the request waits until a released permit triggers dispatch.

Brownout

If a request waited past the brownout threshold, it is still admitted but marked degraded.
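The three outcomes can be expressed as a small classifier. This is a sketch under assumptions: `Outcome` and `classify` are hypothetical names, and the threshold is passed in because the source does not give its value.

```python
from enum import Enum


class Outcome(Enum):
    FAST = "fast"          # admitted immediately, no queueing
    QUEUED = "queued"      # waited for a released permit, within the threshold
    BROWNOUT = "brownout"  # admitted after the threshold, marked degraded


def classify(was_queued, waited_seconds, brownout_threshold_seconds):
    """Label an admitted request by how it reached its slot."""
    if not was_queued:
        return Outcome.FAST
    if waited_seconds > brownout_threshold_seconds:
        return Outcome.BROWNOUT
    return Outcome.QUEUED
```

In every branch the request is admitted. The label only records how long it took to get a slot, which is what lets a brownout request be served in a degraded mode rather than rejected.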

Control your inference economy.

The production dashboard shows real-time slot counts, queue depth, and group pools - this page mirrors that mental model so you can see how fairshare behaves before deploying.
