How classic deficit round-robin maps to obleth's share_score fairshare admission.
Product term: the control plane and
GET /api/v1/fairshare/liveexposeshare_score(served_tokens / weight). Lower score means more behind fair share and admitted next. This page explains the DRR lineage; day-to-day operations use Fairshare Engine terminology.
obleth's admission scheduler adapts Deficit Round Robin (DRR) to concurrent, token-measured requests instead of packet queues. The observable metric is share_score, not a per-tenant deficit counter.
In the original DRR algorithm, each queue has a deficit counter. In each round, a queue can send up to quantum + deficit bytes. If it sends less, the remainder carries into the next round. Larger quanta produce proportionally more bandwidth while preserving starvation freedom.
obleth applies the same fairness principle to admission. "Sending" means holding a concurrency permit for the duration of a request.
For each active tenant, obleth tracks served_tokens, the estimated token cost of admitted requests. The admission score is:
share_score = served_tokens / weight
When a concurrency slot opens, it goes to the queued tenant with the lowest share_score. A tenant with weight=500 gets roughly ten times the opportunity of one with weight=50 because its score grows ten times more slowly per token.
With OBLETH_FAIRSHARE_ALGORITHM=hierarchical, obleth first partitions global concurrency among fairshare groups:
slots_for_group = floor(group_weight / sum(group_weights) * max_in_flight)
Fractional remainders are distributed largest-first so the total equals max_in_flight, and every active group gets at least one slot if capacity allows.
Within each group, slots are distributed among that group's tenants using the tenant-level fairshare score.
max_in_flight = 8
groups: prod (weight=500), dev (weight=50)
total_weight = 550
prod: floor(8 * 500 / 550) = floor(7.27) = 7
dev: floor(8 * 50 / 550) = floor(0.72) = 0 -> bumped to 1
sum = 8
The min-one guarantee means every active group eventually gets capacity.
Every tenant with a non-zero weight eventually gets admitted. The served counter grows for every tenant that gets slots. A low-weight tenant whose score is lowest will be selected once higher-weight tenants have received enough service to bring their scores above it.
Idle tenants are not in the queue, so they cannot bank unused credit. When they re-enter, they start from the current virtual-time baseline.
Ctl::Admit { req, respond, enqueued } to the scheduler channel.in_flight < max_in_flight, the permit is granted immediately.Ctl::Release.served / weight score.brownout_wait, it is admitted with brownout limits.served[tenant] += cost is updated.All of this runs on a single async task, so admission order is deterministic and avoids lock races.
The benchmark harness seeds two tenants with weight 500 and 50, lets the lower-weight api-batch tenant flood first, then starts the boosted chatbot tenant while the scheduler is already under contention.
node bench/run-benchmark.mjs
The command passes when chatbot makes visible progress after joining, api-batch is not starved, and ClickHouse usage stays close to client-observed completions. See Benchmark Harness for the full walkthrough.