Benchmark Harness

Run one end-to-end benchmark that seeds tenants, applies load, samples fairshare state, and verifies the usage ledger.

The benchmark harness is a reproducible check of obleth's fairshare claim: a low-weight workload can flood the gateway first, a higher-weight (boosted) workload can join later, and both tenants still make progress under contention.

Prerequisites: Node.js 18+ and a running Docker Compose stack.

Run the benchmark

cd obleth-gateway
node bench/run-benchmark.mjs

The command:

  1. Registers mock-model for local mock runs, or verifies that your real MODEL is already registered.
  2. Creates or updates fairshare groups and tenants: chatbot with weight 500 and api-batch with weight 50.
  3. Mints fresh sk_* tenant keys.
  4. Sets the live gateway capacity with PUT /api/v1/capacity.
  5. Runs staggered load: api-batch starts first, then chatbot joins.
  6. Samples GET /api/v1/fairshare/live.
  7. Verifies client completions against ClickHouse usage and exits non-zero on failure.
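Step 5 is the core of the benchmark. The following is a minimal sketch of staggered, closed-loop load, not the harness's actual code. It assumes each worker sends one request at a time and that a completion counts as overlap work once chatbot has joined. The send function is a hypothetical stand-in for a request to PROXY_BASE made with a tenant's sk_* key; conc, staggerS, and durationS mirror CONC, STAGGER_CHATBOT_S, and DURATION_S, and runStaggeredLoad is an illustrative name.

```javascript
// Hypothetical sketch of step 5: closed-loop workers per tenant, api-batch first, chatbot later.
// `send(tenant)` is an assumed stand-in for one chat-completion request through PROXY_BASE.
// It must wait on real I/O or a timer (as an HTTP call does) so the tenants' loops interleave;
// a send that resolves instantly would spin one tenant's loop without yielding to timers.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runStaggeredLoad({ send, conc, staggerS, durationS }) {
  const overlapStart = Date.now() + staggerS * 1000; // chatbot joins here
  const end = overlapStart + durationS * 1000; // overlap lasts durationS after the join
  const stats = {};
  for (const t of ["api-batch", "chatbot"]) stats[t] = { ok: 0, errors: 0, overlapOk: 0 };

  // One closed-loop worker: send, record the outcome, repeat until the run ends.
  async function worker(tenant) {
    while (Date.now() < end) {
      try {
        await send(tenant);
        stats[tenant].ok += 1;
        // Only completions after chatbot has joined count toward the overlap comparison.
        if (Date.now() >= overlapStart) stats[tenant].overlapOk += 1;
      } catch {
        stats[tenant].errors += 1;
      }
    }
  }

  // Start `conc` concurrent workers for one tenant and wait for all of them to finish.
  const startTenant = (tenant) =>
    Promise.all(Array.from({ length: conc }, () => worker(tenant)));

  const batch = startTenant("api-batch"); // api-batch floods first
  await sleep(staggerS * 1000);
  const chat = startTenant("chatbot"); // chatbot joins after the stagger
  await Promise.all([batch, chat]);
  return stats;
}
```

With the defaults, api-batch alone keeps 32 requests outstanding against a global in-flight limit of 8 before chatbot arrives, which is what forces the scheduler to arbitrate between the tenants once chatbot joins.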

Generated artifacts are written outside the repo, under BENCH_OUT_DIR (default /tmp/obleth-bench):

File               Default path
Tenant keys        /tmp/obleth-bench/keys.json
Run summary        /tmp/obleth-bench/run-meta.json
Fairshare samples  /tmp/obleth-bench/fairshare-samples.jsonl
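For post-run analysis it can help to locate these files and load the samples. This is a hedged sketch: artifactPaths and parseJsonl are hypothetical helpers, it assumes BENCH_OUT_DIR changes only the directory and not the filenames, and it assumes fairshare-samples.jsonl holds one JSON object per line without relying on any particular fields inside each sample.

```javascript
// Hypothetical helper: artifact locations from the table above.
// Assumption: BENCH_OUT_DIR only changes the directory, not the filenames.
function artifactPaths(outDir) {
  const dir = (outDir || "/tmp/obleth-bench").replace(/\/+$/, ""); // drop trailing slashes
  return {
    keys: `${dir}/keys.json`,
    runMeta: `${dir}/run-meta.json`,
    fairshareSamples: `${dir}/fairshare-samples.jsonl`,
  };
}

// Minimal JSONL parser: one JSON value per line; blank lines are skipped and
// malformed lines are reported by 1-based line number instead of aborting.
function parseJsonl(text) {
  const records = [];
  const bad = [];
  text.split("\n").forEach((line, i) => {
    const trimmed = line.trim();
    if (trimmed === "") return; // tolerate blank lines and a trailing newline
    try {
      records.push(JSON.parse(trimmed));
    } catch {
      bad.push(i + 1);
    }
  });
  return { records, bad };
}
```

In a Node.js script, the text can come from fs.readFileSync(artifactPaths(process.env.BENCH_OUT_DIR).fairshareSamples, "utf8").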

Real backend run

Register the model in the control plane first, then set MODEL to the registered model name:

CAPACITY=16 \
DURATION_S=120 \
OUTPUT_TOKENS=150 \
MODEL=gemma4-31b-it \
CONC=64 \
PROXY_BASE=http://localhost \
node bench/run-benchmark.mjs

For a short smoke run:

CAPACITY=8 DURATION_S=30 OUTPUT_TOKENS=32 MODEL=gemma4-31b-it node bench/run-benchmark.mjs
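Both runs apply CAPACITY through PUT /api/v1/capacity and sample GET /api/v1/fairshare/live (steps 4 and 6 of the command). To watch a real-backend run from a second terminal, those calls could be made directly. The sketch below is hypothetical: it assumes both endpoints are served from ADMIN_BASE with ADMIN_TOKEN as a bearer token and that the capacity body is { "capacity": <n> }. The helper names adminRequest, callAdmin, setCapacity, and sampleFairshare are illustrative, not part of obleth; check the gateway's API for the real request and response shapes.

```javascript
// Hypothetical helpers for the two control-plane calls named in steps 4 and 6.
// Assumptions: endpoints live on ADMIN_BASE, auth is a bearer ADMIN_TOKEN,
// and the capacity body shape is { capacity: <number> }.
function adminRequest(adminBase, adminToken, method, path, body) {
  const url = new URL(path, adminBase).toString(); // resolve the absolute path against the base
  const init = { method, headers: { Authorization: `Bearer ${adminToken}` } };
  if (body !== undefined) {
    init.headers["Content-Type"] = "application/json";
    init.body = JSON.stringify(body);
  }
  return { url, init };
}

async function callAdmin(adminBase, adminToken, method, path, body) {
  const { url, init } = adminRequest(adminBase, adminToken, method, path, body);
  const res = await fetch(url, init); // global fetch ships with Node.js 18+
  if (!res.ok) throw new Error(`${method} ${path} -> HTTP ${res.status}`);
  const text = await res.text();
  return text === "" ? null : JSON.parse(text); // tolerate empty response bodies
}

const setCapacity = (base, token, capacity) =>
  callAdmin(base, token, "PUT", "/api/v1/capacity", { capacity });
const sampleFairshare = (base, token) =>
  callAdmin(base, token, "GET", "/api/v1/fairshare/live");
```

For example, calling sampleFairshare("http://localhost:9090", "dev-admin-token") once per second with setInterval gives a rough live view while run-benchmark.mjs is running.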

Environment variables

Variable              Default                Effect
ADMIN_BASE            http://localhost:9090  Management API base URL
ADMIN_TOKEN           dev-admin-token        Management API bearer token
PROXY_BASE            http://localhost       Data-plane base URL
MODEL                 mock-model             Registered model name
CAPACITY              8                      Live global in-flight limit
DURATION_S            60                     Overlap duration in seconds after all tenants have joined
STAGGER_CHATBOT_S     10                     Seconds api-batch floods before chatbot joins
CONC                  32                     Worker count per active tenant
OUTPUT_TOKENS         150                    max_tokens per request
MIN_COMPLETION_RATIO  2                      Minimum chatbot/api-batch overlap completion ratio
MAX_ERROR_RATE        0.05                   Maximum client error rate per tenant
LEDGER_TOLERANCE      0.2                    Allowed ClickHouse/client completion delta
REQUIRE_SATURATION    1                      Set 0 for light smoke runs
CHAOS                 unset                  Set 1 to pause ClickHouse and Redis mid-run
CONTAINER_CLI         docker                 Use podman for Podman Compose
BENCH_OUT_DIR         /tmp/obleth-bench      Output directory
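A wrapper script that needs the same settings could read them as sketched below. This is an assumption-laden sketch rather than the harness's own parser: loadBenchConfig is a hypothetical name, numeric variables are assumed to be parsed with Number and rejected if not finite, and CHAOS is assumed to be enabled only by the exact value "1". The defaults are copied from the table.

```javascript
// Defaults copied from the environment table; parsing rules below are assumptions.
const DEFAULTS = {
  ADMIN_BASE: "http://localhost:9090",
  ADMIN_TOKEN: "dev-admin-token",
  PROXY_BASE: "http://localhost",
  MODEL: "mock-model",
  CAPACITY: 8,
  DURATION_S: 60,
  STAGGER_CHATBOT_S: 10,
  CONC: 32,
  OUTPUT_TOKENS: 150,
  MIN_COMPLETION_RATIO: 2,
  MAX_ERROR_RATE: 0.05,
  LEDGER_TOLERANCE: 0.2,
  REQUIRE_SATURATION: 1,
  CONTAINER_CLI: "docker",
  BENCH_OUT_DIR: "/tmp/obleth-bench",
};

// Hypothetical loader: unset or empty variables fall back to the table defaults;
// numeric variables must parse as finite numbers.
function loadBenchConfig(env) {
  const cfg = {};
  for (const [key, def] of Object.entries(DEFAULTS)) {
    const raw = env[key];
    if (raw === undefined || raw === "") {
      cfg[key] = def;
    } else if (typeof def === "number") {
      const n = Number(raw);
      if (!Number.isFinite(n)) throw new Error(`${key} must be a number, got "${raw}"`);
      cfg[key] = n;
    } else {
      cfg[key] = raw;
    }
  }
  cfg.CHAOS = env.CHAOS === "1"; // CHAOS has no default ("unset"); assumed on only for "1"
  return cfg;
}
```

Usage: loadBenchConfig(process.env). Failing fast on a malformed number keeps a typo such as CONC=6x from silently turning into a different load level.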

Interpreting results

PASS means:

  • both tenants completed requests during the overlap window;
  • api-batch was not starved;
  • chatbot completed at least MIN_COMPLETION_RATIO times as many overlap requests as api-batch (2x by default);
  • the scheduler was saturated or live samples showed active contention (unless REQUIRE_SATURATION=0);
  • ClickHouse usage matched client-observed completions within LEDGER_TOLERANCE; and
  • each tenant's client error rate stayed under MAX_ERROR_RATE.
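The PASS criteria can be expressed as one check over per-tenant counts. This is a minimal sketch, not run-benchmark.mjs's actual logic: the stats shape ({ ok, errors, overlapOk } per tenant), the caller-supplied ledgerCompletions and saturated values, and the reading of LEDGER_TOLERANCE as a relative delta are all assumptions, and evaluateRun is a hypothetical name.

```javascript
// Hypothetical pass/fail check mirroring the PASS criteria; input shapes are assumptions.
function evaluateRun(stats, cfg, ledgerCompletions, saturated) {
  const failures = [];
  const chat = stats.chatbot;
  const batch = stats["api-batch"];

  // Both tenants made progress during the overlap; in particular api-batch was not starved.
  if (chat.overlapOk === 0) failures.push("chatbot completed nothing during overlap");
  if (batch.overlapOk === 0) failures.push("api-batch was starved during overlap");

  // The higher-weight tenant did materially more overlap work.
  if (batch.overlapOk > 0 && chat.overlapOk / batch.overlapOk < cfg.MIN_COMPLETION_RATIO) {
    failures.push("chatbot/api-batch overlap ratio below MIN_COMPLETION_RATIO");
  }

  // Contention evidence, unless disabled for light smoke runs.
  if (cfg.REQUIRE_SATURATION === 1 && !saturated) failures.push("no saturation or contention observed");

  // Ledger: relative delta between ClickHouse and client-side completions (empty run fails).
  const clientOk = chat.ok + batch.ok;
  const delta = clientOk === 0 ? 1 : Math.abs(ledgerCompletions - clientOk) / clientOk;
  if (delta > cfg.LEDGER_TOLERANCE) failures.push(`ledger delta ${delta.toFixed(3)} exceeds tolerance`);

  // Per-tenant client error rate.
  for (const [tenant, s] of Object.entries(stats)) {
    const total = s.ok + s.errors;
    const rate = total === 0 ? 0 : s.errors / total;
    if (rate > cfg.MAX_ERROR_RATE) failures.push(`${tenant} error rate ${rate.toFixed(3)} too high`);
  }
  return { pass: failures.length === 0, failures };
}
```

Collecting every failure instead of stopping at the first one makes a FAIL easier to diagnose, since a starved tenant and a ledger mismatch often show up together.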

The older split scripts were easy to misuse because setup, load, ledger checks, and chaos runs were separate steps. run-benchmark.mjs runs all of them in a single path, which makes it harder to accidentally bend the benchmark into a misleading result.