Verify obleth's fail-open behavior and WAL replay by pausing Redis and ClickHouse during the benchmark.
obleth is designed to keep serving requests when Redis or ClickHouse is briefly unavailable. Chaos mode runs the normal benchmark and pauses those services while load is active.
| Scenario | Expected behavior |
|---|---|
| ClickHouse pause | Usage records spill to the local WAL; client requests continue |
| ClickHouse resume | WAL records replay into ClickHouse |
| Redis pause | Auth uses the in-process key cache; budget checks fail open; requests continue |
| Redis resume | Normal Redis-backed auth and budgeting resume |
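The two degraded paths in the table can be pictured with a small sketch. This is not obleth's code: `checkBudgetFailOpen`, `SpillBuffer`, the 50 ms timeout, and the in-memory `pending` array (standing in for the on-disk WAL) are all assumptions made for illustration.

```javascript
// Illustrative only: every name here is hypothetical, not part of obleth.

// Fail-open budget check: a lookup error or timeout counts as "allow".
async function checkBudgetFailOpen(lookup, timeoutMs = 50) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("budget lookup timed out")), timeoutMs);
  });
  try {
    const allowed = await Promise.race([lookup(), timeout]);
    return { allowed: Boolean(allowed), degraded: false };
  } catch {
    // The backend is paused or failing: let the request through and flag it.
    return { allowed: true, degraded: true };
  } finally {
    clearTimeout(timer);
  }
}

// Spill-and-replay buffer: records the sink rejects are kept locally,
// then retried oldest-first once the sink accepts writes again.
class SpillBuffer {
  constructor(sink) {
    this.sink = sink;   // function(record) that throws while the backend is down
    this.pending = [];  // stands in for the on-disk WAL
  }

  write(record) {
    try {
      this.sink(record);
    } catch {
      this.pending.push(record);
    }
  }

  // Returns how many buffered records were delivered.
  replay() {
    let replayed = 0;
    while (this.pending.length > 0) {
      try {
        this.sink(this.pending[0]);
      } catch {
        break; // backend is still down; keep the remaining records
      }
      this.pending.shift();
      replayed += 1;
    }
    return replayed;
  }
}
```

In both cases the client request never waits on the paused service for longer than the timeout, which is the property the chaos run is meant to exercise.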
CHAOS=1 node bench/run-benchmark.mjs
With Podman Compose:
CONTAINER_CLI=podman CHAOS=1 node bench/run-benchmark.mjs
For a longer real-backend run:
CHAOS=1 \
CAPACITY=16 \
DURATION_S=120 \
OUTPUT_TOKENS=150 \
MODEL=gemma4-31b-it \
CONC=64 \
node bench/run-benchmark.mjs
The benchmark still exits non-zero if tenants stop making progress, client error rates exceed the threshold, or ClickHouse usage does not line up with client completions after recovery.
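The three exit conditions above could be expressed roughly as follows. This is a sketch under stated assumptions: the `run` field names, the 1% error threshold, and the zero-drift tolerance are hypothetical, and "no progress" is simplified here to "zero completed requests" rather than whatever the benchmark actually measures.

```javascript
// Hypothetical sketch of the post-run checks; field names and thresholds are
// assumptions, not the benchmark's real data model.
function evaluateChaosRun(run, { maxErrorRate = 0.01, maxUsageDrift = 0 } = {}) {
  const failures = [];

  // 1. Every tenant must have completed at least one request.
  for (const [tenant, completed] of Object.entries(run.completedByTenant)) {
    if (completed === 0) failures.push(`tenant ${tenant} made no progress`);
  }

  // 2. The client-visible error rate must stay within the threshold.
  const total = run.clientCompletions + run.clientErrors;
  const errorRate = total === 0 ? 1 : run.clientErrors / total;
  if (errorRate > maxErrorRate) {
    failures.push(`error rate ${errorRate.toFixed(4)} exceeds ${maxErrorRate}`);
  }

  // 3. After WAL replay, ClickHouse rows should match client completions.
  const drift = Math.abs(run.clickhouseRows - run.clientCompletions);
  if (drift > maxUsageDrift) failures.push(`usage drift of ${drift} records`);

  return failures;
}
```

A caller would treat a non-empty list as a failed run, for example by setting `process.exitCode = 1`.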
You can also pause services yourself while node bench/run-benchmark.mjs is running.
docker compose -f deploy/docker/docker-compose.yml pause redis
docker logs -f obleth-obleth-1
docker compose -f deploy/docker/docker-compose.yml unpause redis
docker compose -f deploy/docker/docker-compose.yml pause clickhouse
docker exec obleth-obleth-1 ls -lh /tmp/obleth-telemetry.wal
docker compose -f deploy/docker/docker-compose.yml unpause clickhouse
Replace docker with podman if you are using Podman Compose.
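If you repeat the manual sequence often, it can be scripted. The helper below is a hypothetical sketch, not part of the obleth repository: `pauseFor`, `runCli`, and `composeArgs` are made-up names. The compose file path and service names come from the commands above, and `CONTAINER_CLI` is read the same way the benchmark invocation suggests.

```javascript
// Hypothetical helper, not part of the obleth repo.
const COMPOSE_FILE = "deploy/docker/docker-compose.yml";

// Build the argument list for `<cli> compose -f <file> <action> <service>`.
function composeArgs(action, service) {
  return ["compose", "-f", COMPOSE_FILE, action, service];
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Default runner: shells out to the container CLI and throws on a non-zero exit.
async function runCli(cli, args) {
  const { spawnSync } = await import("node:child_process");
  const result = spawnSync(cli, args, { stdio: "inherit" });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`${cli} ${args.join(" ")} exited with ${result.status}`);
  }
}

// Pause a service for outageMs, then unpause it even if the wait is interrupted
// by an error, so a failed run does not leave the service frozen.
async function pauseFor(
  service,
  outageMs,
  { cli = process.env.CONTAINER_CLI || "docker", run = runCli } = {}
) {
  await run(cli, composeArgs("pause", service));
  try {
    await sleep(outageMs);
  } finally {
    await run(cli, composeArgs("unpause", service));
  }
}
```

With the benchmark running in another terminal, `await pauseFor("redis", 15000)` followed by `await pauseFor("clickhouse", 15000)` would reproduce the manual steps. The 15-second outage is an arbitrary example value.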
After ClickHouse resumes, check that buffered records were inserted. The date -d syntax below requires GNU date:
docker exec obleth-clickhouse-1 clickhouse-client \
--user obleth --password obleth \
--query "SELECT count() FROM obleth.usage WHERE ts_ms > $(date -d '10 minutes ago' +%s)000"
The count should include records created during the outage, and obleth logs should report WAL replay.
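The relative-date form of `date -d` used in the query is GNU-specific, so it may not work with the BSD `date` shipped on macOS. As a portable alternative, the same millisecond cutoff can be computed in Node. This is a minimal sketch: `usageCountQuery` is a hypothetical helper, and the table and column names are copied from the query above.

```javascript
// Hypothetical helper; table and column names are taken from the query above.
function usageCountQuery(minutesAgo, nowMs = Date.now()) {
  if (!Number.isFinite(minutesAgo) || minutesAgo < 0) {
    throw new RangeError("minutesAgo must be a non-negative number");
  }
  // ts_ms is compared in milliseconds, so subtract the window from the current epoch ms.
  const cutoffMs = Math.floor(nowMs - minutesAgo * 60 * 1000);
  return `SELECT count() FROM obleth.usage WHERE ts_ms > ${cutoffMs}`;
}

// Print a query covering the last 10 minutes.
console.log(usageCountQuery(10));
```

The printed statement can be passed to `clickhouse-client --query` in place of the inline `$(date ...)` substitution.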