Common obleth errors and how to diagnose them: auth failures, 503s, stalled queues, Redis issues, and ClickHouse connectivity.
RUST_LOG=obleth=debug
Setting this environment variable when starting obleth adds verbose output for request routing, cache hits/misses, Redis operations, and fairshare decisions.
| Status | Meaning | Common cause |
|---|---|---|
| 401 Unauthorized | Missing or malformed API key | No Authorization header; key doesn't start with sk_ |
| 403 Forbidden | Key is disabled or tenant not found | Key was deleted/disabled; tenant deleted |
| 404 Not Found | Model not found or not enabled | Model not in registry, or enabled=false |
| 429 Too Many Requests | Tenant quota exceeded | tokens_per_minute budget exhausted for this billing window |
| 503 Service Unavailable | Request brownout or queue full | System is overloaded; obleth_queue_depth is high |
| 502 Bad Gateway | Upstream returned an error | Inference backend is down or returned a non-2xx response |
# 1. Confirm the key exists
curl http://localhost:9090/api/v1/keys \
-H "Authorization: Bearer $TOKEN" | jq '.[] | select(.key_prefix == "sk_abc1")'
# 2. Confirm it's not disabled
# If "disabled": true, re-enable:
curl -X PUT http://localhost:9090/api/v1/keys/$KEY_ID/disabled \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"disabled": false}'
# 3. Check the key format: it must be the full secret (sk_...), not the prefix
Key resolution flow: the gateway hashes the full key with SHA-256, then looks up obleth:key:{hash} in Redis. If Redis is down, it falls back to moka, then Postgres. If all three miss, the request is rejected.
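To check by hand which Redis entry the gateway would look for, the hashing step can be reproduced in the shell. This is a minimal sketch, not obleth's own tooling: it assumes the hash is the lowercase hex SHA-256 of the exact key bytes with no trailing newline, and the helper names `obleth_key_hash` and `obleth_redis_key` are hypothetical.

```shell
# Hypothetical helper: hex SHA-256 of the full API key.
# Assumption: the gateway hashes the raw key bytes with no trailing newline
# and stores the digest as lowercase hex.
obleth_key_hash() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Hypothetical helper: build the Redis key name in the obleth:key:{hash} form.
obleth_redis_key() {
  printf 'obleth:key:%s\n' "$(obleth_key_hash "$1")"
}
```

With the full secret in `$API_KEY`, something like `redis-cli -h $REDIS_HOST exists "$(obleth_redis_key "$API_KEY")"` would then show whether the entry is present. If the result is 0 while the key is known to be valid, the encoding assumption above may simply not match what obleth does.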
# List models
curl http://localhost:9090/api/v1/models \
-H "Authorization: Bearer $TOKEN" | jq '.[] | {name: .model_name, enabled: .enabled}'
The model field in the request body must match a model_name in the registry exactly. If the model exists but enabled=false, re-enable it.
# Check tenant quota and current budget
curl http://localhost:9090/api/v1/tenants/$TENANT_ID \
-H "Authorization: Bearer $TOKEN" | jq '{tpm: .tokens_per_minute, group: .group_name}'
# Increase quota
curl -X PUT http://localhost:9090/api/v1/tenants/$TENANT_ID/quota \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"tokens_per_minute": 200000}'
The token budget resets every 60 seconds via the Redis Lua script. If a tenant is consistently hitting 429, either increase their quota or check if they have a runaway workload.
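When deciding how far to raise a quota, a rough estimate of how many requests fit into one 60-second window can help. The sketch below assumes every request consumes about the same number of tokens, which real traffic rarely does; the helper name `requests_per_window` is hypothetical and not part of obleth.

```shell
# Hypothetical helper: estimate how many requests fit in one 60-second window.
# $1 = tokens_per_minute quota, $2 = assumed average tokens per request.
requests_per_window() {
  if [ "$2" -le 0 ]; then
    echo "average tokens per request must be positive" >&2
    return 1
  fi
  # Integer division: partial requests do not count.
  echo $(( $1 / $2 ))
}
```

For example, `requests_per_window 200000 4000` prints 50. A tenant that needs to send noticeably more than that per minute will keep hitting 429 at this quota.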
The queue fills when OBLETH_GLOBAL_MAX_IN_FLIGHT is too low for the offered load. Check:
# Live stats
curl http://localhost:9090/api/v1/stats -H "Authorization: Bearer $TOKEN"
# Look at: in_flight, queued, max_in_flight
# Prometheus
curl -s http://localhost:9091/metrics | grep -E 'obleth_queue_depth|obleth_in_flight'
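For a quick threshold check or a watch loop, a single metric value can be pulled out of the metrics output. This is a sketch that assumes the metric is exposed without labels, as a plain `name value` line in the Prometheus text format; the helper name `metric_value` is hypothetical.

```shell
# Hypothetical helper: read Prometheus text format on stdin and print the
# value of one metric. Assumes the metric has no labels, so the first
# whitespace-separated field is exactly the metric name.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}
```

Usage would look like `curl -s http://localhost:9091/metrics | metric_value obleth_queue_depth`. If the metric carries labels, the first field will include them and this simple match prints nothing.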
Options:
- Increase OBLETH_GLOBAL_MAX_IN_FLIGHT (only if the backend can handle more concurrency)
- Lower OBLETH_BROWNOUT_WAIT_MS to fail faster instead of queuing

obleth fails open when Redis is unavailable. If you see repeated log lines like:
WARN obleth_redis: Redis error: Connection refused
WARN obleth_proxy: budget reserve failed; failing open
Check:
docker logs obleth-redis-1
redis-cli -h $REDIS_HOST ping
When Redis reconnects, obleth resumes normal operation automatically. If it never reconnects, check that OBLETH_REDIS_URL is correct: the scheme must be redis://, or rediss:// only if TLS is configured.
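The scheme rule can be checked before restarting obleth. The function below is a minimal sketch that only inspects the URL prefix; it does not verify that TLS is actually configured, and the name `check_redis_url` is hypothetical.

```shell
# Hypothetical helper: classify a Redis URL by its scheme.
# Only the prefix is inspected; host, port, and TLS setup are not validated.
check_redis_url() {
  case "$1" in
    redis://*)  echo "ok: plaintext" ;;
    rediss://*) echo "ok: TLS (requires TLS to be configured)" ;;
    *)          echo "bad: scheme must be redis:// or rediss://"; return 1 ;;
  esac
}
```

Running `check_redis_url "$OBLETH_REDIS_URL"` returns a non-zero status for any other scheme, so it can also gate a startup script.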
WARN obleth_telemetry: clickhouse insert failed; spilling N records to WAL
This is non-fatal: requests continue. Check:
curl http://$CLICKHOUSE_HOST:8123/ping
# Expected: Ok.
Common causes: wrong OBLETH_CLICKHOUSE_URL, wrong credentials, ClickHouse not started. After connectivity is restored, obleth replays the WAL automatically.
curl http://localhost:9090/api/v1/health
# {"status": "ok", "redis": "ok", "postgres": "ok"}
If redis or postgres shows an error, check those services. clickhouse is not included in the health check (fail-open design).