Persisted model health checks, manual probes, maintenance windows, and Slack alerts.
Model health checks judge each registered route without spending tokens. For
every model, obleth first looks for a passive signal (recent real client
traffic in the ClickHouse usage ledger) and falls back to an active
liveness probe (GET {api_base}/models) only when the model has seen no recent
traffic or the passive signal is inconclusive. The probe is token-free and
never runs a real inference request, so checking a model never consumes
provider budget.
A healthy result covers the parts that matter to operators:
upstream_model (when its /v1/models list enumerates models).

Transient conditions (an overloaded upstream, a single network blip, or a
/v1/models endpoint that doesn't list the model) are classified as degraded
rather than unhealthy, so a working model doesn't flap to "down" and fire false
alerts.
| Source | When used | Result |
|---|---|---|
| Passive (ClickHouse usage, last 300s) | Model has recent client traffic | healthy if any 2xx; unhealthy if only 5xx; inconclusive if only 4xx (falls through to probe) |
| Active liveness (GET {api_base}/models) | No recent traffic, or passive was inconclusive | healthy/degraded/unhealthy from the probe response |
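
The passive row of the table can be read as a small decision function. The sketch below is illustrative only and is not obleth's implementation: the function name and the idea of passing in a plain list of status codes are assumptions, and the table does not say what happens when the window holds a mix of 4xx and 5xx, so that case is treated as inconclusive here.

```python
from typing import Iterable, Optional


def classify_passive(status_codes: Iterable[int]) -> Optional[str]:
    """Classify a model from HTTP status codes seen in the passive window.

    Returns "healthy", "unhealthy", or None when the signal is inconclusive
    and the caller should fall through to the active probe.
    """
    codes = list(status_codes)
    if not codes:
        return None  # no recent traffic: use the active probe
    if any(200 <= c < 300 for c in codes):
        return "healthy"  # any 2xx is enough
    if all(500 <= c < 600 for c in codes):
        return "unhealthy"  # only 5xx
    # Only 4xx is inconclusive per the table. A 4xx/5xx mix is not covered
    # by the table; treating it as inconclusive is an assumption.
    return None
```

A `None` result means the active liveness probe decides the status.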
The active probe retries once on 408/429/5xx or a network error before
recording a result. A 2xx that lists the model is healthy; a 2xx that
doesn't advertise it is degraded (many shared gateways omit models from
/v1/models); 401/403 is unhealthy; other inconclusive responses are
degraded.
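
A minimal sketch of the retry and classification rules described above, under stated assumptions. `fetch` is a hypothetical callable standing in for the GET {api_base}/models request; it returns the HTTP status plus the model IDs the response advertises, and raises `OSError` on a network error. A response body that does not enumerate models at all, and a network error that persists through the retry, are not modelled because the text does not say how they are recorded.

```python
from typing import Callable, List, Tuple

# Hypothetical probe result: (HTTP status, model ids advertised by /models).
ProbeResponse = Tuple[int, List[str]]


def is_retryable(status: int) -> bool:
    """408, 429, and any 5xx trigger the single retry."""
    return status in (408, 429) or 500 <= status < 600


def classify_probe(status: int, listed: List[str], upstream_model: str) -> str:
    if 200 <= status < 300:
        if upstream_model in listed:
            return "healthy"
        return "degraded"  # 2xx that does not advertise the model
    if status in (401, 403):
        return "unhealthy"
    return "degraded"  # other inconclusive responses


def probe_with_one_retry(fetch: Callable[[], ProbeResponse], upstream_model: str) -> str:
    """Call fetch, retrying a single time on 408/429/5xx or a network error."""
    try:
        status, listed = fetch()
        if not is_retryable(status):
            return classify_probe(status, listed, upstream_model)
    except OSError:
        pass  # network error: fall through to the single retry
    # A network error on the retry propagates to the caller; how that case
    # is recorded is not described in the text, so it is left open here.
    status, listed = fetch()
    return classify_probe(status, listed, upstream_model)
```

With these rules, a 503 followed by a 200 that lists the model is recorded as healthy, while two 503s in a row end up as degraded rather than unhealthy.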
When a model defines multiple endpoints,
the worker probes each enabled endpoint independently with the same
GET {api_base}/models liveness check and records a per-endpoint
health_status. The data plane only routes to endpoints that are enabled and not
explicitly unhealthy, so a dead cluster drops out of rotation on its own while the
rest of the model keeps serving. The model is reported fully down only when every
endpoint is unhealthy.
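
The routing rule can be sketched as two small predicates. The `Endpoint` type and its field names are hypothetical, chosen only to mirror the wording above. Whether disabled endpoints count toward "every endpoint is unhealthy" is not stated; this sketch assumes only enabled endpoints are considered, since those are the ones the worker probes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    name: str
    enabled: bool
    # "healthy", "degraded", "unhealthy", or None if not yet checked
    health_status: Optional[str] = None


def routable(endpoints: List[Endpoint]) -> List[Endpoint]:
    """Endpoints that are enabled and not explicitly unhealthy."""
    return [e for e in endpoints if e.enabled and e.health_status != "unhealthy"]


def model_is_down(endpoints: List[Endpoint]) -> bool:
    """True only when every enabled endpoint is unhealthy (assumption:
    disabled endpoints are ignored)."""
    enabled = [e for e in endpoints if e.enabled]
    return bool(enabled) and all(e.health_status == "unhealthy" for e in enabled)
```

Note that a degraded or not-yet-checked endpoint stays in rotation, because only an explicit unhealthy status removes it.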
Open Models in the control plane.
The compact table shows each production model's latest health status and last check latency. Click a row to expand details:
Benchmark fixture endpoints such as benchmark-endpoint are hidden by default.
Use Show benchmark when you need to inspect or check them.
Run one check:
curl -X POST "http://localhost:9180/api/v1/models/$MODEL_ID/health/check" \
  -H "Authorization: Bearer $TOKEN"
Run checks for eligible models:
curl -X POST http://localhost:9180/api/v1/models/health/check \
  -H "Authorization: Bearer $TOKEN"
Bulk checks skip routes that are disabled, hidden for maintenance, or have checks disabled.
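
For scripted use, the same two requests can be built with the Python standard library. This is a sketch mirroring the curl examples above; the helper name is hypothetical, and the response format is not documented here, so the code only constructs the request.

```python
import urllib.request
from typing import Optional

# Host, port, and paths are taken from the curl examples above.
BASE_URL = "http://localhost:9180/api/v1"


def build_health_check_request(token: str, model_id: Optional[str] = None) -> urllib.request.Request:
    """Build the POST for a single-model check, or the bulk check if model_id is None."""
    if model_id:
        path = f"/models/{model_id}/health/check"
    else:
        path = "/models/health/check"
    return urllib.request.Request(
        BASE_URL + path,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Pass the returned object to `urllib.request.urlopen` to send it.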
The worker is enabled by default:
OBLETH_MODEL_HEALTH_ENABLED=true
OBLETH_MODEL_HEALTH_INTERVAL_SECS=900
OBLETH_MODEL_HEALTH_TIMEOUT_SECS=30
OBLETH_MODEL_HEALTH_RETENTION_DAYS=30
Checks are claimed with Postgres row locking, so multiple Management API replicas can run safely without a single in-memory leader. Per-model jitter keeps large fleets from checking every route at once.
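
One common way to get per-model jitter is to derive a stable offset from the model ID, so each route's checks land at a different point within the interval. The sketch below illustrates that idea only; how obleth actually computes jitter is not described here, and the function names are hypothetical. The 900-second fallback assumes the value shown above is the default.

```python
import hashlib
import os
from typing import Mapping


def load_interval_secs(env: Mapping[str, str] = os.environ) -> int:
    """Read the check interval, falling back to 900s (assumed default)."""
    return int(env.get("OBLETH_MODEL_HEALTH_INTERVAL_SECS", "900"))


def jitter_offset_secs(model_id: str, interval_secs: int) -> int:
    """Deterministic offset in [0, interval_secs) derived from the model id.

    Hash-based jitter is an illustrative choice, not obleth's documented
    method.
    """
    digest = hashlib.sha256(model_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % interval_secs
```

Because the offset depends only on the model ID, every replica computes the same schedule for a given model without coordinating.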
Slack alerts use the existing incoming webhook config:
OBLETH_SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
OBLETH_SLACK_ALERT_MIN_INTERVAL_SECS=300
Each model has its own failure threshold, which defaults to 2. obleth sends one
down alert when the threshold is reached and one recovery alert when the model
becomes healthy again.
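
The threshold behaviour amounts to a small state machine per model. The sketch below is an illustration under assumptions, not obleth's code: the class and field names are hypothetical, failures are assumed to be counted consecutively, and a degraded result is assumed to leave the state unchanged. Alert suppression and OBLETH_SLACK_ALERT_MIN_INTERVAL_SECS are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertTracker:
    failure_threshold: int = 2  # documented default
    consecutive_failures: int = 0
    down_alert_sent: bool = False

    def record(self, status: str) -> Optional[str]:
        """Return "down", "recovery", or None for one check result."""
        if status == "unhealthy":
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold and not self.down_alert_sent:
                self.down_alert_sent = True
                return "down"  # sent once, when the threshold is reached
            return None
        if status == "healthy":
            self.consecutive_failures = 0
            if self.down_alert_sent:
                self.down_alert_sent = False
                return "recovery"
            return None
        return None  # "degraded" leaves the state untouched (assumption)
```

Feeding it unhealthy, unhealthy, unhealthy, healthy yields no alert, a down alert, no alert, and then a recovery alert.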
Alerts are suppressed when:
Failed checks are still stored during maintenance, so the history remains useful after the window ends.