Production Checklist

Everything to verify before deploying obleth to production: datastores, secrets, TLS, capacity, monitoring, and backup.

This checklist covers the minimum steps to harden an obleth deployment for production traffic.

Datastores

Postgres: Use a managed service (RDS, Cloud SQL), an operator such as CloudNativePG, or another highly available (HA) setup. Do not use the bundled Docker Compose Postgres in production.
Redis: Use Redis Sentinel or Redis Cluster for HA. obleth treats Redis as a hot cache — all data is derivable from Postgres — so a Redis failover is recoverable without data loss.
ClickHouse: Use a replicated cluster or a managed ClickHouse service. obleth's write-ahead log (WAL) provides short-term durability during single-node outages.
All connection URLs set via environment variables or Kubernetes Secrets, not hardcoded.
On Kubernetes: point the chart at pre-created Secrets (obleth.existingSecret / controlPlane.existingSecret) so credentials never enter values files or --set/CLI history. See Helm Values — Pre-created Secrets.
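
A minimal sketch of reading connection URLs from the environment and failing fast at startup when one is missing. The variable names (OBLETH_DATABASE_URL, OBLETH_REDIS_URL, OBLETH_CLICKHOUSE_URL) are hypothetical placeholders, not names confirmed by obleth; on Kubernetes they would be populated from the pre-created Secret rather than from values files.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names: substitute whatever your deployment actually uses.
REQUIRED_URL_VARS = ("OBLETH_DATABASE_URL", "OBLETH_REDIS_URL", "OBLETH_CLICKHOUSE_URL")


def load_connection_urls(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read every datastore URL from the environment, failing fast if any is unset or empty."""
    missing = [name for name in REQUIRED_URL_VARS if not env.get(name)]
    if missing:
        # Report the variable names only; the values may contain passwords.
        raise RuntimeError("missing connection settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_URL_VARS}
```

Failing at startup surfaces a broken Secret reference immediately instead of at the first request that touches the datastore.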

Network and Security

TLS terminated at HAProxy or your Ingress controller; the data plane is HTTP-only.
obleth data plane (:8080) reachable only from HAProxy/Ingress, not from the internet.
Admin port (:9180) not publicly accessible; restrict it to the internal network or a VPN. The admin token is the only authentication on this port, and it is compared in constant time.
Metrics port (:9091) accessible only to your Prometheus scraper.
SSRF policy: the default (local-first) mode allows api_base targets on private/LAN addresses; link-local addresses and cloud metadata endpoints are always blocked. In strict mode (OBLETH_BLOCK_PRIVATE_NETWORKS=1), set OBLETH_ALLOWED_PRIVATE_CIDRS to your trusted internal ranges (e.g. 10.0.0.0/8) so that in-cluster Services can still be registered.
Control plane served over HTTPS; verify the built-in security headers (CSP, HSTS, X-Frame-Options) and login rate limiting are active.
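
The SSRF rules above can be illustrated with Python's standard ipaddress module. This is a sketch of the policy as described, not obleth's actual code. It assumes the api_base host has already been resolved to an IP, treats link-local ranges (which include the common 169.254.169.254 metadata address) plus the AWS IPv6 metadata address as always blocked, and assumes OBLETH_ALLOWED_PRIVATE_CIDRS is a comma-separated list; that format and the helper names are assumptions.

```python
import ipaddress
from typing import Iterable, List, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Metadata addresses blocked in every mode. 169.254.169.254 is already
# link-local; fd00:ec2::254 (AWS IMDS over IPv6) is not, so it is listed explicitly.
METADATA_ADDRESSES = {
    ipaddress.ip_address("169.254.169.254"),
    ipaddress.ip_address("fd00:ec2::254"),
}


def parse_cidrs(raw: str) -> List[IPNetwork]:
    """Parse a comma-separated CIDR list such as '10.0.0.0/8,172.16.0.0/12' (assumed format)."""
    return [ipaddress.ip_network(c.strip()) for c in raw.split(",") if c.strip()]


def is_target_allowed(ip: str, strict: bool, allowed_private: Iterable[IPNetwork] = ()) -> bool:
    """Decide whether an already-resolved api_base IP may be contacted."""
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:169.254.169.254) so it cannot bypass the checks.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    if addr.is_link_local or addr in METADATA_ADDRESSES:
        return False  # always blocked, regardless of mode
    if addr.is_private:
        if not strict:
            return True  # default local-first mode allows private/LAN targets
        return any(addr in net for net in allowed_private)  # strict mode: allowlist only
    return True  # public address
```

In strict mode, an in-cluster Service at 10.1.2.3 is only reachable once a range such as 10.0.0.0/8 is on the allowlist, which is why the checklist asks for OBLETH_ALLOWED_PRIVATE_CIDRS.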

Kubernetes Hardening

The following settings are Helm chart toggles, defaulted sensibly and tuned for the Production profile.

Restricted Pod Security Standard. The stateless workloads (obleth, control-plane, benchmark-backend) run with runAsNonRoot, seccompProfile: RuntimeDefault, allowPrivilegeEscalation: false, and all capabilities dropped. This is on by default; verify that your values do not override it.
Spread across nodes. affinity.antiAffinity: soft (or hard on multi-node clusters) so a single node failure can't take the whole data plane down.
PodDisruptionBudget. podDisruptionBudget.enabled: true keeps a minimum number of obleth pods available during node drains and rolling upgrades (rendered only when replicas > 1).
NetworkPolicy (bundled datastores only). If you run the bundled Postgres/Redis/ClickHouse, set networkPolicy.enabled: true to restrict their ports to obleth pods. Requires a CNI that enforces NetworkPolicy; a no-op for external datastores.
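
One way to keep these toggles from drifting is a small pre-deploy check over the chart values. This sketch assumes the values are already parsed into a dict (for example by a YAML loader) and that the replica count sits under a top-level replicas key, which is a guess; the other key paths are the ones named above, and the function names are hypothetical.

```python
from typing import Any, List, Mapping


def get_path(values: Mapping[str, Any], dotted: str, default: Any = None) -> Any:
    """Look up a dotted key path such as 'podDisruptionBudget.enabled'."""
    node: Any = values
    for key in dotted.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return default
        node = node[key]
    return node


def lint_values(values: Mapping[str, Any], bundled_datastores: bool) -> List[str]:
    """Return human-readable warnings for hardening toggles that look wrong."""
    warnings = []
    replicas = get_path(values, "replicas", 1)  # assumed key name
    if get_path(values, "affinity.antiAffinity") not in ("soft", "hard"):
        warnings.append("affinity.antiAffinity should be 'soft' or 'hard'")
    if not get_path(values, "podDisruptionBudget.enabled", False):
        warnings.append("podDisruptionBudget.enabled is not true")
    elif replicas <= 1:
        # The chart renders the PDB only when replicas > 1, so it would silently be absent.
        warnings.append("PodDisruptionBudget is only rendered when replicas > 1")
    if bundled_datastores and not get_path(values, "networkPolicy.enabled", False):
        warnings.append("networkPolicy.enabled should be true with bundled datastores")
    return warnings
```

The replicas check matters because an enabled but unrendered PodDisruptionBudget looks correct in the values file while providing no protection during drains.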

Capacity

Set OBLETH_GLOBAL_MAX_IN_FLIGHT to match the real concurrency of your inference backend (start conservatively at 64 and increase based on observed queue depth).
Set per-model max_in_flight for routes that share a backend; use static mode with a manual cap for cloud APIs.
For self-hosted chat/embedding models, run capacity auto-tune in a quiet window before go-live (avoid probing busy production models — the probe sends real upstream load).
Enable HPA in Helm (hpa.enabled=true); set maxReplicas based on traffic patterns.
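Each obleth pod has its own independent concurrency budget: 3 pods × 64 = 192 total slots, and the total grows with every replica the HPA adds.
Size OBLETH_GLOBAL_MAX_IN_FLIGHT so that the combined budget across all replicas matches the upstream's real concurrency; excess requests then queue briefly in obleth rather than overwhelming the backend.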
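
Because budgets are per pod, the fleet-wide total is the per-pod limit multiplied by the replica count, and with HPA enabled that count can climb to maxReplicas. A minimal sketch of the arithmetic, assuming the goal is to keep the total at maxReplicas within the backend's real concurrency (the helper names are illustrative):

```python
def total_slots(per_pod_limit: int, replicas: int) -> int:
    """Each pod enforces its own budget, so slots add up across replicas."""
    return per_pod_limit * replicas


def per_pod_limit_for(backend_concurrency: int, max_replicas: int) -> int:
    """Per-pod limit that keeps the total at max_replicas within the backend.

    Floored at 1 so every pod can make progress; that floor overcommits
    only when backend_concurrency < max_replicas.
    """
    if max_replicas < 1:
        raise ValueError("max_replicas must be at least 1")
    return max(1, backend_concurrency // max_replicas)
```

For example, per_pod_limit_for(192, 6) gives 32 per pod, so at 3 replicas the fleet uses only 96 slots. That under-use at low scale is the cost of never overcommitting the backend at peak scale.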

Failure Handling

Decide on OBLETH_FAIL_OPEN: true keeps serving requests when Redis fails; false enforces budgets strictly, at the cost of availability during a Redis outage.
Set OBLETH_WAL_PATH to a persistent volume path (not /tmp). The WAL must survive pod restarts.
Verify the WAL volume has adequate disk space for your traffic volume.
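
A preflight sketch for the settings above. It requires OBLETH_FAIL_OPEN to be set explicitly to true or false so the choice is deliberate, rejects WAL paths under /tmp, and compares free space on the WAL volume against a threshold you pick. The accepted spellings of the flag, the threshold, and the function name are assumptions; shutil.disk_usage raises if the path does not exist yet.

```python
import shutil
from pathlib import PurePosixPath
from typing import Callable, List, Mapping

TMP = PurePosixPath("/tmp")


def preflight_failure_handling(
    env: Mapping[str, str],
    min_free_bytes: int,
    disk_usage: Callable = shutil.disk_usage,  # injectable for testing
) -> List[str]:
    """Return a list of problems with the fail-open and WAL settings."""
    problems = []
    if env.get("OBLETH_FAIL_OPEN", "").lower() not in ("true", "false"):
        problems.append("OBLETH_FAIL_OPEN must be set explicitly to true or false")
    wal = env.get("OBLETH_WAL_PATH")
    if not wal:
        problems.append("OBLETH_WAL_PATH is not set")
        return problems
    path = PurePosixPath(wal)
    if path == TMP or TMP in path.parents:
        problems.append(f"OBLETH_WAL_PATH {wal} is under /tmp; use a persistent volume")
    free = disk_usage(wal).free  # bytes available on the volume holding the WAL
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free on the WAL volume, need {min_free_bytes}")
    return problems
```

Run it inside the pod (or an init container) so the disk check sees the mounted volume rather than the node's filesystem.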

Tenants and Models

Create your tenant hierarchy and fairshare groups before sending production traffic.
Register all models (POST /api/v1/models or dashboard import) — a fresh Helm install has none. Use api_base ending in /v1 and the bare upstream_model name.
Mint tenant API keys — there is no shared proxy key; each client needs its own sk_... secret.
Set conservative tokens_per_minute quotas initially; increase based on observed usage.
Verify the audit log is recording changes: GET /api/v1/audit.
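
A sketch of registering a model over the API using only the Python standard library. The base URL, the Bearer authorization scheme, and the name field are assumptions (check the API reference for the real request schema); api_base and upstream_model are the fields named above. The request is built but not sent, so it can be inspected first.

```python
import json
import urllib.request


def build_model_registration(
    base_url: str, admin_token: str, name: str, api_base: str, upstream_model: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/models request."""
    if not api_base.endswith("/v1"):
        raise ValueError(f"api_base should end in /v1, got {api_base!r}")
    body = {
        "name": name,  # hypothetical field: the model name clients will request
        "api_base": api_base,
        "upstream_model": upstream_model,  # bare upstream model name
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/models",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send it with urllib.request.urlopen(req) once the payload looks right; a GET against /api/v1/audit afterwards should show the registration if audit logging is working.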