Data Plane Routes

This page lists all routes handled by the obleth data plane (:8080), their authentication requirements, and passthrough behavior.

The data plane listens on port 8080 and proxies OpenAI-compatible requests to the configured upstream. All routes except /health require a valid API key in the Authorization header.

Route table

Method  Path                  Description
POST    /v1/chat/completions  Chat completions (streaming and non-streaming)
POST    /v1/completions       Legacy text completions
POST    /v1/embeddings        Embeddings
GET     /v1/models            List available models (from obleth's model registry)
GET     /health               Liveness probe (no auth required)

All other paths are forwarded to the upstream as-is (passthrough). This means vendor-specific extensions like /v1/batch or /v1/files work without any obleth configuration.
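
The split between listed routes and passthrough can be illustrated with a minimal sketch. The function names `classify` and `requires_auth` are hypothetical and not part of obleth. Matching is done on the path only, which is an assumption; the source does not say how a listed path with a different HTTP method is treated.

```python
# Hypothetical sketch, not obleth's implementation.
# Paths taken from the route table; anything else is treated as passthrough.
HANDLED_PATHS = frozenset({
    "/v1/chat/completions",
    "/v1/completions",
    "/v1/embeddings",
    "/v1/models",
    "/health",
})


def classify(path: str) -> str:
    """Return 'handled' for paths in the route table, 'passthrough' otherwise."""
    return "handled" if path in HANDLED_PATHS else "passthrough"


def requires_auth(path: str) -> bool:
    """Every request except /health needs an API key."""
    return path != "/health"
```

Under these assumptions, `classify("/v1/files")` returns `"passthrough"`, and `requires_auth("/v1/files")` is still true because only /health is exempt from authentication.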

Authentication

Every request (except /health) must include:

Authorization: Bearer sk_<48 hex chars>

The key is resolved as described in the Authentication reference.
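
As an illustration of the header format only, the following sketch extracts a key of the shape shown above. The helper name `extract_api_key` is hypothetical, and accepting both uppercase and lowercase hex digits is an assumption; the source says only "48 hex chars". Key resolution itself is covered by the Authentication reference and is not modeled here.

```python
import re
from typing import Optional

# Assumption: hex digits may be upper or lower case.
_BEARER_KEY = re.compile(r"Bearer (sk_[0-9a-fA-F]{48})")


def extract_api_key(header_value: str) -> Optional[str]:
    """Return the sk_ key from an Authorization header value, or None if malformed."""
    match = _BEARER_KEY.fullmatch(header_value.strip())
    return match.group(1) if match else None
```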

Request handling

  1. Auth key is resolved → tenant, weight, group, TPM quota looked up
  2. Model name extracted from JSON body → matched against model registry
  3. Token estimate computed from the request body
  4. Response cache checked if enabled for the model
  5. Admission: in-flight check → queue if at capacity → brownout if wait exceeded
  6. Budget checked and reserved atomically via Redis Lua
  7. Request forwarded to upstream with headers:
    • Authorization replaced with model's api_key (if set)
  8. Upstream response streamed back to the client
  9. Actual token counts extracted from the response tail when the upstream reports usage
  10. Budget reconciled (reserve → actual delta refunded)
  11. Usage record written to ClickHouse (or WAL)
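
Steps 6 and 10 (reserve, then reconcile) can be sketched with an in-memory stand-in. This is not the Redis Lua script obleth uses, and it is not atomic across processes; the class `TokenBudget` and its methods are hypothetical names. Charging the difference when actual usage exceeds the estimate is an assumption, since the source only mentions a refund.

```python
class TokenBudget:
    """In-memory illustration of reserve-then-reconcile token accounting."""

    def __init__(self, limit: int) -> None:
        self.limit = limit  # tokens allowed in the current window
        self.used = 0       # tokens reserved or consumed so far

    def reserve(self, estimate: int) -> bool:
        """Check and reserve in one step; refuse if the estimate would exceed the limit."""
        if self.used + estimate > self.limit:
            return False
        self.used += estimate
        return True

    def reconcile(self, estimate: int, actual: int) -> None:
        """Replace the reserved estimate with the actual count.

        A positive (estimate - actual) delta is refunded. A negative delta
        charges the overage, which is an assumption of this sketch.
        """
        self.used -= estimate - actual


budget = TokenBudget(limit=1000)
first_ok = budget.reserve(400)      # admitted: 400 of 1000 reserved
second_ok = budget.reserve(700)     # rejected: 400 + 700 exceeds 1000
budget.reconcile(400, actual=250)   # 150 tokens refunded
```

After the reconcile call, `budget.used` is 250, so a later reservation of 700 tokens would fit.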

Header handling

obleth forwards client headers to the upstream, except for the following hop-by-hop, auth, and encoding headers, which are stripped:

  • host
  • content-length
  • authorization
  • x-api-key
  • accept-encoding
  • connection
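
A minimal sketch of this filtering, treating the listed headers as the set that is not forwarded. The function name `upstream_headers` is hypothetical. Sending the model's api_key with a `Bearer` scheme is an assumption; the source says only that Authorization is replaced with the model's api_key when one is set.

```python
from typing import Dict, Optional

# Headers from the list above; compared case-insensitively.
STRIPPED_HEADERS = frozenset({
    "host",
    "content-length",
    "authorization",
    "x-api-key",
    "accept-encoding",
    "connection",
})


def upstream_headers(
    client_headers: Dict[str, str],
    model_api_key: Optional[str] = None,
) -> Dict[str, str]:
    """Drop the stripped headers, then attach the model's key if one is configured."""
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in STRIPPED_HEADERS
    }
    if model_api_key:
        # Assumption: the upstream expects a Bearer token.
        forwarded["Authorization"] = f"Bearer {model_api_key}"
    return forwarded
```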

Response cache

When cache is enabled for a model, the data plane:

  1. Computes SHA-256(model + request_body) before forwarding
  2. Checks obleth:cache:{hash} in Redis
  3. On hit: returns cached response immediately (no upstream call, no fairshare permit, no budget reserve)
  4. On miss: forwards request, then stores successful non-brownout responses up to 512 KiB with the model's cache_ttl_secs

Cache hits include X-Obleth-Cache: hit. Misses continue through the normal streaming path and are recorded in the usage ledger as cache_status = "miss".
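
The key derivation and the store condition can be sketched as follows. Several details are assumptions: the model name is UTF-8 encoded and concatenated directly with the raw request body, the digest is hex-encoded, and "successful" is read as a 2xx status. The names `cache_key` and `is_cacheable` are hypothetical.

```python
import hashlib

CACHE_MAX_BYTES = 512 * 1024  # 512 KiB limit from the text above


def cache_key(model: str, request_body: bytes) -> str:
    """Build the Redis key from SHA-256(model + request_body)."""
    # Assumption: plain concatenation with no separator, hex digest.
    digest = hashlib.sha256(model.encode("utf-8") + request_body).hexdigest()
    return f"obleth:cache:{digest}"


def is_cacheable(status_code: int, body: bytes, brownout: bool) -> bool:
    """Store only successful, non-brownout responses within the size limit."""
    return (
        200 <= status_code < 300  # assumption: "successful" means 2xx
        and not brownout
        and len(body) <= CACHE_MAX_BYTES
    )
```

Because the key depends on the exact request bytes, two requests that differ only in JSON whitespace or field order would produce different keys under this sketch.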