Quick Start

Get obleth running locally with Docker Compose, mint a tenant API key, and send an OpenAI-compatible chat completion through the gateway.

Time: ~5 minutes
Prerequisites: Docker Desktop (or Docker Engine + Compose v2). The Bash examples below also use curl and jq.


What you are starting

obleth exposes two HTTP surfaces:

| Surface | Purpose | Auth |
| --- | --- | --- |
| Data plane (OpenAI-compatible proxy) | Inference traffic: chat completions, streaming | Tenant API key (sk_...) |
| Management API (/api/v1) | Create tenants, mint keys, adjust weights, read usage | Admin bearer token |

In production, clients hit HAProxy on port 80 (TLS termination + load balancing across obleth pods). HAProxy forwards to obleth's data-plane listener. For local dev you can use HAProxy or talk to obleth directly.

Client  →  HAProxy :80  →  obleth :8080  →  upstream (mock vLLM / Aibrix / vLLM)
Admin   →  obleth Management API :9090

1. Start the stack

From the repo root:

docker compose -f deploy/docker/docker-compose.yml up --build -d

Wait until services are healthy (~30–60s on first build):

docker compose -f deploy/docker/docker-compose.yml ps

Host ports (Docker Compose)

| Service | URL | Notes |
| --- | --- | --- |
| HAProxy (recommended client entry) | http://localhost | Port 80 → obleth data plane |
| obleth data plane (direct) | http://localhost:8088 | Bypasses HAProxy; maps container :8080 |
| Management API | http://localhost:9090 | Admin operations |
| Prometheus metrics | http://localhost:9091/metrics | Scraped by Prometheus |
| Control plane dashboard | http://localhost:3002 | Next.js UI |
| Mock vLLM backend | http://localhost:8081 | Upstream only; don't call directly in normal use |
| Grafana | http://localhost:3001 | Anonymous admin enabled in dev |
| Prometheus UI | http://localhost:9095 | |

Verify the data plane (through HAProxy) and the Management API:

curl -s http://localhost/health
# ok

curl -s http://localhost:9090/api/v1/health
# ok
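If you script the setup, polling both health endpoints is more reliable than a fixed sleep. The following is a minimal sketch, assuming each endpoint answers HTTP 200 once it is ready; the helper names (`http_ok`, `wait_until_healthy`) and the retry defaults are illustrative, not part of obleth.

```python
import time
import urllib.error
import urllib.request

# URLs taken from the two curl checks above.
HEALTH_URLS = [
    "http://localhost/health",              # data plane via HAProxy
    "http://localhost:9090/api/v1/health",  # Management API
]


def http_ok(url, timeout=2.0):
    """Return True if a GET to `url` answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(urls, probe=http_ok, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll all URLs until every probe passes, or give up after `attempts` rounds."""
    for attempt in range(attempts):
        if all(probe(url) for url in urls):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final failed round
    return False


# Usage (requires the stack to be running):
#   wait_until_healthy(HEALTH_URLS)
```

The `probe` and `sleep` parameters are injectable so the loop can be exercised without a running stack.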

2. Create a tenant and API key

The Management API uses a separate admin token (not a tenant key). The dev stack sets:

OBLETH_ADMIN_TOKEN=dev-admin-token

PowerShell (Windows)

$TOKEN = "dev-admin-token"

# Create a tenant with elevated fairshare weight
$tenant = Invoke-RestMethod -Method POST `
  -Uri "http://localhost:9090/api/v1/tenants" `
  -Headers @{ Authorization = "Bearer $TOKEN" } `
  -ContentType "application/json" `
  -Body '{"name":"chatbot","weight":500,"tokens_per_minute":2000000}'

$TID = $tenant.id

# Mint an API key (secret shown once)
$key = Invoke-RestMethod -Method POST `
  -Uri "http://localhost:9090/api/v1/tenants/$TID/keys" `
  -Headers @{ Authorization = "Bearer $TOKEN" } `
  -ContentType "application/json" `
  -Body '{"name":"prod"}'

$SECRET = $key.secret
Write-Host "API key: $SECRET"

Bash / macOS / Linux

TOKEN=dev-admin-token

TID=$(curl -s -X POST http://localhost:9090/api/v1/tenants \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name":"chatbot","weight":500,"tokens_per_minute":2000000}' \
  | jq -r .id)

SECRET=$(curl -s -X POST "http://localhost:9090/api/v1/tenants/$TID/keys" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name":"prod"}' \
  | jq -r .secret)

echo "API key: $SECRET"
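The same two Management API calls can be sketched in Python with only the standard library. This mirrors the shell examples above (same URLs, payloads, and the `id` / `secret` response fields they read); the function names are hypothetical helpers, and error handling is omitted.

```python
import json
import urllib.request

MGMT_BASE = "http://localhost:9090/api/v1"  # Management API host port
ADMIN_TOKEN = "dev-admin-token"             # dev-stack value of OBLETH_ADMIN_TOKEN


def admin_post(path, payload, token=ADMIN_TOKEN, base=MGMT_BASE):
    """Build (but do not send) an authenticated JSON POST to the Management API."""
    return urllib.request.Request(
        base + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


def create_tenant_and_key(name="chatbot", weight=500,
                          tokens_per_minute=2_000_000, key_name="prod"):
    """Create a tenant, mint one key for it, and return (tenant_id, secret)."""
    tenant = send(admin_post("/tenants", {
        "name": name,
        "weight": weight,
        "tokens_per_minute": tokens_per_minute,
    }))
    key = send(admin_post(f"/tenants/{tenant['id']}/keys", {"name": key_name}))
    return tenant["id"], key["secret"]


# Usage (requires the stack to be running):
#   tenant_id, secret = create_tenant_and_key()
```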

The secret looks like sk_<48 hex chars>. It is returned once; only a hash is stored in Postgres/Redis.
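Because the secret cannot be retrieved later, a quick format check before saving it can catch an empty or truncated value (for example, when jq prints `null`). This is a sketch that assumes "48 hex chars" means exactly 48 characters from 0-9 and a-f in either case; the function name is illustrative.

```python
import re

# Assumption: the key is "sk_" followed by exactly 48 hexadecimal characters.
TENANT_KEY_RE = re.compile(r"sk_[0-9a-fA-F]{48}")


def looks_like_tenant_key(value):
    """Return True if `value` matches the documented sk_<48 hex chars> shape."""
    return TENANT_KEY_RE.fullmatch(value) is not None
```

A check like this validates only the shape of the string, not whether the gateway will accept the key.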

OpenAPI spec: http://localhost:9090/api/v1/openapi.json


3. Call the OpenAI-compatible API

obleth proxies OpenAI-style paths to the upstream (mock backend in this stack). Authenticate with your tenant API key, not the admin token.

Supported auth headers (either works):

  • Authorization: Bearer <sk_...>
  • x-api-key: <sk_...>

Non-streaming completion

Via HAProxy (production-like):

curl -s http://localhost/v1/chat/completions \
  -H "Authorization: Bearer $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-model",
    "messages": [{"role": "user", "content": "Hello from obleth"}],
    "max_tokens": 32
  }'

Or direct to obleth (skip HAProxy):

curl -s http://localhost:8088/v1/chat/completions \
  -H "Authorization: Bearer $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-model",
    "messages": [{"role": "user", "content": "Hello from obleth"}],
    "max_tokens": 32
  }'
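For scripts that avoid the OpenAI SDK, the same request can be built with the standard library. The sketch below covers both supported auth header styles and both entry points; `chat_request` and its parameters are hypothetical names, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def chat_request(secret, content, base="http://localhost",
                 auth="bearer", max_tokens=32):
    """Build (but do not send) a POST to /v1/chat/completions.

    base: "http://localhost" goes through HAProxy,
          "http://localhost:8088" talks to obleth directly.
    auth: "bearer" or "x-api-key", the two header styles listed above.
    """
    if auth == "bearer":
        headers = {"Authorization": f"Bearer {secret}"}
    elif auth == "x-api-key":
        headers = {"x-api-key": secret}
    else:
        raise ValueError(f"unknown auth style: {auth!r}")
    headers["Content-Type"] = "application/json"

    body = {
        "model": "mock-model",
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Usage (requires the stack to be running and a real tenant key):
#   with urllib.request.urlopen(chat_request(secret, "Hello from obleth")) as resp:
#       print(json.load(resp))
```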

Streaming (SSE)

curl -N http://localhost/v1/chat/completions \
  -H "Authorization: Bearer $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-model",
    "stream": true,
    "messages": [{"role": "user", "content": "Stream a short reply"}],
    "max_tokens": 16
  }'
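With `"stream": true`, an OpenAI-compatible endpoint sends server-sent events: `data: {json}` lines, ending with `data: [DONE]`. The parser below is a minimal sketch that assumes the upstream follows that convention and places text in `choices[0].delta.content`; the function names are illustrative.

```python
import json


def iter_sse_chunks(lines):
    """Yield decoded JSON payloads from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(data)


def collect_text(chunks):
    """Concatenate choices[0].delta.content across all chunks."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if choices:
            parts.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(parts)


# Usage: an HTTP response object from urllib can be iterated line by line,
# so it can be passed directly, e.g. collect_text(iter_sse_chunks(resp)).
```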

Using the OpenAI Python SDK

Point the client at obleth instead of api.openai.com:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost/v1",          # HAProxy → obleth
    api_key="sk_...",                          # your tenant key
)

response = client.chat.completions.create(
    model="mock-model",
    messages=[{"role": "user", "content": "hi"}],
    max_tokens=32,
)
print(response.choices[0].message.content)

For direct obleth access use base_url="http://localhost:8088/v1".


4. Open the dashboard

The control plane reads the Management API only (no direct DB access):

http://localhost:3002

Use the same admin token (dev-admin-token) if the UI prompts for it, or manage tenants via the API.


5. (Optional) Prove fairshare under load

Seed two tenants with very different weights and run the benchmark harness:

PROXY_BASE=http://localhost node bench/run-benchmark.mjs

Use PROXY_BASE=http://localhost:8088 if bypassing HAProxy. See Benchmark Harness.
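To interpret the benchmark output, it helps to have an expected split in mind. The sketch below assumes proportional weighted sharing, where each tenant's share under contention is its weight divided by the sum of all weights. That is the usual meaning of weighted fair sharing, but this page does not specify obleth's scheduler, so treat the numbers as a rough expectation. The tenant name `batch` and its weight are hypothetical.

```python
def expected_shares(weights):
    """Map tenant name -> expected fraction of capacity under contention.

    Assumes proportional weighted sharing: share_i = weight_i / sum(weights).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: weight / total for name, weight in weights.items()}


# Example: the "chatbot" tenant from step 2 next to a hypothetical low-weight tenant.
shares = expected_shares({"chatbot": 500, "batch": 100})
```

Under this assumption, `chatbot` would receive about five sixths of contended capacity and `batch` about one sixth.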


Endpoint cheat sheet

| What | Method | URL | Auth |
| --- | --- | --- | --- |
| Health (data plane) | GET | http://localhost/health | none |
| Chat completions | POST | http://localhost/v1/chat/completions | tenant key |
| List tenants | GET | http://localhost:9090/api/v1/tenants | admin token |
| Create tenant | POST | http://localhost:9090/api/v1/tenants | admin token |
| Mint key | POST | http://localhost:9090/api/v1/tenants/{id}/keys | admin token |
| Boost weight | PATCH | http://localhost:9090/api/v1/tenants/{id}/weight | admin token |
| Usage aggregates | GET | http://localhost:9090/api/v1/usage | admin token |
| OpenAPI | GET | http://localhost:9090/api/v1/openapi.json | none |
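Request and response schemas are not listed here, so the OpenAPI document is the place to look before calling an endpoint such as the weight PATCH. As a sketch, the helper below lists the operations in any OpenAPI document, relying only on the standard `paths` → method → operation layout. The function name is illustrative, and the paths in the usage comment are placeholders rather than confirmed output.

```python
# HTTP method keys that the OpenAPI specification allows under a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def list_operations(spec):
    """Return (METHOD, path) pairs from an OpenAPI document, sorted by path."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                ops.append((key.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))


# Usage (requires the stack to be running):
#   import json, urllib.request
#   with urllib.request.urlopen("http://localhost:9090/api/v1/openapi.json") as resp:
#       for method, path in list_operations(json.load(resp)):
#           print(method, path)
```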

Common issues

401 missing bearer token / invalid api key
You are using the admin token on the data plane, or the key was never synced. Create a key via the Management API and use the sk_... secret on /v1/chat/completions.

Connection refused on :8080
Compose maps the data plane to host port 8088, not 8080. Use http://localhost (HAProxy) or http://localhost:8088 (direct).

Dashboard on :3000 does not load
The dashboard is on :3002 in this compose file.

Upstream errors
Check mock-backend: curl -s http://localhost:8081/health. Ensure OBLETH_UPSTREAM_BASE_URL points at it inside the compose network.
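The first two issues above can be caught before sending any traffic. The following is a small preflight sketch based only on the port mappings and key prefix described on this page; `preflight` is a hypothetical helper, and it checks configuration values, not live connectivity.

```python
from urllib.parse import urlsplit


def preflight(base_url, api_key):
    """Return a list of likely misconfigurations for a data-plane client."""
    problems = []
    port = urlsplit(base_url).port  # None when the URL has no explicit port

    if port == 8080:
        problems.append(
            "Port 8080 is the container port; use http://localhost (HAProxy) "
            "or http://localhost:8088 (direct)."
        )
    if port == 9090:
        problems.append("Port 9090 is the Management API, not the data plane.")
    if not api_key.startswith("sk_"):
        problems.append(
            "Key does not start with sk_; the admin token does not work on the data plane."
        )
    return problems
```

For example, `preflight("http://localhost:8080/v1", "dev-admin-token")` reports both the wrong port and the wrong credential.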


Stop the stack

docker compose -f deploy/docker/docker-compose.yml down

Add -v to remove Postgres/Redis/ClickHouse volumes.


Next steps