OpenAI-compatible API

This page covers the data-plane routes obleth proxies, the accepted authentication headers, streaming, model routing, and configuration for popular SDKs.

obleth's data plane is a transparent OpenAI-compatible proxy. Any client that speaks the OpenAI HTTP API works with obleth with only a base_url change.

Authentication

All data-plane requests must include a tenant API key. Either header is accepted:

Authorization: Bearer sk_<48 hex chars>
x-api-key: sk_<48 hex chars>

The admin token (OBLETH_ADMIN_TOKEN) is for the Management API only. Never send it to the data plane.
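
A client can reject a malformed key before it sends anything. The following is a minimal sketch, not part of obleth: the helper name `tenant_auth_headers` is hypothetical, and the pattern assumes the `sk_<48 hex chars>` shape shown above with either letter case accepted.

```python
import re

# Assumption: tenant keys are "sk_" followed by exactly 48 hex characters,
# as in the header examples above; accepting either letter case is a guess.
_TENANT_KEY = re.compile(r"sk_[0-9a-fA-F]{48}")


def tenant_auth_headers(api_key: str, use_x_api_key: bool = False) -> dict[str, str]:
    """Return one of the two accepted auth headers for a data-plane request."""
    if not _TENANT_KEY.fullmatch(api_key):
        raise ValueError("expected a tenant key of the form sk_<48 hex chars>")
    if use_x_api_key:
        return {"x-api-key": api_key}
    return {"Authorization": f"Bearer {api_key}"}
```

A value that does not match the tenant key shape raises `ValueError` instead of being sent to the data plane.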

Supported routes

obleth proxies all standard OpenAI inference routes to the configured upstream:

Route	Method	Notes
`/v1/chat/completions`	POST	Streaming (`stream: true`) and non-streaming
`/v1/completions`	POST	Legacy completions
`/v1/embeddings`	POST	Embedding routes (`model_type: embedding`)
`/v1/audio/transcriptions`	POST	Speech-to-text, multipart upload (`model_type: audio_transcription`)
`/v1/audio/translations`	POST	Speech translation, multipart upload (`model_type: audio_transcription`)
`/v1/audio/speech`	POST	Text-to-speech (`model_type: audio_speech`)
`/v1/images/generations`	POST	Image generation, plus edits/variations (`model_type: image`)
`/v1/models`	GET	Proxied to upstream
`/health`	GET	Returns `ok` (no auth required)
`/mcp/{server}`	ANY	MCP gateway (see MCP Gateway)

Any other path is forwarded to the upstream unchanged: obleth acts as a fall-through proxy for paths it doesn't handle specially.

obleth serves more than chat. Embeddings, speech-to-text, text-to-speech, and image generation are all OpenAI-compatible and billed through the same ledger. Each non-chat route declares a model_type. See Multi-modal Models for the full guide.
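
As an illustration of a non-chat route, the sketch below builds a `/v1/embeddings` request with the Python standard library only. It assumes the OpenAI embeddings request shape (`model` and `input` fields). The route name `my-embedding-model` and the helper `build_embeddings_request` are hypothetical; substitute an embedding route registered in your own deployment.

```python
import json
import urllib.request


def build_embeddings_request(
    base_url: str, api_key: str, model: str, text: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to {base_url}/embeddings."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values: use your own tenant key and a registered embedding route.
req = build_embeddings_request(
    "http://localhost/v1", "sk_...", "my-embedding-model", "What is obleth?"
)
# To send it: urllib.request.urlopen(req)
```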

Non-streaming request

curl -s http://localhost/v1/chat/completions \
  -H "Authorization: Bearer $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is obleth?"}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'

Streaming (SSE)

curl -N http://localhost/v1/chat/completions \
  -H "Authorization: Bearer $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-70b",
    "stream": true,
    "messages": [{"role": "user", "content": "Count to 5"}],
    "max_tokens": 32
  }'

obleth streams the SSE response byte-for-byte from the upstream. The fairshare permit is held until the stream closes.
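
Because the stream is passed through unmodified, a client parses it exactly as it would parse the upstream's own stream. The sketch below assumes the upstream emits the OpenAI chat-completions streaming format (`data: <json>` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`). The function name `iter_stream_text` and the sample lines are illustrative only.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample in the assumed format, not captured obleth output.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "1, 2"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": ", 3"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

In a real client, `lines` would be the decoded lines of the HTTP response body, read incrementally so that each fragment is handled as it arrives.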

Models

obleth routes by the model field in the request body. For paths that require model resolution (/v1/chat/completions, /v1/completions), the model must be registered in obleth's model registry.

curl -X POST http://localhost:9180/api/v1/models \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "llama-3-70b",
    "description": "Meta Llama 3 70B instruct route through Aibrix",
    "upstream_model": "meta-llama/Llama-3-70b-instruct",
    "api_base": "http://my-aibrix:8080",
    "input_cost_per_token": 0.0000005,
    "output_cost_per_token": 0.0000015,
    "context_window": 131072,
    "enabled": true
  }'

For the bundled benchmark fixture backend, register a model route named benchmark-endpoint with api_base: "http://benchmark-backend:8081", or let bench/run-benchmark.mjs create/update that route for you.

api_base convention: set api_base to the provider base URL ending in /v1 (for example https://provider.example/v1), not a full endpoint URL. obleth preserves the client's request path and appends it to api_base, so a full endpoint like .../v1/embeddings would double the path. This applies to every model type.

Model admission weight

admission_weight (default 1) multiplies the tenant's weight for the fairshare score on this model. Set it higher for expensive models:

# 70B model costs 4x the admission weight of a 7B model
curl -X PUT http://localhost:9180/api/v1/models/$MODEL_ID \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{...existing fields..., "admission_weight": 4}'

SDK configuration

Python (openai package)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost/v1",   # or http://localhost:8088/v1 direct
    api_key="sk_...",
)

response = client.chat.completions.create(
    model="llama-3-70b",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=64,
)
print(response.choices[0].message.content)

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost/v1",
    api_key="sk_...",
    model="llama-3-70b",
)

TypeScript / Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost/v1',
  apiKey: 'sk_...',
});

const response = await client.chat.completions.create({
  model: 'llama-3-70b',
  messages: [{ role: 'user', content: 'Hello' }],
  max_tokens: 64,
});

Request limits

Limit	Value
Request body size	64 MiB
MCP body size	16 MiB
Response cache max body	512 KiB (larger responses stream through uncached)
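
A client can check payload size locally before sending. This is a rough sketch under two assumptions that the table does not confirm: that the limit applies to the encoded body bytes, and that a body exactly at the limit is accepted. The name `fits_body_limit` is hypothetical.

```python
import json

MIB = 1024 * 1024
MAX_REQUEST_BODY = 64 * MIB  # request body limit from the table above
MAX_MCP_BODY = 16 * MIB      # MCP body limit from the table above


def fits_body_limit(payload: dict, mcp: bool = False) -> bool:
    """Return True if the JSON-encoded payload is within the documented limit."""
    size = len(json.dumps(payload).encode("utf-8"))
    # Assumption: the comparison is inclusive and counts encoded bytes.
    return size <= (MAX_MCP_BODY if mcp else MAX_REQUEST_BODY)
```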

Session tracking (request logs)

To group related requests in the dashboard Request Logs view, send a session identifier on each call. obleth records it on the ClickHouse usage row and does not forward it to the upstream unless you include it in the body yourself.

Accepted sources (first match wins):

Source	Example
`x-obleth-session-id` header	`x-obleth-session-id: conv-abc123`
`x-session-id` header	`x-session-id: conv-abc123`
`session_id` in JSON body	`{"model": "...", "session_id": "conv-abc123", ...}`
`metadata.session_id` in JSON body	`{"metadata": {"session_id": "conv-abc123"}, ...}`
`user` in JSON body	`{"user": "user-42", ...}` (OpenAI convention)

Values are trimmed and capped at 200 characters. See Control Plane — Request Logs.
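
The precedence in the table can be sketched as follows. This illustrates the documented order and is not obleth's implementation: the function name `resolve_session_id` is hypothetical, and two details are assumptions (blank values fall through to the next source, and trimming happens before the 200-character cap).

```python
from collections.abc import Mapping
from typing import Any, Optional

MAX_SESSION_ID_LEN = 200


def resolve_session_id(
    headers: Mapping[str, str], body: Mapping[str, Any]
) -> Optional[str]:
    """Return the first usable session identifier, in documented order."""
    lowered = {name.lower(): value for name, value in headers.items()}
    metadata = body.get("metadata")
    candidates = [
        lowered.get("x-obleth-session-id"),
        lowered.get("x-session-id"),
        body.get("session_id"),
        metadata.get("session_id") if isinstance(metadata, Mapping) else None,
        body.get("user"),
    ]
    for value in candidates:
        if isinstance(value, str):
            trimmed = value.strip()
            if trimmed:  # assumption: blank values fall through
                return trimmed[:MAX_SESSION_ID_LEN]
    return None
```

For example, a request carrying both an `x-session-id` header and a `user` field in the body resolves to the header value, since headers come first in the table.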

Headers forwarded to upstream

obleth forwards all non-hop-by-hop headers from the client to the upstream, minus the client's Authorization header. If the model has an api_key configured, obleth injects that as the upstream's Authorization: Bearer <model_api_key> instead.
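
The forwarding rule can be sketched as a small filter. The hop-by-hop set below is the commonly cited list from the HTTP/1.1 specification; the text above does not state obleth's exact list, so treat it as an assumption. The function name `upstream_headers` is hypothetical.

```python
from typing import Optional

# Assumption: the conventional HTTP/1.1 hop-by-hop headers.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def upstream_headers(
    client_headers: dict[str, str], model_api_key: Optional[str] = None
) -> dict[str, str]:
    """Drop hop-by-hop headers and the client's Authorization header,
    then inject the model's key as the upstream Authorization if one is set."""
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in HOP_BY_HOP and name.lower() != "authorization"
    }
    if model_api_key:
        forwarded["Authorization"] = f"Bearer {model_api_key}"
    return forwarded
```

Note that the sketch removes only `Authorization`, as the text describes; how an `x-api-key` header is treated is not specified here, so the sketch leaves it untouched.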
