OpenAI-compatible API

Covers the data-plane routes obleth proxies, authentication headers, streaming, model routing, and how to configure popular SDKs.

obleth's data plane is a transparent OpenAI-compatible proxy. Any client that speaks the OpenAI HTTP API works with obleth with only a base_url change.

Authentication

All data-plane requests must include a tenant API key. Either header is accepted:

Authorization: Bearer sk_<48 hex chars>
x-api-key: sk_<48 hex chars>

The admin token (OBLETH_ADMIN_TOKEN) is for the Management API only. Never send it to the data plane.
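A minimal Python sketch of the two accepted header forms, assuming the key format shown above (sk_ followed by 48 hex characters). The helper names are hypothetical, and the format check is only a client-side sanity check; obleth alone decides whether a key is valid.

```python
import re

# Documented tenant key format: "sk_" followed by 48 hex characters.
# Accepting both hex cases is an assumption; tighten it if your keys are lowercase only.
TENANT_KEY_RE = re.compile(r"sk_[0-9a-fA-F]{48}")


def looks_like_tenant_key(key: str) -> bool:
    """Format check only; it cannot tell whether obleth will accept the key."""
    return TENANT_KEY_RE.fullmatch(key) is not None


def tenant_auth_headers(key: str, use_x_api_key: bool = False) -> dict[str, str]:
    """Build one of the two headers obleth accepts on data-plane requests."""
    if not looks_like_tenant_key(key):
        # Fail early on obviously malformed keys instead of waiting for a 401.
        raise ValueError("expected a tenant API key of the form sk_<48 hex chars>")
    if use_x_api_key:
        return {"x-api-key": key}
    return {"Authorization": f"Bearer {key}"}
```

Either dictionary can be merged into the headers of any HTTP client; the Bearer form is what the OpenAI SDKs send when given api_key.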

Supported routes

obleth proxies all standard OpenAI inference routes to the configured upstream:

| Route | Method | Notes |
|---|---|---|
| /v1/chat/completions | POST | Streaming (stream: true) and non-streaming |
| /v1/completions | POST | Legacy completions |
| /v1/embeddings | POST | Proxied, not fairshare-throttled separately |
| /v1/models | GET | Proxied to upstream |
| /health | GET | Returns ok (no auth required) |
| /mcp/{server} | ANY | MCP gateway (see MCP Gateway) |

Any other path is forwarded to the upstream unchanged — obleth is a fall-through proxy for paths it doesn't handle specially.
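Because /health needs no API key, it works as a simple readiness probe. A minimal stdlib-only Python sketch, assuming a healthy instance answers 200 with the plain-text body ok (trailing whitespace tolerated); the function name and default base URL are illustrative.

```python
import urllib.request


def obleth_healthy(base_url: str = "http://localhost", timeout: float = 2.0) -> bool:
    """Probe GET {base_url}/health; no API key is sent because the route needs none."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace").strip()
            # Assumption: a healthy instance answers 200 with the plain-text body "ok".
            return resp.status == 200 and body == "ok"
    except OSError:
        # URLError, HTTPError (non-2xx replies) and socket timeouts are all OSError subclasses.
        return False
```

A deploy script could loop on obleth_healthy() before sending traffic; returning False on any error keeps the caller's retry logic simple.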

Non-streaming request

curl -s http://localhost/v1/chat/completions \
  -H "Authorization: Bearer $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is obleth?"}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
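The same request from Python's standard library, for environments without the openai package. A minimal sketch: build_chat_request and first_message_content are hypothetical helpers, the first only builds the request object and the second assumes an OpenAI-style response body. The base URL and model name mirror the curl example.

```python
import json
import urllib.request
from typing import Any


def build_chat_request(
    base_url: str,
    api_key: str,
    model: str,
    messages: list[dict[str, str]],
    **params: Any,
) -> urllib.request.Request:
    """Build a non-streaming POST /v1/chat/completions request (not sent yet)."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # tenant key, never the admin token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_message_content(response_json: dict[str, Any]) -> str:
    """Extract the assistant text from an OpenAI-style chat completion response."""
    return response_json["choices"][0]["message"]["content"]
```

To send it, pass the request to urllib.request.urlopen and decode the body with json.load, then call first_message_content on the result. Extra keyword arguments such as max_tokens=256 or temperature=0.7 go straight into the JSON body.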

Streaming (SSE)

curl -N http://localhost/v1/chat/completions \
  -H "Authorization: Bearer $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-70b",
    "stream": true,
    "messages": [{"role": "user", "content": "Count to 5"}],
    "max_tokens": 32
  }'

obleth streams the SSE response byte-for-byte from the upstream. The fairshare permit is held until the stream closes.
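Since obleth relays the upstream's SSE bytes unchanged, clients parse the stream exactly as they would against OpenAI. A minimal sketch, assuming the upstream emits OpenAI-style chunks (data: lines carrying chat.completion.chunk JSON, terminated by data: [DONE]) and a single choice per request; the function name is illustrative.

```python
import json
from collections.abc import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style SSE lines until data: [DONE]."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        # Blank lines separate events; lines starting with ":" are SSE comments (keep-alives).
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a trailing usage-only chunk
        content = choices[0].get("delta", {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)
```

With urllib, iterating the response object yields raw lines, so collect_stream_text(line.decode("utf-8") for line in resp) reassembles the reply.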

Models

obleth routes by the model field in the request body. For paths that require model resolution (/v1/chat/completions, /v1/completions), the model must be registered in obleth's model registry.

Register a model:

curl -X POST http://localhost:9090/api/v1/models \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "llama-3-70b",
    "upstream_model": "meta-llama/Llama-3-70b-instruct",
    "api_base": "http://my-aibrix:8080",
    "input_cost_per_token": 0.0000005,
    "output_cost_per_token": 0.0000015,
    "context_window": 131072,
    "enabled": true
  }'
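The registration stores per-token prices. Assuming the usual interpretation (cost = prompt tokens × input_cost_per_token + completion tokens × output_cost_per_token, in whatever currency the prices are quoted), a request's cost can be estimated as below. This sketches the arithmetic only; it is not obleth's accounting code, and the function name is hypothetical.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_cost_per_token: float,
    output_cost_per_token: float,
) -> float:
    """Estimate request cost from token counts and per-token prices."""
    return prompt_tokens * input_cost_per_token + completion_tokens * output_cost_per_token


# Prices from the llama-3-70b registration above.
cost = estimate_cost(1_000, 200, 0.0000005, 0.0000015)
# 1,000 × 0.0000005 = 0.0005 and 200 × 0.0000015 = 0.0003, so about 0.0008 in total.
print(f"{cost:.6f}")
```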

For the mock backend dev stack, mock-model is automatically available without registration.

Model admission weight

admission_weight (default 1) multiplies the tenant's weight for the fairshare score on this model. Set it higher for expensive models:

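# Give the 70B model 4x the admission weight of a 7B model left at the default (1).
# Replace ...existing fields... with the model's current registration fields.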
curl -X PUT http://localhost:9090/api/v1/models/$MODEL_ID \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{...existing fields..., "admission_weight": 4}'
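Because the example PUT body repeats the model's existing fields, a read-modify-write pattern avoids dropping any of them. A minimal sketch: with_admission_weight is a hypothetical helper that copies a model record you already hold (how you fetch it is outside this sketch) and sets the new weight; effective_weight restates the documented rule that admission_weight multiplies the tenant's weight. How obleth turns that product into a fairshare score is not modeled.

```python
import copy
from typing import Any


def with_admission_weight(model: dict[str, Any], weight: float) -> dict[str, Any]:
    """Return a copy of a model record with admission_weight replaced, ready for the PUT body."""
    # Assumption: only positive weights make sense; obleth's own validation may differ.
    if weight <= 0:
        raise ValueError("admission_weight must be positive")
    updated = copy.deepcopy(model)  # leave the caller's record untouched
    updated["admission_weight"] = weight
    return updated


def effective_weight(tenant_weight: float, admission_weight: float = 1) -> float:
    """Documented rule: the model's admission_weight multiplies the tenant's weight."""
    return tenant_weight * admission_weight
```

For example, a tenant with weight 2 has an effective weight of 2 on a model left at the default and 8 on the 70B model above.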

SDK configuration

Python (openai package)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost/v1",   # or http://localhost:8088/v1 to reach obleth directly
    api_key="sk_...",
)

response = client.chat.completions.create(
    model="llama-3-70b",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
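Streaming through the SDK works the same way, since obleth relays the upstream's SSE unchanged. With the openai package, stream=True makes chat.completions.create return an iterator of chunks whose choices[0].delta.content carries the text. A minimal sketch; stream_reply is a hypothetical helper that takes the client as a parameter so it stays free of connection setup.

```python
from typing import Any


def stream_reply(client: Any, model: str, prompt: str, max_tokens: int = 64) -> str:
    """Stream a chat completion, echo tokens as they arrive, and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some servers send chunks with no choices (e.g. a final usage chunk); skip them.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)
```

With the client configured above, stream_reply(client, "llama-3-70b", "Count to 5") prints the reply incrementally. obleth holds the fairshare permit until the loop has consumed the whole stream.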

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost/v1",
    api_key="sk_...",
    model="llama-3-70b",
)

print(llm.invoke("Hello").content)

TypeScript / Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost/v1',
  apiKey: 'sk_...',
});

const response = await client.chat.completions.create({
  model: 'llama-3-70b',
  messages: [{ role: 'user', content: 'Hello' }],
  max_tokens: 64,
});
console.log(response.choices[0].message.content);

Request limits

| Limit | Value |
|---|---|
| Request body size | 64 MiB |
| MCP body size | 16 MiB |
| Response cache max body | 512 KiB (larger responses stream through uncached) |
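Oversized bodies are cheaper to catch before they leave the client. A minimal sketch of a pre-flight check against these limits, assuming MiB and KiB are binary units (1 MiB = 1,048,576 bytes), that the limits apply to the serialized request body in bytes, that a body exactly at the limit is accepted, and that MCP traffic is anything under /mcp/. The helper names are hypothetical; obleth remains the authority on what it rejects.

```python
import json
from typing import Any

KIB = 1024
MIB = 1024 * KIB

REQUEST_BODY_LIMIT = 64 * MIB   # data-plane request bodies
MCP_BODY_LIMIT = 16 * MIB       # bodies sent to /mcp/{server}
CACHE_MAX_BODY = 512 * KIB      # larger responses stream through uncached


def body_limit_for(path: str) -> int:
    """Pick the documented body limit for a request path."""
    return MCP_BODY_LIMIT if path.startswith("/mcp/") else REQUEST_BODY_LIMIT


def encode_checked(path: str, payload: Any) -> bytes:
    """Serialize a JSON payload and raise if it exceeds the limit for its path."""
    body = json.dumps(payload).encode("utf-8")
    limit = body_limit_for(path)
    if len(body) > limit:  # assumption: a body exactly at the limit is accepted
        raise ValueError(f"{len(body)} bytes exceeds the {limit}-byte limit for {path}")
    return body


def response_cacheable(size_bytes: int) -> bool:
    """True if a response of this size fits under the response cache cap."""
    return size_bytes <= CACHE_MAX_BODY
```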

Headers forwarded to upstream

obleth forwards all non-hop-by-hop headers from the client to the upstream except the client's Authorization header. If the model has an api_key configured, obleth instead sends Authorization: Bearer <model_api_key> to the upstream.
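The forwarding rule can be sketched as a pure function over header dictionaries. This illustrates the documented behavior and is not obleth's source: hop-by-hop headers are taken to be the set common reverse proxies strip, plus any header named in Connection (RFC 7230 §6.1). Whether obleth also strips a client x-api-key header is not documented, so the sketch leaves it alone; Host rewriting is out of scope. The function name is hypothetical.

```python
from typing import Optional

# Hop-by-hop headers commonly stripped by reverse proxies, compared in lowercase.
HOP_BY_HOP = {
    "connection", "proxy-connection", "keep-alive", "proxy-authenticate",
    "proxy-authorization", "te", "trailer", "transfer-encoding", "upgrade",
}


def upstream_headers(client_headers: dict[str, str], model_api_key: Optional[str]) -> dict[str, str]:
    """Sketch of the documented header forwarding rule for one request."""
    # Any header named in the client's Connection header is hop-by-hop as well.
    connection_tokens = {
        token.strip().lower()
        for name, value in client_headers.items()
        if name.lower() == "connection"
        for token in value.split(",")
        if token.strip()
    }
    dropped = HOP_BY_HOP | connection_tokens | {"authorization"}
    out = {name: value for name, value in client_headers.items() if name.lower() not in dropped}
    if model_api_key:
        # Only the model's configured key reaches the upstream as Authorization.
        out["Authorization"] = f"Bearer {model_api_key}"
    return out
```

Modeling the rule as a pure function makes the tenant-key stripping easy to unit-test without running a proxy.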