Aibrix Integration

How to connect obleth to an Aibrix inference gateway, configure per-model routing, and set up admission weights.

Aibrix is the recommended downstream inference router for obleth. It handles replica selection, KV-cache affinity, and prefix-cache routing — the layer obleth deliberately doesn't duplicate.

Basic connection

Point OBLETH_UPSTREAM_BASE_URL at your Aibrix gateway:

OBLETH_UPSTREAM_BASE_URL=http://aibrix-gateway.aibrix.svc.cluster.local:8080

With this set, all requests that don't match a per-model api_base override are forwarded to Aibrix. obleth preserves the model field, and Aibrix routes it to the appropriate vLLM replica.

Per-model routing

Register each model in obleth's model registry with its api_base:

curl -X POST http://localhost:9090/api/v1/models \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "llama-3-70b",
    "upstream_model": "meta-llama/Llama-3-70b-instruct",
    "api_base": "http://aibrix-gateway:8080",
    "input_cost_per_token": 0.0000005,
    "output_cost_per_token": 0.0000015,
    "context_window": 131072,
    "admission_weight": 4,
    "supports_function_calling": true,
    "supports_system_messages": true,
    "enabled": true
  }'

upstream_model is the model identifier sent to Aibrix (and then to vLLM). This lets you rename models for clients without changing your backend configuration.

Admission weights for model cost

Not all models cost the same. A 70B model consumes roughly 4–10× more GPU time than a 7B model. The admission_weight multiplier adjusts the effective fairshare cost:

effective_weight = tenant.weight × model.admission_weight

A tenant with weight=100 using a model with admission_weight=4 competes as if they had weight=400.

Modeladmission_weight
llama-3-8b1
llama-3-70b4
mixtral-8x7b2

Model API key injection

If your Aibrix gateway requires its own API key, set it in the model record. obleth strips the client's Authorization and injects the model's key upstream:

curl -X PUT http://localhost:9090/api/v1/models/$MODEL_ID \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{...existing fields..., "api_key": "aibrix-internal-key-xyz"}'

Without Aibrix

obleth works with any OpenAI-compatible upstream. Point OBLETH_UPSTREAM_BASE_URL at a raw vLLM service, LiteLLM, or the bundled mock backend. Aibrix is recommended for multi-replica GPU clusters, not a hard dependency.