Aibrix Integration

How to connect obleth to an Aibrix inference gateway, configure per-model routing, and set up admission weights.

Aibrix is the recommended downstream inference router for obleth. It handles replica selection, KV-cache affinity, and prefix-cache routing, a layer that obleth deliberately does not duplicate.

Basic connection

Point OBLETH_UPSTREAM_BASE_URL at your Aibrix gateway (base URL ending in /v1):

```shell
OBLETH_UPSTREAM_BASE_URL=http://aibrix-gateway.aibrix.svc.cluster.local:8080/v1
```

With this set, all requests that don't match a per-model api_base override are forwarded to Aibrix. obleth preserves the model field, and Aibrix routes it to the appropriate vLLM replica.
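The fallback rule can be sketched as a small POSIX shell function. It is an illustration only, not obleth's code: the function name `resolve_upstream`, the model `my-special-model`, and its backend URL are hypothetical. In obleth the override would come from a model record's api_base, not from a hard-coded `case` statement.

```shell
# Hypothetical sketch of upstream selection: a per-model api_base override
# wins; every other model falls back to OBLETH_UPSTREAM_BASE_URL.
OBLETH_UPSTREAM_BASE_URL="${OBLETH_UPSTREAM_BASE_URL:-http://aibrix-gateway.aibrix.svc.cluster.local:8080/v1}"

resolve_upstream() {
  model="$1"
  case "$model" in
    # Hypothetical override (in obleth: the model record's api_base).
    my-special-model) echo "http://special-backend.example.internal:8000/v1" ;;
    # No override: forward to the global upstream, i.e. the Aibrix gateway.
    *) echo "$OBLETH_UPSTREAM_BASE_URL" ;;
  esac
}

resolve_upstream llama-3-8b
```

Any model without an override, such as `llama-3-8b` here, resolves to the Aibrix gateway.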

Per-model routing

To give a model its own upstream settings, register it with obleth's models API:

```shell
curl -X POST http://localhost:9180/api/v1/models \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "llama-3-70b",
    "description": "Production 70B chat model routed through Aibrix",
    "upstream_model": "meta-llama/Llama-3-70b-instruct",
    "api_base": "http://aibrix-gateway.aibrix.svc.cluster.local:8080/v1",
    "input_cost_per_token": 0.0000005,
    "output_cost_per_token": 0.0000015,
    "context_window": 131072,
    "admission_weight": 4,
    "supports_function_calling": true,
    "supports_system_messages": true,
    "enabled": true
  }'
```

upstream_model is the bare model identifier sent to Aibrix (and from there to vLLM), not a URL. api_base must end in /v1: it is the provider base URL, not a full endpoint path. Because clients request model_name while obleth forwards upstream_model, you can rename models for clients without changing your backend configuration.
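Both field rules are easy to check before calling the API. The sketch below is a client-side pre-flight check under stated assumptions: `check_model_fields` is a hypothetical helper, and obleth's own server-side validation may be stricter or looser (for example, about a trailing slash after /v1).

```shell
# Hypothetical pre-flight check for the two documented field rules:
#   - api_base must end in /v1 (provider base, not a full endpoint path)
#   - upstream_model is a bare model identifier, not a URL
check_model_fields() {
  api_base="$1"
  upstream_model="$2"
  case "$api_base" in
    */v1) ;;
    *) echo "api_base must end in /v1: $api_base" >&2; return 1 ;;
  esac
  case "$upstream_model" in
    *://*) echo "upstream_model must not be a URL: $upstream_model" >&2; return 1 ;;
  esac
  return 0
}

check_model_fields \
  "http://aibrix-gateway.aibrix.svc.cluster.local:8080/v1" \
  "meta-llama/Llama-3-70b-instruct" && echo ok
```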

Admission weights for model cost

Not all models cost the same. A 70B model consumes roughly 4–10× more GPU time than a 7B model. The admission_weight multiplier adjusts the effective fairshare cost:

effective_weight = tenant.weight × model.admission_weight

A tenant with weight=100 using a model with admission_weight=4 competes as if they had weight=400.

| Model | admission_weight |
|---|---|
| `llama-3-8b` | 1 |
| `llama-3-70b` | 4 |
| `mixtral-8x7b` | 2 |
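A minimal sketch of the formula, applied to the weights in the table for a tenant with weight=100. The helper name `effective_weight` is hypothetical, and it assumes integer weights as in the examples on this page; it shows only the multiplication, not how obleth's fairshare scheduler consumes the result.

```shell
# effective_weight = tenant.weight × model.admission_weight
# (integer arithmetic; hypothetical helper name)
effective_weight() {
  tenant_weight="$1"
  admission_weight="$2"
  echo $((tenant_weight * admission_weight))
}

# Tenant with weight=100 on each model from the table (model:admission_weight).
for entry in llama-3-8b:1 llama-3-70b:4 mixtral-8x7b:2; do
  model="${entry%%:*}"
  weight="${entry#*:}"
  echo "$model $(effective_weight 100 "$weight")"
done
```

For this tenant the loop prints 100 for `llama-3-8b`, 400 for `llama-3-70b`, and 200 for `mixtral-8x7b`.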

Model API key injection

If your Aibrix gateway requires its own API key, set it in the model record's api_key field. obleth strips the client's Authorization header and injects the model's key into the upstream request:

```shell
curl -X PUT http://localhost:9180/api/v1/models/$MODEL_ID \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{...existing fields..., "api_key": "aibrix-internal-key-xyz"}'
```
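The header swap can be modeled as a small filter over raw header lines. This is a hypothetical sketch, not obleth's implementation: the function name is invented, and the Bearer scheme for the injected key is an assumption, since this page does not say how the key is presented to Aibrix.

```shell
# Hypothetical model of the auth rewrite. Reads one header per line on stdin,
# drops any Authorization header (header names are case-insensitive), then
# appends the model's key. The "Bearer" scheme is an assumption.
rewrite_auth_headers() {
  model_key="$1"
  awk 'tolower($0) !~ /^authorization:/'
  printf 'Authorization: Bearer %s\n' "$model_key"
}

printf 'Authorization: Bearer client-token\nContent-Type: application/json\n' |
  rewrite_auth_headers aibrix-internal-key-xyz
```

The client's token never reaches the upstream; only `Content-Type` survives, followed by the model's key.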

Without Aibrix

obleth works with any OpenAI-compatible upstream. Point OBLETH_UPSTREAM_BASE_URL at a raw vLLM service, a LiteLLM proxy, or the bundled benchmark fixture backend. Aibrix is recommended for multi-replica GPU clusters but is not a hard dependency.
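For example, the upstream URL for the first two alternatives might look like this. The service hostnames are hypothetical placeholders; the ports are the upstream projects' defaults (8000 for vLLM's OpenAI-compatible server, 4000 for the LiteLLM proxy), so adjust them if your deployment overrides them.

```shell
# Pick one upstream. Hostnames are hypothetical placeholders.

# Raw vLLM OpenAI-compatible server (default port 8000)
OBLETH_UPSTREAM_BASE_URL=http://vllm.default.svc.cluster.local:8000/v1

# LiteLLM proxy (default port 4000)
# OBLETH_UPSTREAM_BASE_URL=http://litellm.default.svc.cluster.local:4000/v1
```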
