
Model Boons

Gateway-granted capabilities for models that lack them natively — vision (image-to-text relay), tools (function-calling emulation), and structured output (JSON-schema enforcement) — so a basic text model can still serve advanced requests.

A boon is a capability obleth grants a model at the gateway, on top of what the model can do on its own. Instead of every model needing native support for every modality — or every caller wiring up extra plumbing — obleth detects when a request needs a capability the target model lacks, fulfils it at the gateway, and rewrites the request (and, where needed, the response) so the original model can answer.

There are three boons today, all built on the same engine:

| Boon | boons value | Grants | Rewrites |
|---|---|---|---|
| Vision | vision | Image input on a text-only model, by relaying images to a describer model | Request |
| Tools | tools | OpenAI function calling on a model without native support | Request + response |
| Structured output | structured_output | response_format JSON-schema adherence | Request + response |

How boons work

A few rules apply to every boon:

  • Opt-in, twice. A boon never fires unless it is enabled globally (Settings → Model boons, stored in app_settings and hot-reloadable) and the target model has the boon in its per-model boons list. Nothing is granted by default.
  • Only when the model lacks it natively. Each boon checks the matching capability flag and steps aside if the model already has it — vision skips models flagged supports_vision, tools skips supports_function_calling, structured output skips supports_response_schema. Native capability always wins; the boon is a fallback.
  • Fail-open, always. Any error — no helper model configured, an upstream failure, a timeout, an unparseable reply — leaves the request or response unchanged. A flaky helper must never block traffic the target model might still handle on its own.
  • Applied in order. For a single chat request obleth applies vision, then tools, then structured output. The tools and structured-output boons also rewrite the response: when either is active, obleth forces a non-streaming upstream call, buffers the completion, transforms it, and — for clients that asked for stream: true — re-emits the result as synthesized SSE. Streaming requests are therefore buffered while these boons are active; consider raising your client's request timeout.
  • Observable per request. Responses carry an x-obleth-boons header listing the boons that acted on the request. A non-fatal issue (for example, structured-output validation that could not be repaired) is reported in x-obleth-boons-warning while the original completion still passes through.
  • Escape hatch. Send the request header x-obleth-boons: off to bypass all boon processing for that single request.

Boons that call a helper model (the vision describer, the structured-output fixer) meter that call against the calling tenant as its own usage record, so the extra cost is attributed and visible in the request log.
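The gating rules above can be sketched as a small eligibility check. This is an illustration only, not obleth's implementation: the `Model` class, the `active_boons` function, and the idea of passing the globally enabled boons as a set are all assumptions made for the example, and header names are assumed to be lowercased already. It covers the two opt-ins, the native-capability check, the escape hatch, and the fixed order; it does not inspect the request body (for example, whether an image is actually present).

```python
from dataclasses import dataclass, field

# Capability flag that makes each boon unnecessary (flag names from the text above).
NATIVE_FLAG = {
    "vision": "supports_vision",
    "tools": "supports_function_calling",
    "structured_output": "supports_response_schema",
}

# Fixed application order described above.
BOON_ORDER = ("vision", "tools", "structured_output")


@dataclass
class Model:
    """Hypothetical stand-in for a registered model record."""
    boons: list = field(default_factory=list)
    supports_vision: bool = False
    supports_function_calling: bool = False
    supports_response_schema: bool = False


def active_boons(model, globally_enabled, request_headers):
    """Return the boons that may act on a request, in application order."""
    # Escape hatch: the caller opted out for this single request.
    if request_headers.get("x-obleth-boons", "").strip().lower() == "off":
        return []
    result = []
    for boon in BOON_ORDER:
        if boon not in globally_enabled:       # first opt-in: global switch
            continue
        if boon not in model.boons:            # second opt-in: per-model list
            continue
        if getattr(model, NATIVE_FLAG[boon]):  # native capability always wins
            continue
        result.append(boon)
    return result
```

For instance, a model that lists both `tools` and `vision` but already has native function calling would only be eligible for the vision boon.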

The vision boon

When a text-only model receives a chat request that contains an image, the vision boon:

  1. Detects the image_url content part(s) in the request.
  2. Relays each image to a configured describer model — a vision-capable model such as glm-4-5v.
  3. Replaces the image part inline with the returned text, as [Image description: …].
  4. Forwards the rewritten, now text-only request to the originally requested model.

The target model never sees the image bytes; it sees a faithful text description in their place and answers as if it could see.

client ──▶ obleth                                  (chat request with image_url)
             │  target lacks vision + boon enabled
             ├──▶ describer (glm-4-5v)             "describe this image"
             │◀── "A 3D voxel render of…"
             │  image part → "[Image description: …]"
             ├──▶ target model (text-only)         (rewritten, text-only request)
             │◀── answer
client ◀─────┘                                     answer

When it applies

The boon runs for a request only when all of the following hold:

  • The vision boon is enabled globally.
  • A describer model is configured.
  • The target model has opted into the vision boon (its boons list includes vision). Boons are off by default and granted per model — obleth never applies a boon to a model that hasn't asked for it.
  • The target model is not flagged supports_vision (models that can see images are left untouched — their images pass straight through).
  • The request actually contains an image_url content part.

If any condition is false, the request is forwarded unchanged.

Fail-open by design

The vision boon never blocks or fails a request. If the describer is unreachable, returns an error, times out, or returns an empty description, the affected image is left unchanged and the request is forwarded as-is. A flaky describer must not take down traffic the target model might still handle.

Each image is described independently, so one failed image does not discard the descriptions already produced for the others in the same request.
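A minimal sketch of the rewrite and its per-image fail-open behaviour follows. It is a hedged illustration, not the gateway's code: `apply_vision_boon` and the `describe` callable are hypothetical names, and two details are assumptions the text does not settle, namely that images beyond the cap are forwarded unchanged and that a failed attempt still counts toward the cap.

```python
def apply_vision_boon(messages, describe, max_images=6):
    """Replace image_url parts with text descriptions; leave failures untouched.

    `describe` is a hypothetical callable that takes an image URL (or data URL)
    and returns the describer model's text, raising on error or timeout.
    """
    attempted = 0
    rewritten = []
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):  # plain-string content has no image parts
            rewritten.append(message)
            continue
        parts = []
        for part in content:
            # Assumption: images beyond the cap pass through unchanged.
            if part.get("type") != "image_url" or attempted >= max_images:
                parts.append(part)
                continue
            attempted += 1  # assumption: failed attempts count toward the cap
            try:
                text = describe(part["image_url"]["url"])
            except Exception:
                text = ""  # fail-open: treat any describer error like an empty reply
            if text:
                parts.append({"type": "text", "text": f"[Image description: {text}]"})
            else:
                parts.append(part)  # this image stays as-is; others are unaffected
        rewritten.append({**message, "content": parts})
    return rewritten
```

Because each image is handled inside its own try block, a timeout on the second image leaves the first image's description in place.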

Billing and attribution

Every describe call is metered against the calling tenant and written to the usage ledger as its own record:

  • model is the describer model name (so its cost lands on the describer's line).
  • admission is boon and request_type is vision_boon, so boon traffic is easy to isolate in the request log and cost breakdown.
  • Cost is computed from the describer's input_cost_per_token / output_cost_per_token using the token usage the describer reports.

The original request is billed normally on top, against the target model.
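As a worked example of the cost rule above, the sketch below multiplies the describer's reported token usage by its per-token prices. The `prompt_tokens` / `completion_tokens` field names follow the OpenAI usage shape and are an assumption here, as are the function name and the prices in the usage note.

```python
def describe_call_cost(usage, input_cost_per_token, output_cost_per_token):
    """Cost of one describe call from the describer's reported token usage.

    Assumes an OpenAI-style usage object with prompt_tokens and completion_tokens.
    """
    return (
        usage["prompt_tokens"] * input_cost_per_token
        + usage["completion_tokens"] * output_cost_per_token
    )
```

With hypothetical prices of 0.000002 per input token and 0.000004 per output token, a describe call that reports 1000 prompt tokens and 200 completion tokens costs 1000 × 0.000002 + 200 × 0.000004 = 0.0028, recorded against the describer model; the target model's own request is billed separately.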

Prerequisites

You need two models registered:

  1. A describer — a vision-capable chat model. Give it the vision tag (or set supports_vision: true via the API).
  2. One or more target models that lack native vision (supports_vision: false, the default), each opted into the vision boon.

Register a describer:

curl -X POST http://localhost:9180/api/v1/models \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "glm-4-5v",
    "model_type": "chat",
    "upstream_model": "glm-4-5v",
    "api_base": "https://provider.example/v1",
    "api_key": "sk_upstream",
    "tags": ["vision"],
    "supports_vision": true,
    "enabled": true
  }'

supports_vision is a capability flag on every chat model. It defaults to false, so existing routes need no changes. In the dashboard it is derived from the vision routing tag (Models → Routing tags → vision): ticking vision marks the model as natively image-capable and eligible to serve as a system-wide describer. The Management API still accepts supports_vision directly.

Enabling the boon

Vision boon settings live in the app_settings store (key boons) and are hot-reloadable — the proxy picks up changes within its refresh interval, no restart required.

From the control plane, open Settings → Model boons:

  • Enable vision boon — the master switch.
  • Describer model — a dropdown of your vision-capable models.
  • Describe prompt — the instruction sent to the describer.
  • Max images per request — cap on how many images are described per request (default 6).
  • Describe timeout (ms) — per-image timeout (default 30000).

Or via the Management API:

curl -X PUT http://localhost:9180/api/v1/settings/boons \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "vision_enabled": true,
    "vision_fallback_model": "glm-4-5v",
    "vision_describe_prompt": "Describe this image in thorough, faithful detail: all visible text (verbatim), UI elements, code, diagrams, charts, and layout.",
    "vision_max_images": 6,
    "vision_timeout_ms": 30000
  }'

Send vision_fallback_model as "" to clear the describer (which deactivates the boon, since no describer is set). See the Management API for the full settings shape.

The boon is disabled by default. If you configure a describer but leave Enable vision boon off, images pass straight through to the target model — which, if it is text-only, will typically reject them. Make sure the master switch is on.

Granting the boon to a model

The global switch turns the vision boon on; each model then opts in individually. A model only receives a boon when its boons list contains that boon's name — nothing is granted by default, so you choose exactly which text-only models should fall back to the describer.

From the dashboard, open a model's config and tick the boon under the Boons group (Models → a row → Boons → vision). Via the Management API, set the boons array on create or update:

curl -X PUT http://localhost:9180/api/v1/models/$MODEL_ID \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "boons": ["vision"] }'

boons is a fixed-vocabulary list (vision, tools, structured_output) stored per model. Leave it empty (the default) to keep a model boon-free — images sent to a text-only model with no boon are forwarded unchanged.

Using it from a client

Send a normal OpenAI-style chat request with an image to a text-only model. No client changes are required — the boon is transparent.

curl http://localhost/v1/chat/completions \
  -H "Authorization: Bearer sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2-7-fast",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is this?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}}
      ]
    }]
  }'

obleth relays the image to the describer, rewrites the request, and the text-only minimax-m2-7-fast answers. The request log shows two entries: a vision_boon call against glm-4-5v, and the chat call against minimax-m2-7-fast.
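The same request can be built from Python using only the standard library. This is a sketch under assumptions: `build_image_request` and `send` are illustrative helper names, the endpoint URL is the one from the curl example, and the API key is left as a parameter. Reading the x-obleth-boons response header is one way to confirm from the client that a boon acted on the request.

```python
import base64
import json
import urllib.request


def build_image_request(model, question, image_bytes, mime="image/png"):
    """Build an OpenAI-style chat body with one text part and one inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime};base64,{encoded}"}},
            ],
        }],
    }


def send(body, api_key, url="http://localhost/v1/chat/completions"):
    """POST the body; return (parsed JSON, x-obleth-boons header or None)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # A generous timeout leaves room for the extra describer round trip.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response), response.headers.get("x-obleth-boons")
```

Calling `send(build_image_request("minimax-m2-7-fast", "What is this?", png_bytes), api_key)` against a running gateway should return the answer together with the header value, if one was set.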

Verifying

After a request, confirm the boon ran by looking for the vision_boon record in the usage ledger:

SELECT toDateTime(ts_ms / 1000) AS t, model, request_type, status_code
FROM obleth.usage
WHERE request_type = 'vision_boon'
ORDER BY ts_ms DESC
LIMIT 5;

If you see the target model return errors but no vision_boon record, the boon did not fire — re-check the conditions under When it applies: the boon is enabled, a describer is set, the target model has opted in (boons includes vision), and it is flagged supports_vision: false.

The tools boon

The tools boon emulates OpenAI function calling for a model that has no native support for it. Tool definitions are rendered into the prompt as instructions, and the model's free-text reply is parsed back into a proper tool_calls response — so an agent or SDK that expects function calling works against a plain chat model.

client ──▶ obleth                          (chat request with a `tools` array)
             │  target lacks function calling + boon enabled
             │  render tool defs into the prompt, strip the `tools` field
             ├──▶ target model              (rewritten, plain-chat request)
             │◀── "...```tool_call {"name":"search","arguments":{...}}``` ..."
             │  parse tool_call block(s) → message.tool_calls
client ◀─────┘                             {tool_calls:[…], finish_reason:"tool_calls"}

When it applies

  • The tools boon is enabled globally.
  • The target model has opted into tools and is not flagged supports_function_calling (models with native function calling are left untouched).
  • The request is a chat completion carrying a non-empty tools array.

If tool_choice is "none", the tool definitions are simply stripped (the model is told not to call anything) and no response parsing is armed.

How it works

  1. Request. obleth renders each function tool — name, description, and parameters schema — into an injected system section that instructs the model to emit calls as fenced ```tool_call blocks containing a single JSON object with name and arguments. Prior tool_calls / tool messages in the conversation are flattened into the same textual format (as tool_result blocks) so multi-turn agent loops round-trip. The tools, tool_choice, functions, function_call, and parallel_tool_calls fields are then removed so the upstream never sees parameters it cannot parse. At most max_tools definitions are rendered (default 32).
  2. Response. obleth scans the completion for ```tool_call blocks and rewrites them into a standard message.tool_calls array with finish_reason: "tool_calls". A call to a tool name that was not defined, or a block that does not parse, is left in the text as-is (fail-open per block).
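Both steps can be sketched in a few lines. Everything here is illustrative rather than obleth's actual code: the prompt wording, the function names, and the generated call id format are assumptions, and the regular expression is a simplification that would not cope with a fence appearing inside the JSON arguments.

```python
import json
import re
import uuid

FENCE = "`" * 3  # built this way so the sketch contains no literal fence
BLOCK_RE = re.compile(
    re.escape(FENCE) + r"tool_call\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def render_tool_section(tools, max_tools=32):
    """Render function tools into a system-prompt section (wording is illustrative)."""
    lines = [
        "To call a tool, reply with a fenced block of this form:",
        FENCE + "tool_call",
        '{"name": "<tool name>", "arguments": {}}',
        FENCE,
        "Available tools:",
    ]
    functions = [t["function"] for t in tools if t.get("type") == "function"]
    for fn in functions[:max_tools]:  # at most max_tools definitions are rendered
        lines.append(f"- {fn['name']}: {fn.get('description', '')}")
        lines.append("  parameters: " + json.dumps(fn.get("parameters", {})))
    return "\n".join(lines)


def parse_tool_calls(text, defined_tools):
    """Split a completion into (remaining_text, tool_calls), fail-open per block."""
    tool_calls = []

    def replace(match):
        try:
            call = json.loads(match.group(1))
            name, arguments = call["name"], call["arguments"]
        except (ValueError, KeyError, TypeError):
            return match.group(0)  # unparseable block: leave it in the text
        if not isinstance(name, str) or name not in defined_tools:
            return match.group(0)  # undefined tool: leave it in the text
        tool_calls.append({
            "id": "call_" + uuid.uuid4().hex[:12],  # hypothetical id format
            "type": "function",
            "function": {"name": name, "arguments": json.dumps(arguments)},
        })
        return ""

    remaining = BLOCK_RE.sub(replace, text).strip()
    return remaining, tool_calls
```

When `parse_tool_calls` returns a non-empty list, the gateway-side equivalent would place it in message.tool_calls and set finish_reason to "tool_calls"; when the list is empty, the completion is left as it was.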

The tools boon adds no extra upstream calls — it is pure prompt engineering plus response parsing — so there is no separate usage record; the work is billed as the target model's normal request.

Enabling it

From the control plane, open Settings → Model boons and turn on Enable tools boon, optionally adjusting Max tools. Or via the Management API:

curl -X PUT http://localhost:9180/api/v1/settings/boons \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "tools_enabled": true, "tools_max_tools": 32 }'

Then grant tools to each model that should fall back to emulation (its config's Boons group, or "boons": ["tools"] via the API).

Because the response is buffered and re-emitted, a streaming client sees the answer arrive in one burst rather than token-by-token while the tools boon is active. Raise your request timeout if your tool-calling turns are long.

Web search with the tools boon

A common use of the tools boon is giving a basic model live web search. obleth does not execute tool calls itself — that stays with your client or agent loop — but the tools boon lets a model without native function calling still emit a search call, and obleth's MCP gateway gives the client one authenticated endpoint to run that search through.

The examples/searxng/ compose file runs a private SearXNG metasearch instance fronted by an MCP server, both joined to obleth's docker network:

docker compose -f examples/searxng/docker-compose.yml up -d

Register the MCP server in obleth (MCP Servers → Register, or the API) with upstream URL http://mcp-searxng:8765/mcp, and clients reach it at /mcp/searxng using their obleth key. The end-to-end loop is:

  1. The client sends a chat request with a search tool defined. The tools boon renders it into the prompt, and the text-only model replies with a tool_call that obleth parses into a real tool_calls response.
  2. The client runs the search by calling POST /mcp/searxng through the gateway, then sends the result back as a tool message.
  3. The tools boon flattens that result into the next prompt, and the model answers using the fetched information.

So the tools boon and the MCP gateway compose: the boon supplies function-calling syntax to models that lack it, while the MCP gateway supplies authenticated, audited access to the search backend. Tool execution remains on the client side.
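Since tool execution stays with the client, the loop above lives in client code. A minimal sketch, with the transport deliberately injected: `chat` and `run_tool` are hypothetical callables that would wrap the chat completions endpoint and POST /mcp/searxng respectively, and `max_turns` is an assumed safety limit rather than an obleth setting.

```python
import json


def run_search_loop(chat, run_tool, messages, tools, max_turns=4):
    """Drive a tool-calling conversation until the model returns a plain answer.

    chat(messages, tools) -> assistant message dict (hypothetical wrapper around
    the chat completions endpoint).
    run_tool(name, arguments) -> result string (hypothetical wrapper around the
    MCP gateway call).
    """
    messages = list(messages)  # do not mutate the caller's list
    for _ in range(max_turns):
        message = chat(messages, tools)
        messages.append(message)
        calls = message.get("tool_calls") or []
        if not calls:
            return message.get("content"), messages  # final answer
        for call in calls:
            arguments = json.loads(call["function"]["arguments"])
            result = run_tool(call["function"]["name"], arguments)
            # Send the result back as a tool message for the next turn.
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": result,
            })
    raise RuntimeError("model kept requesting tools after max_turns")
```

The client code is the same whether the target model has native function calling or relies on the tools boon, which is the point of the emulation.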

The structured output boon

The structured_output boon enforces response_format JSON schemas at the gateway for a model without native support. The schema is rendered into the prompt, the reply is validated at the gateway, and invalid JSON is repaired by a configurable fixer model — so callers reliably get schema-conforming JSON even from a model that would otherwise return prose-wrapped or malformed output.

When it applies

  • The structured-output boon is enabled globally.
  • The target model has opted into structured_output and is not flagged supports_response_schema.
  • The request is a chat completion whose response_format.type is json_schema or json_object.

How it works

  1. Request. obleth removes the response_format field and injects a system section instructing the model to reply with a single JSON document — the provided JSON Schema for json_schema, or a generic "valid JSON object" instruction for json_object. Schemas larger than 64 KB are rendered into the prompt but not validated (a guard against pathological documents).
  2. Response. obleth extracts the JSON document from the reply (tolerating markdown fences and stray prose) and validates it against the schema. If it passes, the canonical JSON replaces the message content.
  3. Repair. If validation fails, obleth re-prompts a fixer model (the configured one, or the request's own model when none is set) with the invalid output and the validation errors, up to max_repair_attempts times. Each repair call is billed to the tenant as a structured_output_boon record.
  4. Fail-open. If every attempt still fails, the original completion passes through unchanged and the response carries x-obleth-boons-warning: structured_output_validation_failed.
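The extract, validate, repair, and fail-open steps can be sketched as follows. This is a simplified illustration, not obleth's implementation: `validate` is a tiny stand-in that checks only top-level required keys and simple property types (a real JSON Schema validator does far more), `fix` is a hypothetical callable standing in for the fixer model, and serialising the validated document with `json.dumps` is an assumption about what "canonical JSON" means.

```python
import json

_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "object": dict, "array": list}


def extract_json(text):
    """Return the first JSON object found in a reply, ignoring fences and prose."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            value, _end = decoder.raw_decode(text, start)
        except ValueError:
            continue  # not valid JSON from this brace; try the next one
        return value
    return None


def _matches(value, type_name):
    if not isinstance(type_name, str) or type_name not in _TYPES:
        return True  # unknown or compound types are not checked in this sketch
    if type_name in ("integer", "number") and isinstance(value, bool):
        return False  # bool is an int subclass in Python, but not in JSON Schema
    return isinstance(value, _TYPES[type_name])


def validate(doc, schema):
    """Stand-in validator: top-level required keys and property types only."""
    if not isinstance(doc, dict):
        return ["reply is not a JSON object"]
    errors = [f"missing required property '{key}'"
              for key in schema.get("required", []) if key not in doc]
    for key, spec in schema.get("properties", {}).items():
        if key in doc and not _matches(doc[key], spec.get("type")):
            errors.append(f"property '{key}' is not of type {spec.get('type')}")
    return errors


def enforce(reply, schema, fix, max_repair_attempts=1):
    """Return (content, warning); warning is None when the content validates."""
    attempts = min(max_repair_attempts, 3)  # the setting is clamped to 3
    candidate = reply
    for attempt in range(attempts + 1):
        doc = extract_json(candidate)
        errors = validate(doc, schema)
        if not errors:
            return json.dumps(doc), None
        if attempt == attempts:
            break
        try:
            candidate = fix(candidate, errors)  # re-prompt the fixer model
        except Exception:
            break  # a failing fixer is also fail-open
    return reply, "structured_output_validation_failed"
```

When the warning is set, the first element is the original completion, untouched, which mirrors the pass-through behaviour described in step 4.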

Enabling it

From Settings → Model boons, turn on Enable structured output boon, choose a Fixer model, and set the repair attempts and timeout. Or via the Management API:

curl -X PUT http://localhost:9180/api/v1/settings/boons \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "structured_output_enabled": true,
    "structured_output_fixer_model": "qwen3-235b",
    "structured_output_max_repair_attempts": 1,
    "structured_output_timeout_ms": 30000
  }'

structured_output_max_repair_attempts is clamped to a maximum of 3. Send structured_output_fixer_model as "" to repair with the request's own model instead of a dedicated fixer. Then grant structured_output to each model that should be enforced.

Relationship to multi-modal routes

The vision boon is distinct from registering a natively vision-capable chat model. If a model can see images itself, give it the vision tag (which sets supports_vision: true) and the boon leaves its requests alone. The boon exists specifically to extend text-only models that opt in. For serving images, audio, and embeddings as first-class modalities, see Multi-modal Models.