Gateway-granted capabilities for models that lack them natively — vision (image-to-text relay), tools (function-calling emulation), and structured output (JSON-schema enforcement) — so a basic text model can still serve advanced requests.
A boon is a capability obleth grants a model at the gateway, on top of what the model can do on its own. Instead of every model needing native support for every modality — or every caller wiring up extra plumbing — obleth detects when a request needs a capability the target model lacks, fulfils it at the gateway, and rewrites the request (and, where needed, the response) so the original model can answer.
There are three boons today, all built on the same engine:
| Boon | boons value | Grants | Rewrites |
|---|---|---|---|
| Vision | vision | Image input on a text-only model, by relaying images to a describer model | Request |
| Tools | tools | OpenAI function calling on a model without native support | Request + response |
| Structured output | structured_output | response_format JSON-schema adherence | Request + response |
A few rules apply to every boon:
- A boon acts only when it is enabled globally (the switch lives in app_settings and is hot-reloadable) and the target model has the boon in its per-model boons list. Nothing is granted by default.
- A boon never acts on a model that already has the capability: vision skips models flagged supports_vision, tools skips supports_function_calling, structured output skips supports_response_schema. Native capability always wins; the boon is a fallback.
- Boons that rewrite the response (tools and structured output) buffer the upstream reply and, when the request sets stream: true, re-emit the result as synthesized SSE. Streaming requests are therefore buffered while these boons are active; consider raising your client's request timeout.
- Responses carry an x-obleth-boons header listing the boons that acted on the request. A non-fatal issue (for example, structured-output validation that could not be repaired) is reported in x-obleth-boons-warning while the original completion still passes through.
- Send the request header x-obleth-boons: off to bypass all boon processing for that single request.

Boons that call a helper model (the vision describer, the structured-output fixer) meter that call against the calling tenant as its own usage record, so the extra cost is attributed and visible in the request log.
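The rules above amount to a small eligibility check. The following is a minimal sketch of that decision, not obleth's actual implementation: the `Model` record, the `globally_enabled` mapping, and the assumption that header names arrive lower-cased are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

# Native capability flag that makes each boon unnecessary (flag names from the docs).
NATIVE_FLAG = {
    "vision": "supports_vision",
    "tools": "supports_function_calling",
    "structured_output": "supports_response_schema",
}


@dataclass
class Model:
    """Hypothetical model record: opted-in boons plus the native flags that are true."""
    boons: list = field(default_factory=list)
    capabilities: set = field(default_factory=set)


def boon_applies(boon, globally_enabled, model, request_headers):
    """Return True when the gateway would grant `boon` for this request."""
    # Per-request bypass (assumes header names were lower-cased by the caller).
    if request_headers.get("x-obleth-boons", "").strip().lower() == "off":
        return False
    # The global switch must be on.
    if not globally_enabled.get(boon, False):
        return False
    # The model must have opted in.
    if boon not in model.boons:
        return False
    # Native capability always wins; the boon is only a fallback.
    if NATIVE_FLAG[boon] in model.capabilities:
        return False
    return True
```

Each boon adds its own request-level condition on top of this (an image part, a tools array, or a response_format), which the sketch leaves out.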
When a text-only model receives a chat request that contains an image, the vision boon:
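1. Finds the image_url content part(s) in the request.
2. Relays each image to the configured describer model, for example glm-4-5v.
3. Replaces each image part with the returned text, in the form [Image description: …].

The target model never sees the image bytes; it sees a faithful text description in their place and answers as if it could see.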
client ──▶ obleth (chat request with image_url)
│ target lacks vision + boon enabled
├──▶ describer (glm-4-5v) "describe this image"
│◀── "A 3D voxel render of…"
│ image part → "[Image description: …]"
├──▶ target model (text-only) (rewritten, text-only request)
│◀── answer
client ◀─────┘ answer
The boon runs for a request only when all of the following hold:
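- The vision boon is enabled globally and a describer model is set.
- The target model has opted in (its boons list includes vision). Boons are off by default and granted per model, so obleth never applies a boon to a model that hasn't asked for it.
- The target model is not flagged supports_vision (models that can see images are left untouched, and their images pass straight through).
- The request contains at least one image_url content part.

If any condition is false, the request is forwarded unchanged.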
The vision boon never blocks or fails a request. If the describer is unreachable, returns an error, times out, or returns an empty description, the affected image is left unchanged and the request is forwarded as-is. A flaky describer must not take down traffic the target model might still handle.
Each image is described independently, so one failed image does not discard the descriptions already produced for the others in the same request.
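The rewrite and its fail-open behaviour can be sketched as follows. This is an illustration under stated assumptions rather than the gateway's code: `describe` is a hypothetical callable standing in for the describer call, and replacing the image part with a `text` part is one plausible shape for the substitution.

```python
import copy


def apply_vision_boon(request, describe):
    """Return a copy of `request` with each image_url part replaced by text.

    `describe(url)` is a hypothetical stand-in for the describer call. It
    returns a description string, or raises / returns "" on failure.
    """
    rewritten = copy.deepcopy(request)
    for message in rewritten.get("messages", []):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain string content carries no image parts
        for i, part in enumerate(content):
            if part.get("type") != "image_url":
                continue
            try:
                description = describe(part["image_url"]["url"])
            except Exception:
                description = ""  # describer error or timeout
            if description:
                content[i] = {
                    "type": "text",
                    "text": f"[Image description: {description}]",
                }
            # Otherwise fail open: this image part stays as it was, and
            # descriptions already produced for other images are kept.
    return rewritten
```

Because each image is handled inside its own try block, a failure on one image leaves the others untouched, which matches the per-image independence described above.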
Every describe call is metered against the calling tenant and written to the usage ledger as its own record:
- model is the describer model name (so its cost lands on the describer's line).
- admission is boon and request_type is vision_boon, so boon traffic is easy to isolate in the request log and cost breakdown.
- Cost is computed from the describer's input_cost_per_token / output_cost_per_token using the token usage the describer reports.

The original request is billed normally on top, against the target model.
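As a worked illustration of that metering, the sketch below builds a usage record for one describe call. The `cost` field name and the OpenAI-style `prompt_tokens` / `completion_tokens` usage keys are assumptions; only model, admission, request_type, and the two per-token price fields come from the description above.

```python
def vision_boon_usage_record(describer_model, usage, pricing):
    """Build an illustrative ledger record for a single describe call."""
    cost = (
        usage["prompt_tokens"] * pricing["input_cost_per_token"]
        + usage["completion_tokens"] * pricing["output_cost_per_token"]
    )
    return {
        "model": describer_model,       # cost lands on the describer's line
        "admission": "boon",
        "request_type": "vision_boon",
        "cost": cost,                   # hypothetical field name
    }


record = vision_boon_usage_record(
    "glm-4-5v",
    {"prompt_tokens": 1000, "completion_tokens": 200},
    {"input_cost_per_token": 0.000001, "output_cost_per_token": 0.000002},
)
```

With these example prices, 1000 input tokens and 200 output tokens come to roughly 0.0014 in whatever currency the prices are expressed in.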
You need two models registered:
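- A describer: a natively vision-capable chat model, marked with the vision tag (or set supports_vision: true via the API).
- A target: a text-only chat model (supports_vision: false, the default) that you opt into the vision boon.

Register a describer: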
curl -X POST http://localhost:9180/api/v1/models \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model_name": "glm-4-5v",
"model_type": "chat",
"upstream_model": "glm-4-5v",
"api_base": "https://provider.example/v1",
"api_key": "sk_upstream",
"tags": ["vision"],
"supports_vision": true,
"enabled": true
}'
supports_vision is a capability flag on every chat model. It defaults to
false, so existing routes need no changes. In the dashboard it is derived
from the vision routing tag (Models → Routing tags → vision) —
ticking vision marks the model as natively image-capable and eligible to serve
as a system-wide describer. The Management API still accepts supports_vision
directly.
Vision boon settings live in the app_settings store (key boons) and are
hot-reloadable — the proxy picks up changes within its refresh interval, no
restart required.
From the control plane, open Settings → Model boons:
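- Enable vision boon: the master switch, off by default.
- The describer (fallback) model that images are relayed to.
- The describe prompt sent along with each image.
- The maximum number of images per request (default 6).
- The describer timeout in milliseconds (default 30000).

Or via the Management API: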
curl -X PUT http://localhost:9180/api/v1/settings/boons \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"vision_enabled": true,
"vision_fallback_model": "glm-4-5v",
"vision_describe_prompt": "Describe this image in thorough, faithful detail: all visible text (verbatim), UI elements, code, diagrams, charts, and layout.",
"vision_max_images": 6,
"vision_timeout_ms": 30000
}'
Send vision_fallback_model as "" to clear the describer (which deactivates
the boon, since no describer is set). See the
Management API for the full settings shape.
The boon is disabled by default. If you configure a describer but leave Enable vision boon off, images pass straight through to the target model — which, if it is text-only, will typically reject them. Make sure the master switch is on.
The global switch turns the vision boon on; each model then opts in
individually. A model only receives a boon when its boons list contains that
boon's name — nothing is granted by default, so you choose exactly which
text-only models should fall back to the describer.
From the dashboard, open a model's config and tick the boon under the Boons
group (Models → a row → Boons → vision). Via the Management API, set
the boons array on create or update:
curl -X PUT http://localhost:9180/api/v1/models/$MODEL_ID \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{ "boons": ["vision"] }'
boons is a fixed-vocabulary list (vision, tools, structured_output)
stored per model. Leave it empty (the default) to keep a model boon-free —
images sent to a text-only model with no boon are forwarded unchanged.
Send a normal OpenAI-style chat request with an image to a text-only model. No client changes are required — the boon is transparent.
curl http://localhost/v1/chat/completions \
-H "Authorization: Bearer sk_..." \
-H "Content-Type: application/json" \
-d '{
"model": "minimax-m2-7-fast",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "What is this?"},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}}
]
}]
}'
obleth relays the image to the describer, rewrites the request, and the
text-only minimax-m2-7-fast answers. The request log shows two entries: a
vision_boon call against glm-4-5v, and the chat call against
minimax-m2-7-fast.
After a request, confirm the boon ran by looking for the vision_boon record in
the usage ledger:
SELECT toDateTime(ts_ms / 1000) AS t, model, request_type, status_code
FROM obleth.usage
WHERE request_type = 'vision_boon'
ORDER BY ts_ms DESC
LIMIT 5;
If you see the target model return errors but no vision_boon record, the
boon did not fire — re-check the conditions under
When it applies: the boon is enabled, a describer is set, the
target model has opted in (boons includes vision), and it is flagged
supports_vision: false.
The tools boon emulates OpenAI function calling for a model that has no
native support for it. Tool definitions are rendered into the prompt as
instructions, and the model's free-text reply is parsed back into a proper
tool_calls response — so an agent or SDK that expects function calling works
against a plain chat model.
client ──▶ obleth (chat request with a `tools` array)
│ target lacks function calling + boon enabled
│ render tool defs into the prompt, strip the `tools` field
├──▶ target model (rewritten, plain-chat request)
│◀── "...```tool_call {"name":"search","arguments":{...}}``` ..."
│ parse tool_call block(s) → message.tool_calls
client ◀─────┘ {tool_calls:[…], finish_reason:"tool_calls"}
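The tools boon runs for a request only when all of the following hold:

- The tools boon is enabled globally.
- The target model's boons list includes tools and is not flagged supports_function_calling (models with native function calling are left untouched).
- The request contains a tools array.

If tool_choice is "none", the tool definitions are simply stripped (the model is told not to call anything) and no response parsing is armed.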
- Request: the tool definitions are rendered into the prompt, with instructions to call a tool by emitting ```tool_call blocks containing a single JSON object with name and arguments. Prior tool_calls / tool messages in the conversation are flattened into the same textual format (as tool_result blocks) so multi-turn agent loops round-trip. The tools, tool_choice, functions, function_call, and parallel_tool_calls fields are then removed so the upstream never sees parameters it cannot parse. At most max_tools definitions are rendered (default 32).
- Response: obleth scans the model's reply for ```tool_call blocks and rewrites them into a standard message.tool_calls array with finish_reason: "tool_calls". A call to a tool name that was not defined, or a block that does not parse, is left in the text as-is (fail-open per block).

The tools boon adds no extra upstream calls. It is pure prompt engineering plus response parsing, so there is no separate usage record; the work is billed as the target model's normal request.
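The response-side parsing can be sketched roughly as below. This is a minimal illustration, not obleth's parser: the regular expression, the `call_N` id scheme, and the decision to strip parsed blocks from the remaining text are assumptions. The shape of each tool call (a function name plus arguments encoded as a JSON string) follows the OpenAI chat format.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built indirectly so the example nests cleanly
TOOL_CALL_BLOCK = re.compile(
    re.escape(FENCE) + r"tool_call\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def parse_tool_calls(text, defined_tools):
    """Split a model reply into (remaining_text, tool_calls).

    Blocks that do not parse, or that name an undefined tool, stay in the
    text unchanged (fail-open per block).
    """
    tool_calls = []

    def replace(match):
        try:
            call = json.loads(match.group(1))
            name, arguments = call["name"], call["arguments"]
        except (ValueError, KeyError, TypeError):
            return match.group(0)  # unparseable block: leave as-is
        if not isinstance(name, str) or name not in defined_tools:
            return match.group(0)  # unknown tool: leave as-is
        tool_calls.append({
            "id": f"call_{len(tool_calls)}",  # hypothetical id scheme
            "type": "function",
            "function": {"name": name, "arguments": json.dumps(arguments)},
        })
        return ""

    remaining = TOOL_CALL_BLOCK.sub(replace, text).strip()
    return remaining, tool_calls
```

When `tool_calls` comes back non-empty, the gateway-side rewrite would attach it to the assistant message and set finish_reason to "tool_calls"; when it is empty, the reply passes through as ordinary text.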
From the control plane, open Settings → Model boons and turn on Enable tools boon, optionally adjusting Max tools. Or via the Management API:
curl -X PUT http://localhost:9180/api/v1/settings/boons \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{ "tools_enabled": true, "tools_max_tools": 32 }'
Then grant tools to each model that should fall back to emulation (its config's
Boons group, or "boons": ["tools"] via the API).
Because the response is buffered and re-emitted, a streaming client sees the answer arrive in one burst rather than token-by-token while the tools boon is active. Raise your request timeout if your tool-calling turns are long.
A common use of the tools boon is giving a basic model live web search. obleth does not execute tool calls itself — that stays with your client or agent loop — but the tools boon lets a model without native function calling still emit a search call, and obleth's MCP gateway gives the client one authenticated endpoint to run that search through.
The examples/searxng/ compose file runs a private
SearXNG metasearch instance fronted by an MCP server,
both joined to obleth's docker network:
docker compose -f examples/searxng/docker-compose.yml up -d
Register the MCP server in obleth (MCP Servers → Register, or the API) with
upstream URL http://mcp-searxng:8765/mcp, and clients reach it at
/mcp/searxng using their obleth key. The end-to-end loop is:
1. The client sends a chat request with a search tool defined. The tools boon renders it into the prompt, and the text-only model replies with a tool_call that obleth parses into a real tool_calls response.
2. The client executes the call by sending POST /mcp/searxng through the gateway, then sends the result back as a tool message.

So the tools boon and the MCP gateway compose: the boon supplies function-calling syntax to models that lack it, while the MCP gateway supplies authenticated, audited access to the search backend. Tool execution remains on the client side.
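Since tool execution stays with the client, the loop above lives in client code. The sketch below shows one way such a loop could look. Both callables are hypothetical: `chat(messages, tools)` is assumed to wrap the chat completions call and return the first choice, and `call_mcp_tool(name, arguments)` is assumed to wrap the POST to /mcp/searxng and return the result as a string.

```python
import json


def run_agent_turn(chat, call_mcp_tool, messages, tools, max_steps=4):
    """Drive a client-side tool loop until the model stops asking for tools.

    Returns (final_answer_text, full_message_history).
    """
    messages = list(messages)  # do not mutate the caller's list
    for _ in range(max_steps):
        choice = chat(messages, tools)
        message = choice["message"]
        messages.append(message)
        if choice.get("finish_reason") != "tool_calls":
            return message.get("content"), messages
        # Execute every requested call and feed each result back as a tool message.
        for call in message["tool_calls"]:
            arguments = json.loads(call["function"]["arguments"])
            result = call_mcp_tool(call["function"]["name"], arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": result,
            })
    raise RuntimeError("tool loop did not finish within max_steps")
```

The `max_steps` bound is a defensive choice for the sketch, so that a model which keeps requesting tools cannot loop forever.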
The structured_output boon enforces response_format JSON schemas at the
gateway for a model without native support. The schema is rendered into the
prompt, the reply is validated at the gateway, and invalid JSON is repaired by a
configurable fixer model — so callers reliably get schema-conforming JSON even
from a model that would otherwise return prose-wrapped or malformed output.
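The structured output boon runs for a request only when all of the following hold:

- The structured output boon is enabled globally.
- The target model's boons list includes structured_output and is not flagged supports_response_schema.
- The request's response_format.type is json_schema or json_object.

When it runs:

- Request: obleth removes the response_format field and injects a system section instructing the model to reply with a single JSON document: the provided JSON Schema for json_schema, or a generic "valid JSON object" instruction for json_object. Schemas larger than 64 KB are rendered into the prompt but not validated (a guard against pathological documents).
- Response: the reply is validated at the gateway. If it is invalid, the fixer model is asked to repair it, up to max_repair_attempts times. Each repair call is billed to the tenant as a structured_output_boon record.
- If the reply still does not validate, the original completion passes through with x-obleth-boons-warning: structured_output_validation_failed.

From Settings → Model boons, turn on Enable structured output boon, choose a Fixer model, and set the repair attempts and timeout. Or via the Management API: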
curl -X PUT http://localhost:9180/api/v1/settings/boons \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"structured_output_enabled": true,
"structured_output_fixer_model": "qwen3-235b",
"structured_output_max_repair_attempts": 1,
"structured_output_timeout_ms": 30000
}'
structured_output_max_repair_attempts is clamped to a maximum of 3. Send
structured_output_fixer_model as "" to repair with the request's own model
instead of a dedicated fixer. Then grant structured_output to each model that
should be enforced.
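The validate-and-repair behaviour, including the clamp on repair attempts, can be sketched as follows. This is an illustration only: `fix(text, schema)` is a hypothetical stand-in for the fixer-model call, and `conforms` is a deliberately tiny check (top-level object plus required keys), where a real gateway would presumably run a full JSON Schema validator.

```python
import json

MAX_REPAIR_ATTEMPTS_CAP = 3  # the documented clamp


def conforms(document, schema):
    """Tiny stand-in for schema validation: object type and required keys only."""
    if not isinstance(document, dict):
        return False
    return all(key in document for key in schema.get("required", []))


def enforce(reply_text, schema, fix, max_repair_attempts=1):
    """Return (text, warning). `warning` is None when the result validates."""
    attempts = max(0, min(max_repair_attempts, MAX_REPAIR_ATTEMPTS_CAP))
    candidate = reply_text
    for attempt in range(attempts + 1):
        try:
            if conforms(json.loads(candidate), schema):
                return candidate, None
        except ValueError:
            pass  # not JSON at all: treat as invalid
        if attempt < attempts:
            candidate = fix(candidate, schema)  # one billed repair call
    # Fail open: pass the original completion through with a warning.
    return reply_text, "structured_output_validation_failed"
```

Passing each repaired candidate into the next repair call is one possible design; the source does not say whether the fixer sees the original reply or the previous attempt.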
The vision boon is distinct from registering a natively vision-capable chat
model. If a model can see images itself, give it the vision tag (which sets
supports_vision: true) and the boon leaves its requests alone. The boon exists
specifically to extend text-only models that opt in. For serving images,
audio, and embeddings as first-class modalities, see
Multi-modal Models.