This page lists every obleth Prometheus metric, with its type, labels, and example queries.
Metrics are exposed at http://localhost:9091/metrics in the standard Prometheus text format.
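To make the text format concrete, here is a minimal Python sketch that parses scrape output into (name, labels, value) tuples. It is a simplified illustration, not a full parser: it ignores timestamps, escaped quotes inside label values, and exemplars. The sample payload and its values are made up.

```python
import re

# Matches lines such as: name{label="value",...} 12.5
# Simplified: no timestamps, no escaped quotes, no exemplars.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified pattern cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical scrape output; the values are invented for illustration.
EXAMPLE = """\
# HELP obleth_in_flight Current concurrent in-flight requests
# TYPE obleth_in_flight gauge
obleth_in_flight 3
obleth_requests_total{admission="fast",status="2xx"} 120
obleth_requests_total{admission="brownout",status="5xx"} 4
"""

samples = parse_metrics(EXAMPLE)
for name, labels, value in samples:
    print(name, labels, value)
```

For production use, an established client library parser is a safer choice than a hand-written regular expression.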
Metric reference
Request counters
| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_requests_total | Counter | admission, status | Total requests by admission class and HTTP status |
admission values: fast (admitted immediately), queued (waited in queue), brownout (degraded after timeout).
status values: 2xx, 4xx, 5xx.
Token counters
| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_input_tokens_total | Counter | — | Total input tokens processed |
| obleth_output_tokens_total | Counter | — | Total output tokens processed |
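Counters only ever increase, except when the process restarts and they reset to zero, so throughput is derived from the difference between scrapes. The Python sketch below shows one simplified way to turn successive counter samples into a tokens-per-second figure while tolerating a reset. It is an illustration only: Prometheus's own rate() additionally extrapolates to the window boundaries, which this sketch does not. The scrape timestamps and values are hypothetical.

```python
def counter_increase(values):
    """Total increase across successive counter samples, tolerating resets."""
    increase = 0.0
    for prev, curr in zip(values, values[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # The counter went backwards: assume a restart from zero.
            increase += curr
    return increase


def per_second_rate(samples):
    """samples: list of (unix_seconds, value) pairs ordered by time."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return 0.0
    return counter_increase([value for _, value in samples]) / elapsed


# Hypothetical scrapes of obleth_output_tokens_total, 15 s apart, with one restart.
scrapes = [(0, 1000.0), (15, 1600.0), (30, 200.0), (45, 500.0)]
tokens_per_second = per_second_rate(scrapes)
print(f"{tokens_per_second:.2f} output tokens/s")
```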
Latency histograms
| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_ttft_ms | Histogram | — | Time to first token (ms) |
| obleth_total_ms | Histogram | — | Total request duration (ms) |
obleth_ttft_ms buckets cover 5ms to 5,000ms. obleth_total_ms buckets cover 10ms to 30,000ms.
Concurrency gauges
| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_in_flight | Gauge | — | Current concurrent in-flight requests |
| obleth_queue_depth | Gauge | — | Requests currently waiting in the admission queue |
Reliability
| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_telemetry_dropped | Gauge | — | Telemetry records dropped because the async channel was full |
Cache
| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_cache_lookups_total | Counter | result | Cache lookups by result (hit, miss) |
| obleth_cache_tokens_saved_total | Counter | — | Output tokens served from cache (not billed to upstream) |
MCP
| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_mcp_requests_total | Counter | server, status | MCP proxied requests by server name and status |
Example PromQL queries
Request rate by admission class
sum by (admission) (rate(obleth_requests_total[5m]))
Fraction of brownout requests
sum(rate(obleth_requests_total{admission="brownout"}[5m]))
/ sum(rate(obleth_requests_total[5m]))
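The arithmetic behind this ratio can be sketched in Python. Because obleth_requests_total carries both an admission and a status label, the numerator and denominator each need to be summed across all label combinations before dividing. The function name and the per-window request counts below are hypothetical.

```python
def brownout_fraction(increases):
    """increases: {(admission, status): request count increase over a window}."""
    total = sum(increases.values())
    if total == 0:
        return 0.0  # no traffic in the window
    brownout = sum(
        count
        for (admission, _status), count in increases.items()
        if admission == "brownout"
    )
    return brownout / total


# Hypothetical per-series increases over a 5-minute window.
window = {
    ("fast", "2xx"): 900,
    ("queued", "2xx"): 80,
    ("brownout", "2xx"): 15,
    ("brownout", "5xx"): 5,
}
fraction = brownout_fraction(window)
print(f"{fraction:.1%} of requests were brownouts")
```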
P95 time to first token
histogram_quantile(0.95, rate(obleth_ttft_ms_bucket[5m]))
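For intuition about what this query computes, the sketch below estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. It is a simplified Python illustration of the idea, not Prometheus's exact implementation, and the bucket boundaries and counts are hypothetical (the real obleth_ttft_ms boundaries are not listed here).

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the overflow bucket: report the largest finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            # Assume observations are spread evenly within the bucket.
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative counts; upper bounds are in milliseconds.
ttft_buckets = [(5, 10), (10, 30), (25, 60), (50, 90), (100, 98), (math.inf, 100)]
p95 = histogram_quantile(0.95, ttft_buckets)
print(f"estimated P95 TTFT: {p95:.2f} ms")
```

Because the estimate is interpolated within a bucket, its accuracy depends on how finely the bucket boundaries cover the latency range of interest.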
Cache hit rate
sum(rate(obleth_cache_lookups_total{result="hit"}[5m]))
/ sum(rate(obleth_cache_lookups_total[5m]))
Scrape configuration
scrape_configs:
  - job_name: obleth
    static_configs:
      - targets: ['localhost:9091']
With Prometheus Operator (Helm serviceMonitor.enabled=true):
serviceMonitor:
  enabled: true
  interval: 15s
  path: /metrics
  port: metrics