Prometheus Metrics

This page lists every Prometheus metric that obleth exports, with its type, labels, and example queries.

Metrics are exposed at http://localhost:9091/metrics in the standard Prometheus text format.

Metric reference

Request counters

| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_requests_total | Counter | admission, status | Total requests by admission class and HTTP status |

admission values: fast (admitted immediately), queued (waited in queue), brownout (degraded after timeout). status values: 2xx, 4xx, 5xx.
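
The status label makes a server-error ratio easy to derive. The query below is a sketch that assumes the label values are exactly the strings listed above (2xx, 4xx, 5xx); it returns no data while no 5xx series exists yet.

```promql
# Share of all requests that ended in a 5xx status over the last 5 minutes.
# Both sides are summed first so that the division compares two scalars-per-instant
# rather than matching series label by label.
sum(rate(obleth_requests_total{status="5xx"}[5m]))
  / sum(rate(obleth_requests_total[5m]))
```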

Token counters

| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_input_tokens_total | Counter | | Total input tokens processed |
| obleth_output_tokens_total | Counter | | Total output tokens processed |
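
Because both token metrics are counters, they are normally read through rate(). The following is an illustrative sketch of two queries that could be built on them; the 5-minute window is an arbitrary choice.

```promql
# Output tokens generated per second, summed across all scraped instances.
sum(rate(obleth_output_tokens_total[5m]))

# Output tokens produced per input token over the same window.
sum(rate(obleth_output_tokens_total[5m]))
  / sum(rate(obleth_input_tokens_total[5m]))
```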

Latency histograms

| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_ttft_ms | Histogram | | Time to first token (ms) |
| obleth_total_ms | Histogram | | Total request duration (ms) |

obleth_ttft_ms buckets cover 5ms to 5,000ms. obleth_total_ms buckets cover 10ms to 30,000ms.
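
Besides quantiles, a histogram can yield a mean. The sketch below assumes these are classic Prometheus histograms, which expose _sum and _count series alongside the _bucket series; the result is in milliseconds because the observations are recorded in milliseconds.

```promql
# Mean time to first token (ms) over the last 5 minutes:
# total observed milliseconds divided by the number of observations.
sum(rate(obleth_ttft_ms_sum[5m]))
  / sum(rate(obleth_ttft_ms_count[5m]))
```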

Concurrency gauges

| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_in_flight | Gauge | | Current concurrent in-flight requests |
| obleth_queue_depth | Gauge | | Requests currently waiting in the admission queue |
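
Gauges are sampled only at scrape time, so short spikes can be missed in a raw graph. As a rough sketch, the *_over_time functions summarise a gauge across a window; the window lengths here are arbitrary.

```promql
# Highest queue depth seen at any scrape in the last 10 minutes.
max_over_time(obleth_queue_depth[10m])

# Average number of in-flight requests over the last 5 minutes.
avg_over_time(obleth_in_flight[5m])
```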

Reliability

| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_telemetry_dropped | Gauge | | Telemetry records dropped because the async channel was full |
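
One way to notice new drops is to watch for growth in this value. The sketch below uses delta() because the metric is listed as a gauge; whether the value only ever increases or is periodically reset is not stated here, so treat the expression as an assumption. If it turns out to behave like a monotonically increasing count, increase() would be the more usual function.

```promql
# Returns a sample only when the dropped-record value grew during the last 15 minutes.
delta(obleth_telemetry_dropped[15m]) > 0
```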

Cache

| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_cache_lookups_total | Counter | result | Cache lookups by result (hit, miss) |
| obleth_cache_tokens_saved_total | Counter | | Output tokens served from cache (not billed to upstream) |
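
To put a number on what the cache saves, increase() reports how much a counter grew over a window. This is a minimal sketch; the one-hour window is an arbitrary choice.

```promql
# Output tokens served from cache during the last hour, across all instances.
sum(increase(obleth_cache_tokens_saved_total[1h]))
```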

MCP

| Metric | Type | Labels | Description |
|---|---|---|---|
| obleth_mcp_requests_total | Counter | server, status | MCP proxied requests by server name and status |
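
A per-server breakdown might look like the sketch below. The possible values of the status label for MCP requests are not listed on this page, so the query groups by whatever values are present rather than filtering on a specific one.

```promql
# MCP requests per second, split by upstream server and status.
sum by (server, status) (rate(obleth_mcp_requests_total[5m]))
```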

Example PromQL queries

Request rate by admission class

sum by (admission) (rate(obleth_requests_total[5m]))

Fraction of brownout requests

sum(rate(obleth_requests_total{admission="brownout"}[5m]))
  / sum(rate(obleth_requests_total[5m]))

P95 time to first token

histogram_quantile(0.95, sum by (le) (rate(obleth_ttft_ms_bucket[5m])))

Cache hit rate

sum(rate(obleth_cache_lookups_total{result="hit"}[5m]))
  / sum(rate(obleth_cache_lookups_total[5m]))

Scrape configuration

# prometheus.yml
scrape_configs:
  - job_name: obleth
    static_configs:
      - targets: ['localhost:9091']
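
Once Prometheus has loaded this configuration, the built-in up metric is a quick way to confirm the target is being scraped. The job label value below matches the job_name in the static configuration above; a ServiceMonitor-based setup will typically produce a different job label.

```promql
# 1 if the most recent scrape of the obleth target succeeded, 0 if it failed.
up{job="obleth"}
```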

With Prometheus Operator (Helm serviceMonitor.enabled=true):

serviceMonitor:
  enabled: true
  interval: 15s
  path: /metrics
  port: metrics