Gateway Proxy

AuditTrail ships an OpenAI-compatible gateway proxy at /api/v1/gateway/proxy/v1. Point the base URL of your existing OpenAI / Anthropic / Azure SDK at it to get full-fidelity traces with zero code changes: no re-instrumentation and no @traceable decorators.

Why it's useful

Zero-code integration — any OpenAI-compatible client (SDK or cURL) routes through the proxy and is automatically traced.
Virtual keys per user — issue scoped at-gw-… keys from /settings?tab=gateway. The real provider key stays on the server.
Rate / spend limits — bound per-user call rate and monthly spend without asking every SDK caller to implement quotas.
Routing — route by model id to OpenAI, Anthropic, and any OpenAI-compatible provider (Ollama, Gemini, Qwen, MiniMax, local vLLM) from one deployment, without touching client code.

Quickstart

bash

# 1. Configure a provider key on the server (once, via /settings?tab=gateway)
#    AuditTrail encrypts the key and stores it under a tenant virtual key.
 
# 2. Point your OpenAI SDK at the proxy
export OPENAI_BASE_URL="https://auditrail.yourco.com/api/v1/gateway/proxy/v1"
export OPENAI_API_KEY="at-gw-YOUR-VIRTUAL-KEY"
 
# 3. Call as usual — traces flow into AuditTrail automatically
python -c "from openai import OpenAI; c = OpenAI(); \
  print(c.chat.completions.create(model='gpt-4o', messages=[{'role':'user','content':'hi'}]).choices[0].message.content)"

How it works under the hood

Client sends POST /api/v1/gateway/proxy/v1/chat/completions with an at-gw-* virtual key in the Authorization header.
Gateway resolves the virtual key → tenant → provider key (Fernet-encrypted at rest, same storage machinery as the Assistant).
Gateway streams the request to the real provider endpoint.
Gateway observes the stream, buffers span events, and writes a Trace row plus child Span rows tagged with the tenant's user id.
Gateway returns the provider's response bytes untouched.
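
The first step can be sketched from the client side with the standard library alone. This is a minimal illustration, not the SDK's internals: `build_chat_request` is a hypothetical helper, and the host and key are the Quickstart placeholders. The request is built but not sent.

```python
import json
import urllib.request

# Placeholder values from the Quickstart; swap in your own host and key.
BASE_URL = "https://auditrail.yourco.com/api/v1/gateway/proxy/v1"
VIRTUAL_KEY = "at-gw-YOUR-VIRTUAL-KEY"


def build_chat_request(model, messages, base_url=BASE_URL, key=VIRTUAL_KEY):
    """Build (but do not send) the POST a client issues to the proxy."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        method="POST",
        headers={
            # The virtual key travels as a bearer token.
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
    )


req = build_chat_request("gpt-4o", [{"role": "user", "content": "hi"}])
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```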

Spans carry the standard OTel gen_ai.* attribute contract so the exact same trace shape lands regardless of whether you use the gateway, one of the first-party SDKs, or an external OTel exporter.
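
To make the attribute contract concrete, the sketch below maps an OpenAI-shape request and response onto a handful of attribute names from the OpenTelemetry GenAI semantic conventions. Which attributes AuditTrail actually records is not stated here, so treat the selection (and the helper name `gen_ai_attributes`) as an assumption.

```python
def gen_ai_attributes(request, response):
    """Map an OpenAI-shape chat request/response to gen_ai.* span attributes."""
    usage = response.get("usage", {})
    return {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": request["model"],
        "gen_ai.response.model": response.get("model"),
        "gen_ai.response.id": response.get("id"),
        "gen_ai.response.finish_reasons": [
            choice.get("finish_reason") for choice in response.get("choices", [])
        ],
        # OpenAI calls these prompt/completion tokens; OTel calls them input/output.
        "gen_ai.usage.input_tokens": usage.get("prompt_tokens"),
        "gen_ai.usage.output_tokens": usage.get("completion_tokens"),
    }


# Illustrative payloads, not captured from a real call.
sample_request = {"model": "gpt-4o"}
sample_response = {
    "id": "chatcmpl-1",
    "model": "gpt-4o",
    "choices": [{"index": 0, "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2},
}
attrs = gen_ai_attributes(sample_request, sample_response)
for name, value in attrs.items():
    print(name, "=", value)
```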

Multi-provider routing

Since v3.38.0 the gateway routes by model id to several OpenAI-compatible providers at once — OpenAI stays the default, and Ollama, Google Gemini, and Alibaba Qwen are reachable from the same deployment by prefixing (or naming) the model. Anthropic keeps its native Messages path. Since v3.43.0 every provider streams natively: the OpenAI-compatible providers pass their SSE bytes through untouched, and Anthropic's Messages SSE is translated frame-by-frame into OpenAI chat.completion.chunk events (with a mapped finish_reason and an OpenAI-shape usage object on the terminal chunk) — so callers consume one wire format regardless of which provider served the request.
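
The frame-by-frame translation can be pictured with a small generator. This is a sketch under stated assumptions rather than the gateway's code: it handles text deltas only, the stop-reason mapping in `FINISH_REASON` is a plausible guess, reusing the Anthropic message id as the chunk id is an assumption, and the chunks omit fields such as `created`.

```python
import json

# Anthropic stop_reason -> OpenAI finish_reason (assumed mapping).
FINISH_REASON = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def translate(events, model):
    """Yield OpenAI-style chunk dicts for parsed Anthropic Messages SSE payloads."""
    chunk_id, input_tokens = "chatcmpl-unknown", 0

    def chunk(delta, finish_reason=None, usage=None):
        out = {
            "id": chunk_id,
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        }
        if usage is not None:
            out["usage"] = usage
        return out

    for ev in events:
        kind = ev.get("type")
        if kind == "message_start":
            msg = ev["message"]
            chunk_id = msg.get("id", chunk_id)
            input_tokens = msg.get("usage", {}).get("input_tokens", 0)
            yield chunk({"role": "assistant", "content": ""})
        elif kind == "content_block_delta" and ev["delta"].get("type") == "text_delta":
            yield chunk({"content": ev["delta"]["text"]})
        elif kind == "message_delta":
            # Terminal chunk: mapped finish_reason plus an OpenAI-shape usage object.
            output_tokens = ev.get("usage", {}).get("output_tokens", 0)
            yield chunk(
                {},
                finish_reason=FINISH_REASON.get(ev["delta"].get("stop_reason"), "stop"),
                usage={
                    "prompt_tokens": input_tokens,
                    "completion_tokens": output_tokens,
                    "total_tokens": input_tokens + output_tokens,
                },
            )


# Illustrative event sequence; "claude-example" is a made-up model id.
events = [
    {"type": "message_start", "message": {"id": "msg_1", "usage": {"input_tokens": 3}}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hi"}},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 1}},
    {"type": "message_stop"},
]
for c in translate(events, "claude-example"):
    print("data: " + json.dumps(c))
```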

Each provider has its own base URL and (except keyless Ollama) its own key, configured purely via environment variables:

| Provider | Model id | Base-URL env (default) | Key env | Streaming |
| --- | --- | --- | --- | --- |
| OpenAI (default) | `gpt-4o`, `o1`, `openai/…`, `minimax…` | `AUDITTRAIL_GATEWAY_OPENAI_BASE_URL` → `AUDITTRAIL_OPENAI_BASE_URL` → `https://api.openai.com/v1` | `AUDITTRAIL_GATEWAY_OPENAI_KEY` | native |
| Anthropic | `claude-…`, `anthropic/…` | `AUDITTRAIL_GATEWAY_ANTHROPIC_BASE_URL` → `AUDITTRAIL_ANTHROPIC_BASE_URL` → `https://api.anthropic.com` | `AUDITTRAIL_GATEWAY_ANTHROPIC_KEY` | native (Messages SSE → OpenAI chunks) |
| Ollama (keyless, local) | `ollama/llama3.2` | `AUDITTRAIL_GATEWAY_OLLAMA_BASE_URL` → `AUDITTRAIL_OLLAMA_BASE_URL` → `http://localhost:11434/v1` | (none required) | native |
| Gemini | `gemini-2.0-flash` | `AUDITTRAIL_GATEWAY_GEMINI_BASE_URL` → `AUDITTRAIL_GEMINI_BASE_URL` → `https://generativelanguage.googleapis.com/v1beta/openai` | `AUDITTRAIL_GATEWAY_GEMINI_KEY` | native |
| Qwen | `qwen-plus`, `qwq-32b` | `AUDITTRAIL_GATEWAY_QWEN_BASE_URL` → `AUDITTRAIL_QWEN_BASE_URL` → `https://dashscope.aliyuncs.com/compatible-mode/v1` | `AUDITTRAIL_GATEWAY_QWEN_KEY` | native |

The resolution order for every provider is AUDITTRAIL_GATEWAY_<PROVIDER>_<X> → AUDITTRAIL_<PROVIDER>_<X> → baked default, where <X> is BASE_URL, KEY, or MODEL (an optional per-provider model remap). OpenAI reproduces the historical AUDITTRAIL_GATEWAY_OPENAI_KEY / base-URL-override behaviour exactly, so existing deployments (including MiniMax on the OpenAI path) are unchanged.
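
The resolution order reads naturally as a two-step lookup with a fallback. The sketch below is illustrative only: `resolve` is a hypothetical helper, the defaults are the documented base URLs, and treating an empty variable as unset is an assumption.

```python
import os

# Baked base-URL defaults as documented for each provider.
DEFAULT_BASE_URL = {
    "OPENAI": "https://api.openai.com/v1",
    "ANTHROPIC": "https://api.anthropic.com",
    "OLLAMA": "http://localhost:11434/v1",
    "GEMINI": "https://generativelanguage.googleapis.com/v1beta/openai",
    "QWEN": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}


def resolve(provider, field, env=os.environ):
    """Return the first value set: gateway-specific, then global, then default."""
    provider, field = provider.upper(), field.upper()
    candidates = (
        f"AUDITTRAIL_GATEWAY_{provider}_{field}",
        f"AUDITTRAIL_{provider}_{field}",
    )
    for name in candidates:
        value = env.get(name)
        if value:  # assumption: an empty string counts as unset
            return value
    # Only BASE_URL has a baked default here; KEY and MODEL fall through to None.
    return DEFAULT_BASE_URL.get(provider) if field == "BASE_URL" else None


# "http://qwen.internal/v1" is a made-up override for illustration.
example_env = {"AUDITTRAIL_QWEN_BASE_URL": "http://qwen.internal/v1"}
print(resolve("qwen", "base_url", example_env))
print(resolve("gemini", "base_url", example_env))
```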

GET /api/v1/gateway/proxy/v1/models only advertises a non-OpenAI provider once that provider has a key or an explicit base URL set — so the list honestly reflects what this deployment can actually reach.
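
One way to read that advertising rule is as a filter over the environment. The helper below is a hypothetical sketch; it assumes OpenAI is always listed because it is the default provider, and that both the gateway-specific and the global variable names count as "configured".

```python
def advertised_providers(env):
    """Return the providers a models listing would include under the rule above."""
    listed = ["OPENAI"]  # assumption: the default provider is always listed
    for provider in ("ANTHROPIC", "OLLAMA", "GEMINI", "QWEN"):
        names = [
            f"{prefix}_{provider}_{field}"
            for prefix in ("AUDITTRAIL_GATEWAY", "AUDITTRAIL")
            for field in ("KEY", "BASE_URL")
        ]
        # A key or an explicit base URL is enough to advertise the provider.
        if any(env.get(name) for name in names):
            listed.append(provider)
    return listed


print(advertised_providers({"AUDITTRAIL_GATEWAY_GEMINI_KEY": "example-key"}))
```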

Bring your own OpenAI-compatible provider

Any endpoint that speaks the OpenAI Chat Completions API (vLLM, LM Studio, Together, Groq, OpenRouter, …) works today by pointing the OpenAI base URL at it and using an openai/…-prefixed or gpt-… model id — the same mechanism MiniMax already rides. The named Ollama/Gemini/Qwen entries above are just built-in shortcuts with sane default base URLs and their own key slot.

Ollama — the keyless local path

Ollama needs no key. Run Ollama locally (or as a sidecar container), point the gateway at it, and route with an ollama/ prefix:

bash

# Server-side env
export AUDITTRAIL_GATEWAY_OLLAMA_BASE_URL="http://localhost:11434/v1"
 
# Client — note the ollama/ route prefix (stripped before forwarding)
curl https://auditrail.yourco.com/api/v1/gateway/proxy/v1/chat/completions \
  -H "Authorization: Bearer at-gw-YOUR-VIRTUAL-KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"ollama/llama3.2","messages":[{"role":"user","content":"hi"}]}'

The response and the logged span both carry provider = "ollama".
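
The prefix handling in the request above can be sketched as a small transformation of the request body. `forwarded_body` is a hypothetical name, and the sketch covers only the ollama/ prefix, since that is the one described as stripped before forwarding.

```python
def forwarded_body(body, prefix="ollama/"):
    """Return (provider, body to forward); provider is None if the prefix is absent."""
    model = body.get("model", "")
    if not model.startswith(prefix):
        return None, body
    # Copy the body and drop the route prefix so the local server sees a plain model id.
    return "ollama", {**body, "model": model[len(prefix):]}


provider, out = forwarded_body(
    {"model": "ollama/llama3.2", "messages": [{"role": "user", "content": "hi"}]}
)
print(provider, out["model"])
```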

The same providers across NLG, eval, ablation & the assistant

Since v3.39.0 the named-provider routing isn't gateway-only. Every server-side LLM surface picks its provider from the model id and reads a per-purpose key, so a Gemini/Qwen key (or a keyless Ollama base URL) plugged into a purpose makes that surface run on that provider — OpenAI and Anthropic stay the defaults, and each surface still falls back to its honest template / heuristic when nothing is configured.

| Purpose | Model source | Key env (per-purpose → global) |
| --- | --- | --- |
| NLG (explanations, prompt-optimizer, chat titles) | `AUDITTRAIL_NLG_DEFAULT_MODEL` / caller | `AUDITTRAIL_NLG_<PROVIDER>_KEY` → `AUDITTRAIL_<PROVIDER>_KEY` |
| Eval (LLM-as-judge) | evaluator `judge_model` | `AUDITTRAIL_EVAL_<PROVIDER>_KEY` → `AUDITTRAIL_<PROVIDER>_KEY` |
| Ablation (tool-selection) | `AUDITTRAIL_ABLATION_LLM_MODEL` (`provider:model`) | `AUDITTRAIL_ABLATION_<PROVIDER>_KEY` → `AUDITTRAIL_<PROVIDER>_KEY` |
| Assistant (BYOK chat) | the stored key row's model | its BYOK key, routed to the matching provider base URL |

<PROVIDER> is OPENAI, ANTHROPIC, GEMINI, QWEN, or OLLAMA (Ollama is keyless — set only its ..._BASE_URL). Base-URL and model remaps follow the same AUDITTRAIL_<PURPOSE>_<PROVIDER>_{BASE_URL,MODEL} → AUDITTRAIL_<PROVIDER>_{BASE_URL,MODEL} → default order as the gateway.
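
As an illustration of the per-purpose lookup, the sketch below splits a `provider:model` value of the kind used by AUDITTRAIL_ABLATION_LLM_MODEL and then finds the key variable that would apply. Both helper names are hypothetical, and the example value is made up.

```python
def split_provider_model(value):
    """Split a `provider:model` string on the first colon."""
    provider, _, model = value.partition(":")
    return provider.upper(), model


def purpose_key(purpose, provider, env):
    """Return (variable name, value): per-purpose key first, then the global key."""
    for name in (f"AUDITTRAIL_{purpose}_{provider}_KEY", f"AUDITTRAIL_{provider}_KEY"):
        if env.get(name):
            return name, env[name]
    return None, None  # nothing configured; the surface would use its fallback


example_env = {
    "AUDITTRAIL_ABLATION_LLM_MODEL": "gemini:gemini-2.0-flash",
    "AUDITTRAIL_GEMINI_KEY": "global-example-key",
}
provider, model = split_provider_model(example_env["AUDITTRAIL_ABLATION_LLM_MODEL"])
print(provider, model, purpose_key("ABLATION", provider, example_env)[0])
```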

Supported endpoints

Initial v1.1 coverage:

POST /api/v1/gateway/proxy/v1/chat/completions (streaming + non-streaming)
GET /api/v1/gateway/proxy/v1/models — proxied from the real provider

Not yet supported: embeddings, audio, images, fine-tuning — tracked on the roadmap.

Unsupported models (400)

Since v3.7.0 the gateway is honest about models it can't route. When a provider key is configured and the request names a model the gateway does not recognise, the proxy refuses with a 400 instead of returning a fabricated completion that would look real and log as a normal span:

json

{
  "error": "unsupported_model",
  "model": "totally-made-up-model",
  "supported": ["gpt-*", "o1/o3/o4*", "openai/*", "claude-*", "anthropic/*", "minimax*", "ollama/*", "gemini-*", "qwen*", "mock/* (keyless dev only)"]
}

The check runs before the streaming/non-streaming branch, so both paths fail cleanly (a stream can't be un-started once it begins) and the refused call is logged as an error span rather than a fake success.

Keyless dev fallback. With no provider key configured at all, the proxy still serves a deterministic mock/* completion so you can wire up an integration end-to-end before adding a real key. That fallback path is unaffected by the 400 above — the hard-fail only applies once a provider key is present.
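
The interplay between the 400 and the keyless fallback can be summarised as a three-way decision. The sketch below is an interpretation, not the gateway's routing code: the glob patterns are derived from the "supported" list in the error body, and it assumes that with no key any model id takes the mock path while mock/* is refused once a key is present.

```python
from fnmatch import fnmatchcase

# Glob patterns derived from the "supported" list in the 400 body.
PATTERNS = ["gpt-*", "o1*", "o3*", "o4*", "openai/*", "claude-*", "anthropic/*",
            "minimax*", "ollama/*", "gemini-*", "qwen*"]
# Display strings echoed back to the caller, as shown in the error example.
SUPPORTED = ["gpt-*", "o1/o3/o4*", "openai/*", "claude-*", "anthropic/*", "minimax*",
             "ollama/*", "gemini-*", "qwen*", "mock/* (keyless dev only)"]


def decide(model, provider_key_configured):
    """Return (decision, error_body) where decision is mock, forward, or reject."""
    if not provider_key_configured:
        # Keyless dev fallback: the hard-fail does not apply.
        return "mock", None
    if any(fnmatchcase(model, pattern) for pattern in PATTERNS):
        return "forward", None
    # This runs before any streaming starts, so both paths can fail cleanly.
    return "reject", {"error": "unsupported_model", "model": model, "supported": SUPPORTED}


print(decide("totally-made-up-model", provider_key_configured=True)[0])
print(decide("totally-made-up-model", provider_key_configured=False)[0])
```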

Virtual key management

| Method | Path | Purpose |
| --- | --- | --- |
| `GET` | `/api/v1/gateway/keys` | List virtual keys (last4 only) |
| `POST` | `/api/v1/gateway/keys` | Create a new `at-gw-*` key with optional rate / spend caps |
| `POST` | `/api/v1/gateway/keys/{id}/revoke` | Revoke immediately |

Each key has its own rate limit and monthly spend cap so you can hand one to a noisy consumer without risking the bill.

Tenant isolation

Every trace and span is tagged with the virtual key's owning user. Cross-tenant data access is impossible by construction: the proxy never looks up rows outside the scope of the key's owning user.

Rate limits

The proxy is subject to the upstream provider's rate limits in addition to the virtual key's own configured cap (default: 60/min, configurable at creation). Exceeding either returns 429.
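
On the client side, a 429 is usually handled by retrying with exponential backoff. The helper below is a generic sketch, not part of AuditTrail: `send` stands for any zero-argument callable that performs the request and returns a status code and body, and the delays are arbitrary starting points.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return status, body


# Demo with a fake sender and a recording sleep, so nothing blocks or hits the network.
responses = iter([(429, "rate limited"), (429, "rate limited"), (200, "ok")])
waits = []
result = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

Injecting `sleep` keeps the helper easy to test; in real use the default `time.sleep` applies.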