Changelog

All notable changes to AuditTrail are documented in this file. This is the canonical, always-current changelog. Recent releases are summarised by their user-facing headline; the full per-release detail for v1.0.x is preserved at the bottom.

The live deployment is currently v3.47.0.

[3.47.0] — 2026-07-13

Docs & blog renderer upgraded to next-mdx-remote v6. The MDX renderer behind every docs page and blog post moves from v5 to v6.0.0, closing the one remaining known dependency advisory — npm audit is now clean across the workspace. Rendered output is unchanged and was re-verified end-to-end on the built image: Shiki code highlighting, heading anchors, GFM tables and callouts all render identically, and literal route templates like {trace_id} in the API reference are untouched (v6's new MDX-expression blocking applies to format: "mdx" content; these pages render format: "md" and keep the v5 behaviour explicitly).
Web image build fix. npm installs next-mdx-remote@6 into the workspace-local apps/web/node_modules rather than hoisting it to the root, so the web Dockerfile now carries that workspace-local install into the build stage.

[3.46.0] — 2026-07-13

OTLP ingest now accepts gzip bodies (real bug fix). The OTel Collector's otlphttp exporter — and the OTel SDK's own OTLP/HTTP exporter — compress with gzip by default, but /api/v1/ingest/otlp parsed the raw bytes and returned a confusing 400. That meant the Collector-bridge recipes documented since v3.44.0 (OpenTelemetry, Langfuse) only worked with a manual compression: none override. The endpoint now transparently decompresses Content-Encoding: gzip bodies, with bounded decompression (32 MiB ceiling → clean 413) so a zip bomb can't exhaust memory. Found by a real end-to-end run of the documented recipe; three pinned regression tests.
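The bounded decompression described above can be sketched with Python's standard zlib module. This is an illustrative sketch, not AuditTrail's handler: decode_body and PayloadTooLarge are hypothetical names, and turning the exception into an HTTP 413 is left to whatever web framework wraps it. The idea is to cap the decompressor's output at one byte past the ceiling, so an oversized body is detected without ever being fully expanded in memory.

```python
import gzip
import zlib
from typing import Optional

MAX_DECOMPRESSED_BYTES = 32 * 1024 * 1024  # 32 MiB ceiling from the entry above


class PayloadTooLarge(Exception):
    """Hypothetical error a web handler would translate into HTTP 413."""


def decode_body(raw: bytes, content_encoding: Optional[str],
                limit: int = MAX_DECOMPRESSED_BYTES) -> bytes:
    """Return the request body, gunzipping it when Content-Encoding is gzip."""
    if (content_encoding or "").strip().lower() != "gzip":
        return raw  # identity encoding: pass the bytes through untouched
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    # Ask for at most limit + 1 bytes: reaching that length proves the body is
    # too large without inflating the rest of it.
    out = decompressor.decompress(raw, limit + 1)
    if len(out) > limit:
        raise PayloadTooLarge(f"decompressed body exceeds {limit} bytes")
    if not decompressor.eof:
        raise ValueError("truncated gzip body")
    return out


if __name__ == "__main__":
    body = gzip.compress(b'{"resourceSpans": []}')
    print(decode_body(body, "gzip"))
```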
LlamaIndex integration guide. New LlamaIndex page covering the honest path — OpenLLMetry (traceloop-sdk) auto-instrumentation → OTel Collector (encoding: json) → OTLP ingest — validated end-to-end with keyless Mock models. Documents the real-world gotcha that Traceloop.init() silently skips LlamaIndex on llama-index-core-only installs (call LlamaIndexInstrumentor().instrument() explicitly), and that one query lands as several traces (one per LlamaIndex workflow). LlamaIndex joins the landing marquee as the seventh integration chip.
REGO policies concept page. New REGO policies page for the Governance 2.0 surface: what the built-in REGO subset evaluates, the external-OPA path, the full REST table, and an explicit honesty section — stored policies power the on-demand simulator; automatic enforcement at ingest remains the YAML constitutional governor.
Go SDK module path finalised. The Go SDK's module path is now github.com/Partha-dev01/AuditTrail/packages/sdk-go (the monorepo subdirectory-module form; releases tag as packages/sdk-go/vX.Y.Z). This resolves the last SDK-publishing naming blocker — no separate GitHub org is needed.
Demo chat history healed. Demo login now removes malformed chat sessions (user messages with no assistant reply — an old seeding artifact) and seeds one canonical example conversation, idempotently.

[3.45.0] — 2026-07-12

UI/UX validation sweep + docs truth-up. A three-track live-production UX review (desktop walk, mobile + docs-site UX, docs-content-vs-code audit) confirmed that the mobile fixes and the v3.44.0 regression fixes hold (0 console errors on every route at 1440×900 and 390×844; the "All time" traces default and the Refresh refetch verified live). Everything it found ships fixed here:
- Docs ⌘K search is clickable again: the search dialog rendered inside the docs sidebar's sticky stacking context, so the page content painted on top of the results and mouse clicks fell through to the article behind the modal (keyboard selection still worked). The dialog now portals to <body>.
- Analytics ▸ Tool Usage counts successes correctly: the aggregation counted only spans with status ok as successes, so fleets whose tool spans carry the status complete (HTTP-ingested or seeded spans) showed a contradictory 0.0% success rate and a frequency chart that drew only tools with errors. complete and success now count as successes too (with a pinned regression test).
- Landing hero CTA markup: the hero "Get Started →" was the last remaining <a><button> nesting — now the same animated-link pattern the closing CTA adopted in v3.44.0.
- Compliance tiles labelled: each policy-group tile pairs a rules-passing count with an eval-level pass rate; both now carry labels ("X/Y rules pass" / "eval pass rate") so the diverging numbers no longer read as a contradiction.
- Assistant settings_url fixed: the 412 "no provider key" response pointed at /settings/ai-assistant, which 404s — now /settings?tab=ai-assistant (the docs pages carrying the same path are corrected too).
- Docs truth-up across 14 pages: the API reference header version unpinned from v3.20.0 and its OTLP callout + gateway supported hint list brought current; architecture.md rewritten against reality (Caddy not nginx, SQLite WAL in production not PostgreSQL, real /api/v1/ingest/traces + /api/v1/ablation/run paths, event WS envelope, the real structured rule-YAML schema, current dashboard surfaces); live-fleet endpoints corrected to /fleet/snapshot + /fleet/topology with the window_minutes query parameter (the documented FLEET_WINDOW_MINUTES env var never existed); the OpenAI/Anthropic integration pages now point at the gateway's real upstream-key sources (env vars / agent secrets) instead of the Assistant-only BYOK table; assistant tile table matches the nine kinds the backend actually emits; AUDITTRAIL_DATABASE_URL vs the Alembic-only unprefixed DATABASE_URL disambiguated; deployment guide's API_INTERNAL_URL documented as the build-time ARG it became in v3.37.0, and its security checklist de-nginx'd; SAE candidate-traces example uses the current model catalog; DAG badge/type-color reference matches the shipped components.
- Docs additions: GET /api/v1/admin/database joins the API reference's Admin table; the Python SDK's audittrail scaffold / audittrail list CLI commands are documented; the tried-and-tested deep-search-agent walkthrough is now on the in-app quickstart; the GitHub docs index lists all 37 pages (the six integration guides, Concepts and the SDK parity matrix were missing).
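The corrected Tool Usage success counting can be sketched as follows. This is a minimal sketch, not the analytics service's code: it assumes each tool span has been reduced to a (tool name, status) pair, and the case-insensitive status comparison is an assumption rather than documented behaviour. tool_success_rates is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# ok was the only success status before the fix; complete and success are the
# values HTTP-ingested and seeded tool spans carry.
SUCCESS_STATUSES = frozenset({"ok", "complete", "success"})


def tool_success_rates(spans: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each tool name to its success rate over (tool, status) pairs."""
    totals: Counter = Counter()
    successes: Counter = Counter()
    for tool, status in spans:
        totals[tool] += 1
        if status.lower() in SUCCESS_STATUSES:  # case-insensitive: an assumption
            successes[tool] += 1
    return {tool: successes[tool] / totals[tool] for tool in totals}
```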

[3.44.0] — 2026-07-12

Validation sweep + audit fixes. A four-track delta audit (backend, frontend/docs, dependencies, live E2E) of everything shipped since v3.11.0 found no critical issues; everything it did find ships fixed here:
- Gateway streaming honesty: a streamed proxy call that fails mid-flight (upstream error event, client disconnect) is now persisted as an error span with the upstream message — previously every streamed span was recorded complete. The detached span-write task is now strongly referenced so it can't be garbage-collected mid-write, and the stream logger reassembles SSE lines split across network chunks so the captured output text is complete.
- Gateway finish_reason: non-streaming Anthropic responses now map stop_reason to the OpenAI vocabulary (max_tokens→length, …), matching the v3.43.0 streaming translator; mock completions report a real total_tokens.
- Compliance math: the legacy GET /api/v1/evaluations/summary now uses the same disjoint severity buckets as /constitutional/summary (pass = total − amber − red); its buckets previously double-counted amber rows and didn't sum to the total.
- Migration parity guard: a new regression test asserts alembic upgrade head reproduces the exact create_all schema, so model/migration drift now fails CI instead of accumulating silently.
- Docs sidebar completeness: the sidebar (desktop + mobile drawer) is now derived from the docs index itself — 11 published pages (pause guards, alerts, compliance, the secrets CLI, the local runner, …) had been missing from it.
- OTel/Langfuse recipes corrected: the Python examples prescribed OTEL_EXPORTER_OTLP_PROTOCOL=http/json, which the Python OTel SDK does not implement — both pages now bridge Python through an OpenTelemetry Collector (otlphttp exporter with encoding: json) and scope the env-var path to JS/Node, where it actually works.
- Smaller fixes: the /traces Refresh button now refetches the list it sits above (it invalidated the wrong query key); the landing closing-CTA no longer nests a button inside a link; the WebSocket event union matches what the backend actually emits; a dead /live link in Concepts now points at /fleet.
- Dependencies: PyJWT floor raised to 2.13.0 (closes five upstream CVEs, none reachable in AuditTrail's HS256-only usage), python-multipart to 0.0.32; npm audit findings drop from 14 to 1 (the remainder is the known next-mdx-remote major bump, mitigated by author-controlled content), and a project-wide postcss override enforces ≥ 8.5.10.
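The SSE reassembly fix in the gateway streaming bullet amounts to holding the trailing partial line back until the next network chunk completes it. A minimal sketch, assuming the stream has already been decoded to text and uses \n or \r\n line endings (SSE also permits a bare \r, which this sketch does not handle); SSELineBuffer is a hypothetical name, not the stream logger's class.

```python
from __future__ import annotations

from typing import List


class SSELineBuffer:
    """Reassemble SSE lines that arrive split across network chunks."""

    def __init__(self) -> None:
        self._pending = ""  # text after the last newline seen so far

    def feed(self, chunk: str) -> List[str]:
        """Add a chunk and return every line it completes, without terminators."""
        self._pending += chunk
        # The last element is an incomplete line (possibly empty); keep it back.
        *complete, self._pending = self._pending.split("\n")
        return [line.rstrip("\r") for line in complete]

    def flush(self) -> List[str]:
        """Return any unterminated remainder once the stream has ended."""
        rest, self._pending = self._pending, ""
        return [rest] if rest else []
```

Empty strings in the returned list are the blank lines that terminate SSE events, so a consumer can dispatch an event each time it sees one.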

[3.43.0] — 2026-07-11

Gateway: native Anthropic streaming. Streaming requests for claude-* / anthropic/* models through the gateway proxy now stream natively: Anthropic's Messages SSE events are translated frame-by-frame into OpenAI chat.completion.chunk events, so tokens arrive incrementally instead of the previous downgrade (the whole completion was buffered and delivered as a single event at the end). The terminal chunk carries a mapped finish_reason (end_turn→stop, max_tokens→length, tool_use→tool_calls) plus an OpenAI-shape usage object, and upstream error events surface as a 502 instead of being swallowed. With this, every gateway provider streams natively. Streamed Anthropic output text is now also captured on the logged gateway span (the old single-event shape was invisible to the stream logger). Keyless dev environments now get the same scripted mock stream for Anthropic models that other keyless providers serve.
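The frame-by-frame translation can be sketched as a pure function from one parsed Anthropic Messages event to one OpenAI chat.completion.chunk. This is an illustrative sketch under stated assumptions: translate_event and UpstreamError are hypothetical, only text deltas are handled (tool-use deltas are skipped), the usage object the entry mentions is omitted for brevity, and stop reasons outside the three listed mappings are passed through unchanged. Raising on an error event stands in for the gateway's 502 response.

```python
from __future__ import annotations

from typing import Any, Dict, Optional

# The three mappings stated in this entry; other values pass through unchanged.
STOP_REASON_TO_FINISH_REASON = {
    "end_turn": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


class UpstreamError(RuntimeError):
    """Hypothetical error a proxy would surface as HTTP 502."""


def translate_event(event: Dict[str, Any], model: str,
                    chunk_id: str) -> Optional[Dict[str, Any]]:
    """Translate one Anthropic SSE event into an OpenAI chunk, or None to skip it."""
    etype = event.get("type")
    if etype == "error":
        raise UpstreamError(event.get("error", {}).get("message", "upstream error"))
    delta: Dict[str, Any] = {}
    finish_reason = None
    inner = event.get("delta", {})
    if etype == "content_block_delta" and inner.get("type") == "text_delta":
        delta = {"content": inner["text"]}  # incremental token text
    elif etype == "message_delta":
        reason = inner.get("stop_reason")
        finish_reason = STOP_REASON_TO_FINISH_REASON.get(reason, reason)
    else:
        return None  # message_start, ping, block start/stop, message_stop, ...
    return {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
```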

[3.42.0] — 2026-07-06

Traces list "All time" filter. The traces list time-range filter now offers an All time option and defaults to it, so runs older than the previous 30-day maximum (including long-abandoned still-"running" traces) are visible in the list out of the box instead of showing "No traces found". Previously the list capped at "Last 30d" with no way to see older runs, even though the Overview page's "Recent Traces" (which applies no time window) still listed them — a confusing mismatch. The windowed options (last hour / 24h / 7d / 30d) are still one click away. Analytics and Compliance keep their own 7-day default and are unchanged.

[3.41.0] — 2026-07-05

Java SDK Maven namespace. The Java SDK's Maven coordinate now uses the io.imaginaerium.audittrail group id (verifiable on the imaginaerium.in domain) instead of the placeholder io.audittrail, so the artifact can be published to Maven Central under a namespace we can prove ownership of. Only the publish coordinate changed — the Java source package and API are unchanged, so existing imports and code are unaffected.

[3.40.0] — 2026-07-05

Landing page hydration fix. The "Live Fleet" bento tile's mini-radar no longer triggers a React hydration mismatch (#418) on the landing page. Its scan-line animation was gated on an in-view flag that initialised differently on the server (visible) than in the browser's first render (hidden), so the server HTML and the client's first paint disagreed — causing a brief visual flash of the scan line and a needless client-side re-render of the tile. The flag now initialises identically on both sides; the animation still starts when the tile scrolls into view. Console is clean on the landing page.

[3.39.0] — 2026-07-05

Multi-provider routing beyond the gateway. The named OpenAI-compatible providers (Ollama, Gemini, Qwen) are now usable across every server-side LLM surface, not just the gateway proxy: the natural-language explainer, the eval LLM-as-judge, the prompt optimizer, chat-session titles, the ablation tool-selector, and the BYOK assistant. Each surface picks its provider from the model id and reads a per-purpose key (AUDITTRAIL_<PURPOSE>_<PROVIDER>_KEY → AUDITTRAIL_<PROVIDER>_KEY), so a Gemini/Qwen key — or a keyless Ollama base URL — dropped into a purpose makes that surface run on that provider. OpenAI and Anthropic stay the defaults, and each surface still falls back to its honest template/heuristic when nothing is configured (never a fabricated result).
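The per-purpose key lookup is an ordered fallback over environment variable names. A minimal sketch: resolve_key is hypothetical, and purpose names such as EXPLAINER or JUDGE in the usage below are illustrative placeholders, not confirmed variable names. The keyless Ollama base-URL path is not covered.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional


def resolve_key(purpose: str, provider: str,
                env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Per-purpose key first, then the provider-wide key; None if neither is set."""
    env = os.environ if env is None else env
    p, q = purpose.upper(), provider.upper()
    for name in (f"AUDITTRAIL_{p}_{q}_KEY", f"AUDITTRAIL_{q}_KEY"):
        value = env.get(name)
        if value:  # this sketch treats an empty string as unset
            return value
    return None  # caller falls back to its template/heuristic path
```

For example, with AUDITTRAIL_EXPLAINER_GEMINI_KEY and AUDITTRAIL_GEMINI_KEY both set, the explainer gets its own key while every other purpose shares the provider-wide one.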
Ablation keys are namespaced. The ablation tool-selector now passes an explicit AuditTrail-namespaced key to its provider client instead of relying on the stock OPENAI_API_KEY / ANTHROPIC_API_KEY, so ablation can run on a different provider/account than the rest of the stack.
Docs fix. The gateway virtual-key revoke endpoint is documented correctly as POST /api/v1/gateway/keys/{id}/revoke.

[3.38.0] — 2026-07-05

Gateway multi-provider routing. The gateway proxy now routes by model id to several OpenAI-compatible providers from a single deployment — OpenAI stays the default, and Ollama, Google Gemini, and Alibaba Qwen are reachable by prefixing (or naming) the model (ollama/llama3.2, gemini-2.0-flash, qwen-plus). Each provider has its own base-URL and key environment variables (AUDITTRAIL_GATEWAY_<PROVIDER>_BASE_URL / _KEY, falling back to AUDITTRAIL_<PROVIDER>_*), and all of them stream natively. Ollama needs no key — point the gateway at a local Ollama and route with an ollama/ prefix. OpenAI (including the existing MiniMax base-URL override) and Anthropic behave exactly as before, and the honest "unrecognised model + a key configured → 400, never a fabricated completion" contract is preserved. GET …/v1/models only advertises a non-OpenAI provider once it is actually configured.
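Routing by model id might look like the sketch below. It is hypothetical: route_model and its prefix rules are assumptions inferred from the examples in this entry (ollama/llama3.2, gemini-2.0-flash, qwen-plus) and the claude-* / anthropic/* patterns in v3.43.0. Stripping the routing prefix before forwarding is also an assumption, and the real gateway additionally checks which providers are configured before accepting a model.

```python
from __future__ import annotations

from typing import Tuple

# Provider prefixes accepted in "<provider>/<model>" form (assumed set).
PREFIXED_PROVIDERS = {"ollama", "gemini", "qwen", "anthropic", "openai"}

# Bare-name patterns; anything unmatched falls through to OpenAI, the default.
NAME_PREFIXES = (("claude-", "anthropic"), ("gemini-", "gemini"), ("qwen", "qwen"))


def route_model(model: str) -> Tuple[str, str]:
    """Return (provider, upstream model id) for a requested model id."""
    if "/" in model:
        prefix, rest = model.split("/", 1)
        if prefix.lower() in PREFIXED_PROVIDERS:
            return prefix.lower(), rest  # drop the routing prefix before forwarding
    lowered = model.lower()
    for name_prefix, provider in NAME_PREFIXES:
        if lowered.startswith(name_prefix):
            return provider, model
    return "openai", model
```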

[3.37.0] — 2026-07-05

Go SDK delivery reliability. The Go SDK now re-queues a span batch that fails a transient flush (network error, or a 5xx after retries) at the front of the buffer instead of dropping it, so a temporary outage no longer loses spans. The buffer is bounded (MaxQueue, default 10000 spans) so a persistently unreachable endpoint cannot grow memory without limit, and a 4xx (permanent rejection — bad key or malformed payload) is now logged to stderr rather than swallowed silently.
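The re-queue policy can be illustrated in Python; the SDK itself is Go, so this is a language-neutral sketch of the behaviour, not its code. SpanBuffer and its methods are hypothetical. The entry does not say which spans are dropped once MaxQueue is reached, so this sketch assumes new spans are refused while the queue is full and any overflow from a re-queued batch is discarded; it also assumes HTTP retries have already been exhausted before on_flush_result is called.

```python
from __future__ import annotations

import sys
from collections import deque
from typing import Any, List, Optional


class SpanBuffer:
    """Bounded FIFO of spans awaiting flush (hypothetical sketch)."""

    def __init__(self, max_queue: int = 10_000) -> None:
        self.max_queue = max_queue  # mirrors MaxQueue's default of 10000
        self._spans: deque = deque()

    def add(self, span: Any) -> bool:
        if len(self._spans) >= self.max_queue:
            return False  # full: refuse rather than grow without limit
        self._spans.append(span)
        return True

    def take_batch(self, n: int) -> List[Any]:
        return [self._spans.popleft() for _ in range(min(n, len(self._spans)))]

    def requeue_front(self, batch: List[Any]) -> int:
        """Put a failed batch back at the front; return how many spans were dropped."""
        room = max(self.max_queue - len(self._spans), 0)
        keep = batch[:room]
        # extendleft inserts items one at a time, which reverses them, so feed it
        # the batch reversed to keep the original order at the head of the queue.
        self._spans.extendleft(reversed(keep))
        return len(batch) - len(keep)

    def on_flush_result(self, batch: List[Any], status: Optional[int]) -> None:
        """status is the final HTTP status, or None for a network error."""
        if status is None or status >= 500:
            self.requeue_front(batch)  # transient: retry on the next flush
        elif status >= 400:
            # Permanent rejection (bad key, malformed payload): log, never retry.
            print(f"audittrail: {len(batch)} spans rejected with HTTP {status}",
                  file=sys.stderr)
```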
Local full-stack compose fix. The web image now bakes the /api/* rewrite destination from an API_INTERNAL_URL build arg. Next.js 16 evaluates next.config.ts rewrites() at build time, so the previous runtime environment variable was ignored and a local docker compose up could not reach the API. Production was never affected — Caddy routes /api/* straight to the api container.

[3.36.0] — 2026-07-05

Database health (superadmin). A new read-only Database page in the /admin console reports SQLite health at a glance: the on-disk database, WAL and shared-memory file sizes; page count / page size / free-page count; journal mode and WAL autocheckpoint; and a per-table row-count table ("tables by row count"). It is superadmin-only and strictly read-only — derived from os.stat, read-only PRAGMAs and SELECT count(*), never a mutating PRAGMA — and served by a new GET /api/v1/admin/database endpoint. It degrades honestly: sizes show as em dashes for an in-memory database, and on a non-SQLite backend the SQLite-specific fields are omitted while the row counts still work.
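The same kind of read-only statistics can be gathered with Python's standard sqlite3 module. This is a minimal sketch of the technique, not the endpoint's implementation: database_health is a hypothetical name, and opening the file through a mode=ro URI is this sketch's way of guaranteeing no writes. Every PRAGMA below is issued without an assignment, which makes it a query rather than a change.

```python
import os
import sqlite3
from pathlib import Path


def database_health(path: str) -> dict:
    """Collect read-only SQLite health stats for the database file at path."""
    # A read-only URI connection: SQLite rejects any write made through it.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        def pragma(name: str):
            # A PRAGMA without "= value" only reads the current setting.
            return conn.execute(f"PRAGMA {name}").fetchone()[0]

        sizes = {}
        for label, suffix in (("database", ""), ("wal", "-wal"), ("shm", "-shm")):
            file_path = path + suffix
            sizes[label] = os.stat(file_path).st_size if os.path.exists(file_path) else None

        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
        counts = {}
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'  # safe identifier quoting
            counts[table] = conn.execute(f"SELECT count(*) FROM {quoted}").fetchone()[0]

        return {
            "sizes_bytes": sizes,
            "page_count": pragma("page_count"),
            "page_size": pragma("page_size"),
            "freelist_count": pragma("freelist_count"),
            "journal_mode": pragma("journal_mode"),
            "wal_autocheckpoint": pragma("wal_autocheckpoint"),
            "tables_by_row_count": sorted(counts.items(), key=lambda kv: kv[1], reverse=True),
        }
    finally:
        conn.close()
```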

[3.35.0] — 2026-07-04

Pause-guard library (Python SDK). A new opt-in module audittrail.pause_guards ships four composable guards (BudgetGuard, LatencyGuard, ConstitutionalGuard and HumanApprovalGuard) plus a GuardSet and an async guarded_step context manager. You wire the guards you want around an agent step; when a wired guard trips, the library drives the existing pause round-trip (register → block until the operator resumes → return the possibly-edited state). Guards never auto-pause: constructing one makes no HTTP requests, and nothing pauses unless you wire it and call run(...). Honest by design: budget/latency read caller-supplied measurements (the SDK keeps no cost/token tally) and the constitutional guard takes a user predicate (there is no SDK-local governor). A guard whose own evaluation raises an error fails safe: budget/latency default to fail-open (proceed), while constitutional and human-approval default to fail-closed (pause for review). Full details: Pause guards. SDK version 3.7.0.
New audittrail.pause_hooks.request_pause_and_wait. The guard library drives its pause through this helper, which unconditionally registers an active pause row and then reuses the same heartbeat / long-poll / by-id terminal read as check_pause — all pause transport stays in one place. A guard-initiated pause can carry a pause_ttl_seconds so the server reaper bounds its lifetime.
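The fail-safe defaults can be illustrated with a small stand-alone sketch. Nothing below is the audittrail.pause_guards API: SketchGuard, budget_guard, constitutional_guard and tripped_guards are hypothetical names, and the sketch only decides whether a pause is wanted; it does not perform the pause round-trip.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SketchGuard:
    """Hypothetical guard: check(state) returns True when the guard trips."""
    name: str
    check: Callable[[Dict], bool]
    fail_closed: bool  # outcome to use when check itself raises

    def should_pause(self, state: Dict) -> bool:
        try:
            return bool(self.check(state))
        except Exception:
            # Evaluation error: pause for review if fail-closed, proceed if fail-open.
            return self.fail_closed


def budget_guard(limit_usd: float) -> SketchGuard:
    # Reads a caller-supplied measurement; nothing here tallies cost itself.
    return SketchGuard("budget", lambda s: s["spent_usd"] > limit_usd, fail_closed=False)


def constitutional_guard(violates: Callable[[Dict], bool]) -> SketchGuard:
    # The caller supplies the predicate; there is no local governor.
    return SketchGuard("constitutional", violates, fail_closed=True)


def tripped_guards(guards: List[SketchGuard], state: Dict) -> List[str]:
    """Names of the guards that ask for a pause on this state."""
    return [g.name for g in guards if g.should_pause(state)]
```

With an empty state dict, both checks raise KeyError: the budget guard fails open and lets the step proceed, while the constitutional guard fails closed and requests a pause.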

[3.34.0] — 2026-07-04

Go & Rust SDK parity — OpenAI/Anthropic wrappers + pause hooks. The Go and Rust SDKs, previously core-client-only, gain provider wrappers and the pause/resume primitive, closing the biggest gaps in the SDK parity matrix. In Go the wrappers are http.RoundTrippers you install on the provider SDK's HTTP client (option.WithHTTPClient(audittrail.OpenAIHTTPClient(audit))); they auto-parent off the trace context on the outgoing request. In Rust the wrappers are generic over any Serialize request/response pair, so they work with async-openai's create_byot or any typed client. Both extract model, token usage and gen_ai.* attributes and emit one span per call. Pause (Go) / pause (Rust) mirror the Python/TS hooks: register → heartbeat → block until the operator resumes.
New GET /api/v1/pauses/{pause_id}. The active-pause poll deliberately excludes terminal rows, so a resumed pause simply disappears from /pauses/{trace_id}/active. This by-id read surfaces the terminal state (resumed / expired / abandoned) and the operator's edited blob — it's how every SDK's pause hook now learns the outcome. A latent bug is fixed alongside: the Python and TypeScript pause long-polls could never observe a resume against a real server (they watched the active endpoint, which never shows a resumed row) — they now read the final state by id.

[3.33.0] — 2026-07-03

audittrail secrets CLI. The Python SDK's console script gains a secrets command group — list, add, rotate, rm — managing the server-side gateway provider secrets from a terminal. The security contract matches the API's write-only design: values are read from a hidden interactive prompt or --value-stdin (there is deliberately no --value flag, because argv is visible to every process on the machine), never echoed or logged, and rm requires typing the secret's name back (--yes for scripts). list shows metadata and last4 only — a stored value can never be retrieved, only rotated. SDK version 3.5.0; docs: the gateway-secrets page gains a CLI section and the SDK parity matrix a new row.
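The argv-free input contract can be sketched with the standard getpass module. read_secret_value is a hypothetical helper, not the CLI's code: its value_stdin flag stands in for --value-stdin, and the prompt and stream are injectable only so the sketch can be exercised without a terminal.

```python
from __future__ import annotations

import getpass
import sys
from typing import Callable, Optional, TextIO


def read_secret_value(value_stdin: bool,
                      stdin: Optional[TextIO] = None,
                      prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Read a secret from stdin or a hidden prompt; never from argv."""
    if value_stdin:
        stream = sys.stdin if stdin is None else stdin
        # Drop only the trailing newline a pipe or heredoc usually adds.
        value = stream.read().rstrip("\r\n")
    else:
        # getpass reads from the terminal with echo disabled.
        value = prompt("Secret value: ")
    if not value:
        raise ValueError("refusing to store an empty secret")
    return value
```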
Docs honesty: API-key auth is Authorization: Bearer only. The API reference (and two backend docstrings) claimed an X-API-Key header worked as an alternative — no code path has ever read that header; every SDK sends Authorization: Bearer sk-at-…. The false claim is removed.
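For example, using only Python's standard library, an API-key request carries the key in the Authorization header and nowhere else. A minimal sketch: the ingest path comes from the v3.45.0 entry, while build_ingest_request, the local base URL and the JSON body are placeholders, not a documented payload.

```python
import urllib.request


def build_ingest_request(base_url: str, api_key: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST that authenticates with a Bearer token."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/ingest/traces",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # the only supported API-key header
            "Content-Type": "application/json",
        },
    )
```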

[3.32.0] — 2026-07-03

Live trace completion over WebSocket. Span ingest already streamed span_start / span_end into each trace's WebSocket room; the server now also broadcasts a trace_complete event the moment the last running span closes and the trace transitions to complete or error. The trace-detail view (which already listened for this event) flips the status badge and final aggregates live, without a refresh. Both HTTP and OTLP ingest fire it, the broadcast is best-effort (a WebSocket failure never breaks ingest), and a streaming run_langgraph run gets exactly one event at the true end of the run.
Docs. The API reference's WebSocket section now documents the live streaming events precisely — trace_complete had been listed in the frontend event union but was never actually emitted by the server.

[3.31.0] — 2026-07-03

Fleet-level WHY on /analytics. A new Fleet WHY tab shows the tool-selection surrogate's global feature importances — what drives tool choice across the whole deployment (mean |SHAP| over the training set, with an XGBoost-gain fallback), alongside the training-set tool mix, sample count and the model's F1. Served by GET /api/v1/analytics/fleet-why; when the model hasn't trained the response says so instead of fabricating importances.
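Global importance as mean |SHAP| reduces to averaging the absolute per-sample attributions of each feature. A minimal pure-Python sketch, assuming the SHAP values have already been reduced to one samples × features matrix (for instance a single class of the multiclass surrogate); mean_abs_shap and the feature names in the usage are hypothetical, and the XGBoost-gain fallback is not shown.

```python
from __future__ import annotations

from typing import List, Sequence, Tuple


def mean_abs_shap(shap_rows: Sequence[Sequence[float]],
                  feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by mean absolute SHAP value across all samples."""
    if not shap_rows:
        return []
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            # Magnitude only: signed values would cancel across samples.
            totals[j] += abs(value)
    n = len(shap_rows)
    ranked = [(name, total / n) for name, total in zip(feature_names, totals)]
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)
```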
Time-to-WHY cold start: no restart needed. The surrogate used to train only at API startup, so a fresh install that ingested its 5th tool span while running showed heuristic XAI values until the next restart. A background pass now trains the model automatically as soon as enough data exists (at least 5 tool spans with two distinct tools). Single-tool fleets are declined gracefully — a classifier can't learn from one class — instead of erroring on every pass.
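The training threshold is a simple predicate over the ingested tool spans. A sketch using the thresholds stated above; surrogate_can_train is a hypothetical name, not the background pass's function.

```python
from typing import Iterable


def surrogate_can_train(tool_names: Iterable[str], min_spans: int = 5,
                        min_distinct: int = 2) -> bool:
    """True once there are enough tool spans and at least two distinct tools."""
    names = list(tool_names)
    # A classifier needs at least two classes; one tool gives it nothing to separate.
    return len(names) >= min_spans and len(set(names)) >= min_distinct
```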
Heuristic SHAP results are now labelled as such. The SHAP feature-importance cards on /sankey and the trace-detail Sankey tab flag heuristic (untrained-at-compute-time) results with an explicit "Heuristic estimate" note and no longer present the fallback's fabricated F1 as a real fidelity score; the low-fidelity warning banner now applies only to genuinely learned attributions.
Analytics honesty. The agent filter pill on /analytics was a dead control (hardcoded names, never applied to any query) — removed.

[3.30.1] — 2026-07-03

Real SHAP values restored on trained surrogates. Modern SHAP returns a 3D array for multiclass models where older releases returned a list of per-class arrays; the explainer choked on the new shape and silently fell back to its heuristic values even when the surrogate was fully trained. Ablation SHAP cards and action evidence now report genuinely learned attributions (labelled surrogate with the model's F1) whenever the model is trained.
Honest fallback wording. When attribution does fall back to heuristics, the explanation now states the actual reason — an untrained model and a SHAP-explainer failure are different situations and are reported as such.
Stable demo evidence anchors. The demo queue's evidence rows now anchor to stable seeded traces instead of the always-reseeded fleet activity burst, so their "Inspect span" links no longer go stale after the next demo login; rows whose anchor trace has been purged are re-anchored automatically.

[3.30.0] — 2026-07-03

Evidence-carrying actions. A deployment-action proposal that targets a trace or span now carries attribution evidence computed at propose time — the no-LLM surrogate SHAP attribution for the decision-relevant tool span (preferring a span flagged amber/red by the governor), with a deterministic English explanation, the top feature weights, the violated rule when one triggered the concern, and a one-click jump into the trace's XAI tabs. The /deployments approval drawer renders it as a "Why this action" panel, so approvers see evidence instead of rubber-stamping a reason string. Provenance is honest: results are labelled surrogate (with its F1) only when the model is really trained, heuristic otherwise; proposals without a resolvable trace/span target state plainly that no span-level evidence exists. Assistant-proposed actions show the same explanation line in chat.

[3.29.0] — 2026-07-03

Eight new docs pages. Six per-integration guides — LangChain, LangGraph, OpenAI, Anthropic, OpenTelemetry and Langfuse — each with install steps, verified wiring snippets and honest notes on which paths are in-process vs remote. Plus a Concepts page (the trace → span → governor → WHY-pipeline → control-plane mental model on one page) and an SDK parity matrix that states plainly what each of the six languages ships today.
The landing framework marquee's chips now link to these per-integration guides instead of interim targets.

[3.28.0] — 2026-07-02

Docs are readable on phones. Below tablet width the docs sidebar collapses into a drawer behind a hamburger in a new sticky top bar — it used to render as a fixed full-width column that squeezed the article to a one-word-per-line sliver on a 390px screen.
Dashboard pages no longer overflow sideways on mobile. The content column let any wide element (the traces table, the trace-detail tab strip) stretch the whole page past the viewport — the body was 731px wide on a 390px phone. Wide tables now scroll inside their own cards, the trace-detail tab strip scrolls in place, and the summary cards and span rows stay fully on-screen.
One Filters button on mobile traces. The filter bar's four dropdowns collapse into a single Filters button (with an active-filter count) that opens a bottom sheet with labeled controls — previously three unlabeled dropdowns dominated the viewport with the third clipped off-screen. The dropdowns also finally display "All agents" / "All envs" / "All models" / "Last 24h" instead of the literal placeholder _all_.

[3.27.0] — 2026-07-02

Landing page closes with a call to action again. A slim single-band CTA ("Ship agents that explain themselves." with Get Started and View Demo) sits between the FAQ and the footer — the old multi-panel Finale stays retired, but the page no longer dead-ends after the FAQ.
The framework marquee only advertises real integrations. Trimmed from 12 logos to the 6 with an actual path today — LangChain, LangGraph, OpenAI, Anthropic, OpenTelemetry, and Langfuse — and every chip is now a link to its integration guide. The scroll pauses on hover or keyboard focus so the chips are clickable. The removed logos return as their per-framework guides ship.
Less dead space mid-page. The before/after comparison card no longer stretches to 648px with an empty bottom third (height capped), and the vertical gaps around the platform-comparison table are roughly halved — no more full blank viewports while scrolling at laptop heights.
The before/after slider starts at 65% so the causal-attribution panel's column header isn't sliced mid-word before you touch the slider.

[3.26.0] — 2026-07-02

Settings General tab now tells the truth. The About card reports the API's real version, environment, rules directory, and credential-masked database path — it used to hardcode a version string. Performance metrics are real: API p95 is measured from a rolling window of actual request durations (shown as "measuring…" until enough samples exist) instead of numbers derived from uptime, and Loaded Rules counts the YAML rules the governor actually evaluates. The dead Environment/Log-level/Rules-directory/DB-path form controls — whose Save silently failed with a 422 on every click — are gone; Preferences now contains exactly the settings the API accepts.
Compliance page no longer contradicts itself. The donut, group tiles, and rules table now describe the same data: severity buckets are disjoint (pass + amber + red = total; amber rows were previously double-counted as both a pass and a warning), group tiles resolve each rule to its real YAML policy group with a real per-group rule count (previously every evaluation lumped into Safety with a hardcoded rule count of 1, rendering "10/1 pass"), the latency-SLO rule's performance group maps to Quality, and the per-rule 7-day trend sparkline is computed from real daily aggregates instead of repeating the all-time rate seven times. Demo evaluations now reference the real YAML rules; demo users carrying evaluations with orphaned rule ids are healed on their next demo login.
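Disjoint buckets mean each evaluation row is counted exactly once, with pass derived by subtraction (the same rule v3.44.0 applied to the legacy summary endpoint: pass = total − amber − red). A minimal sketch, assuming each row has been reduced to its severity label; severity_buckets is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def severity_buckets(severities: Iterable[str]) -> Dict[str, int]:
    """Count each evaluation once; pass is whatever is neither amber nor red."""
    counts = Counter(severities)
    total = sum(counts.values())
    amber, red = counts["amber"], counts["red"]
    buckets = {"total": total, "pass": total - amber - red, "amber": amber, "red": red}
    # Disjointness invariant: the three buckets always sum to the total.
    assert buckets["pass"] + amber + red == total
    return buckets
```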
One filter system on /traces. The redundant second row of filter pills (a hardcoded 3-agent list and a time pill that never filtered anything) is gone. The global filter bar's agent / environment / model / time-range selections now all actually apply to the trace list, and the Sankey "tool" click-to-filter chip filters for real via a new tool query parameter on GET /api/v1/traces.
Deployments queue stays fresh and actions never clip. Seeded demo action timestamps re-anchor on login (the queue used to show dates from whenever the demo user was first created), and the Actions column has a minimum width with wrapping buttons.
Assistant errors survive reload. If the provider fails mid-stream (for example an HTTP 429), the turn is persisted with an honest error note — reloading the chat used to show your question with no reply at all.
Analytics and compliance time-series charts add right margin so the last axis label isn't clipped.

[3.25.0] — 2026-07-02

SAE extraction path fixed — it now actually runs. The self-host mechanistic-XAI path had never been runnable end-to-end: the adapter imported a nonexistent saelens module (the real package imports as sae_lens), the documented audittrail[sae] extra didn't exist in any package, the activation hook only fit Llama-shaped models, and the Mistral catalog entry pointed at a GPT-2 SAE. Extraction is rewritten on TransformerLens (HookedSAETransformer + SAE.from_pretrained) and validated end-to-end on real weights: gpt2-small → 12 features, all 12 labeled by Neuronpedia.
Honest supported-model catalog. gpt2-small (a.k.a. gpt2) joins as the smallest entry point — CPU-only, ungated, Neuronpedia-labeled. Gemma-2-2b moves to the canonical GemmaScope release; the Llama entry is corrected to meta-llama/Meta-Llama-3-8B-Instruct (the model its published SAE was actually trained on); Mistral-7B is removed (it has no published SAE).
Per-model HF-key gating. AUDITTRAIL_SAE_MODEL_KEY is now only required for license-gated base models (gemma, llama) — gpt2-small needs no key. The span-state API exposes hf_key_required and the pre-flight checklist marks the key row "not required for this model".
Install copy corrected everywhere: pip install sae-lens (and a real [sae] extra now exists on the API package for source checkouts).

[3.24.0] — 2026-07-02

Automatic model pricing. Cost estimation no longer depends on a hand-maintained 18-model table. Known models (~3k: GPT, Claude, Gemini, DeepSeek, Llama, …) now price automatically from a bundled community catalog generated from LiteLLM's price index, including dated variants (gpt-4o-mini-2024-07-18) and provider-prefixed ids (openai/gpt-4o). The catalog auto-refreshes from upstream every 24 hours (AUDITTRAIL_PRICING_AUTO_REFRESH, default on) so new models cost correctly without a redeploy; a failed refresh keeps the current catalog, so air-gapped installs simply run on the snapshot.
Resolution order: agent-supplied cost → rules/pricing.yaml operator overrides (substring, longest key wins) → community catalog (exact/normalized match) → default rate (badged "Default Rate"). rules/pricing.yaml slims down to a pure override file — negotiated rates and zero-cost self-hosted models.
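The resolution order can be expressed as a chain of fallbacks. This is a simplified sketch, not the pricing module: estimate_cost and normalize_model_id are hypothetical, rates are assumed to be a single blended USD price per million tokens (real catalogs price input and output tokens separately), and normalization here only strips a provider prefix and a trailing -YYYY-MM-DD date.

```python
from __future__ import annotations

import re
from typing import Mapping, Optional, Tuple

_DATE_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def normalize_model_id(model: str) -> str:
    """openai/gpt-4o-mini-2024-07-18 -> gpt-4o-mini (sketch of the normalized match)."""
    name = model.lower().split("/", 1)[-1]
    return _DATE_SUFFIX.sub("", name)


def estimate_cost(model: str, tokens: int,
                  overrides: Mapping[str, float],
                  catalog: Mapping[str, float],
                  default_rate: float,
                  agent_cost: Optional[float] = None) -> Tuple[float, str]:
    """Return (cost in USD, which layer priced it)."""
    # 1. A cost the agent reported itself always wins.
    if agent_cost is not None:
        return agent_cost, "agent"
    lowered = model.lower()
    # 2. Operator overrides: substring match, longest matching key wins.
    matches = [key for key in overrides if key.lower() in lowered]
    if matches:
        return tokens * overrides[max(matches, key=len)] / 1_000_000, "override"
    # 3. Community catalog: exact id first, then the normalized id.
    for candidate in (lowered, normalize_model_id(lowered)):
        if candidate in catalog:
            return tokens * catalog[candidate] / 1_000_000, "catalog"
    # 4. Nothing matched: fall back to the default rate.
    return tokens * default_rate / 1_000_000, "default"
```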
GET /api/v1/settings/pricing now reports the catalog layer (entry count, snapshot vs live refresh, last refresh time) alongside the operator table. The prompt-optimizer cost pre-flight consults the same tables instead of its own hardcoded mirror.

[3.23.0] — 2026-07-02

Sign-in polish. The auth card's primary-tinted drop shadow rendered as a detached glowing slab under the card on the dark backdrop; replaced with a neutral elevation shadow.
Reports page rework. The generator now mirrors the generate endpoint's real contract: the always-included PDF sections (executive summary, trace statistics, compliance summary) render as fixed rows, and only the two real toggles (violation details, cost & latency analytics) remain interactive. The previous "Report Type" selector and "Full View" checkbox were never sent to the API and have been removed. Completed-report badges now use the status color system, and the page gained a description header.
Sidebar. The Live connection pill moved to the very bottom of the sidebar so it reads as ambient status rather than a nav item.
Span pickers show names. The Counterfactuals and SAE tab span pickers displayed the raw span UUID in the closed trigger; they now show the span name (with a truncated id for disambiguation).
Trace-detail tab fallback. An unknown ?tab= value in a trace-detail URL now falls back to the Spans tab instead of rendering an empty panel.

[3.22.0] — 2026-07-02

Honest health signal. The rules_loaded field of GET /api/v1/health now reports the number of YAML constitutional rules the governor actually evaluates (loaded at startup) instead of counting a legacy database table that read 0 on a healthy instance.
The demo /alerts page tells a story. Demo-login now seeds two alert rules (p95 latency, cost burn rate) plus two historical firings, so the alerts surface renders real content instead of an empty state.
Correct pagination under sort. GET /api/v1/traces rejects a pagination cursor combined with any sort_by other than created_at (HTTP 400) — the keyset cursor only produces correct pages in creation order.
Faster trace detail. The Sankey, Counterfactuals, SAE, and Raw tabs on /traces/{id} now load lazily, keeping d3 and the syntax highlighter out of the page's initial JavaScript until a tab is opened.
SQLite robustness. The API sets busy_timeout=5000 and synchronous=NORMAL, so concurrent writers wait instead of failing with database is locked. New ix_spans_model index (Alembic 020) serves the model filter and SAE candidate discovery.
Readable quickstart download button. The starter-template "Download ZIP" button on the quickstart page rendered its label cyan-on-cyan (the docs prose link colour out-specified the button text class). Also refreshed two stale copy spots: the landing FAQ's "cloud tier coming in v3.0" and the version literal in the quickstart health-check sample.
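For reference, the two pragmas and the new index can be applied with Python's standard sqlite3 module as below. This is a sketch assuming a plain sqlite3 connection; it does not reproduce how the API wires these settings (for example through a SQLAlchemy connect hook), and the minimal spans table is hypothetical.

```python
import sqlite3


def connect(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Wait up to 5000 ms for a competing writer's lock instead of failing
    # straight away with "database is locked".
    conn.execute("PRAGMA busy_timeout=5000")
    # NORMAL fsyncs less often than FULL. In WAL mode it cannot corrupt the
    # database, but a power loss may roll back the most recent commits.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


conn = connect(":memory:")
# Hypothetical minimal table; the real spans table has many more columns.
conn.execute("CREATE TABLE spans (id TEXT PRIMARY KEY, model TEXT)")
# Same index name as migration 020; serves WHERE model = ? lookups.
conn.execute("CREATE INDEX IF NOT EXISTS ix_spans_model ON spans (model)")
```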

[3.21.0] — 2026-07-02

SAE pre-flight + "which runs support SAE?" discovery. The SAE tab now shows a consolidated four-gate pre-flight checklist (model has a published SAE · saelens+torch installed · Hugging Face key set · activations cached) so you see every prerequisite at once instead of one-at-a-time. The panel is now explicit that SAE is a self-host capability — extraction runs on your own API host with the audittrail[sae] extras and an open-weight model; the hosted demo doesn't run it. A new N SAE-ready badge on /traces (backed by GET /api/v1/sae/candidate-traces) tells you which of your runs even could run SAE — honestly 0 on the closed-weight demo.

[3.20.0] — 2026-07-02

Counterfactuals that actually run. Clicking Counterfactuals (or SAE) on a tool span used to fail with "span has no prompt text" — the tool span was keyed by its argument (calculator(expression=…)), not the user prompt. Resolution now walks up to the orchestrator span to recover the real user prompt, so leaf tool spans work. Empty results are split into three honest states — stable (your tool choice is robust), surrogate not trained yet, and surrogate disagrees with the observed choice — instead of one ambiguous card. Counterfactuals also make no LLM call and carry no cost estimate, and every candidate now ships a plain-language explanation and predicted_outcome.
One-click launch + deep links. The DAG node panel now links straight to the Counterfactuals tab (tool spans) or SAE tab (LLM spans) with the span preselected (?tab=…&span=…), and both tabs are deep-linkable.
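The walk-up described above can be illustrated with a small parent-pointer lookup. Everything here is hypothetical: the Span shape, the kind values, and resolve_user_prompt are illustrative names, and the rule "skip tool spans, return the first ancestor with prompt text" is an assumption about how the orchestrator span is recognised.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Span:
    id: str
    parent_id: Optional[str]
    kind: str                      # e.g. "agent", "llm", "tool"
    prompt: Optional[str] = None   # tool spans hold their call text here


def resolve_user_prompt(span_id: str, spans: Dict[str, Span]) -> Optional[str]:
    """Walk from a span towards the root; return the first non-tool prompt."""
    current = spans.get(span_id)
    seen = set()
    while current is not None and current.id not in seen:
        seen.add(current.id)          # guards against malformed parent cycles
        if current.kind != "tool" and current.prompt:
            return current.prompt
        current = spans.get(current.parent_id) if current.parent_id else None
    return None
```

A leaf tool span therefore resolves to the orchestrator's prompt rather than to its own calculator(...) text.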

[3.19.0] — 2026-07-01

Sign-in & sidebar polish. Refined the sign-in backdrop so the footer no longer reads as a detached tier, widened the dashboard sidebar (272 → 300px), and moved the navigation scrollbar flush to the sidebar edge.

[3.18.0] — 2026-07-01

Schema formalisation. Added Alembic migrations 018/019 so a clean install or a future Postgres cutover reproduces the full schema through migrations (the live SQLite deployment is unchanged — it bootstraps its schema directly).

[3.17.0] — 2026-07-01

Security hardening. Raised dependency floors for several CVEs (starlette, python-multipart, cryptography, PyJWT, next, react), added a conservative Content-Security-Policy response header, and pinned all CI workflow actions to exact commit SHAs.

[3.16.0] — 2026-07-01

Documentation truth-up. Brought the API reference current (runtime-control deactivate, TTL expires_at, dispatch provenance fields, the gateway 400 unsupported_model), un-froze the changelog, and fixed stale copy in the quickstart.

[3.15.0] — 2026-07-01

Refreshed landing walkthrough screenshots. Regenerated the product-tour images against the live site, including the Causal Attribution panels (05/07) and a populated counterfactual picker.

[3.14.0] — 2026-07-01

Landing-page trim. Removed the hero stats strip, the pricing teaser, and the finale section to focus the landing narrative.

[3.13.0] — 2026-07-01

Backend correctness & hardening. The gateway now logs streaming spans on a fresh database session (SSE calls are no longer silently un-logged), the deployment-action executor distinguishes transient from permanent failures so a run that couldn't reach a daemon is retried instead of being marked executed, and the gateway validates the virtual key before reading the request body.
New end-to-end tests covering the full governance chain (propose → approve → enforce → deactivate/TTL-reap, plus cross-tenant isolation) and the online-evaluation sampler/executor.

[3.12.0] — 2026-07-01

Frontend audit fixes. The constitutional-violation toast now matches the real broadcast shape (fires on amber/red severities), the runner dispatch/run-output tiles read the correct response fields, and several footer links that 404'd were repaired.

[3.7.0 → 3.11.0] — Sync & Expand (gateway honesty, docs truth-up, runtime-control TTL)

v3.7.0 — Honest unsupported-model gateway. When a provider key is configured, an unrecognised model now returns a 400 unsupported_model instead of a fabricated mock completion. Agent-registry bundle validation hardened (control-character + edge-whitespace rejection).
v3.8.0 — API-reference truth-up. Both the in-app and GitHub-facing API references were rewritten to the full ~222-endpoint / 40-module surface, and six previously-undocumented surfaces got feature docs (SAML SSO + SCIM, the anomaly alert engine, datasets + eval runs, pause / inspect / edit+resume, EU AI Act compliance, and the runner run-status panel).
v3.9.0 — Runtime-control TTL. Runtime controls (switch_model / throttle / disable_flag) gained an optional expiry (expires_at); a background reaper auto-deactivates expired controls so an expired throttle/disable stops enforcing (fail-safe).
v3.10.0 — Manual deactivate. Operators can deactivate an active runtime control from the Run Status page (POST /deployments/actions/runtime-controls/{id}/deactivate), which also shows each control's remaining TTL.
v3.11.0 — Dispatch version provenance. Registered-agent dispatches now persist and surface agent_definition_id / agent_version_id / agent_version, so a run's status and detail show exactly which agent version executed (null for built-in templates).
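A background reaper for expires_at can be sketched as a pure function over the runtime controls. RuntimeControl, reap_expired, and enforced are hypothetical names; the enforced helper, which also ignores an expired control before the reaper has run, is an assumption about how the fail-safe behaviour could be kept airtight rather than a description of the shipped code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class RuntimeControl:
    id: str
    kind: str                               # switch_model | throttle | disable_flag
    active: bool = True
    expires_at: Optional[datetime] = None   # None means the control never expires


def reap_expired(controls: List[RuntimeControl], now: datetime) -> List[str]:
    """Deactivate every active control whose expires_at has passed; return their ids."""
    reaped = []
    for control in controls:
        if control.active and control.expires_at is not None and control.expires_at <= now:
            control.active = False
            reaped.append(control.id)
    return reaped


def enforced(controls: List[RuntimeControl], now: datetime) -> List[RuntimeControl]:
    """Controls to apply right now; an expired one is skipped even if not yet reaped."""
    return [
        c for c in controls
        if c.active and (c.expires_at is None or c.expires_at > now)
    ]
```

Running the reaper a second time is a no-op, so it is safe to schedule on a fixed interval.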

[3.1.0 → 3.6.4] — Agent Operations Control Plane + launch prep

AuditTrail evolved from observe-only into an honest control plane that orchestrates, versions, rolls out, governs, and kills agent runs — while execution stays on the user's own runners (we do not host execution).

Runner template allowlist — a two-gate, argv-only dispatch path so the operator can never ship arbitrary code to a connected daemon.
Agent registry + immutable versioning — register agent code, version it immutably (server-computed checksum), promote a current version, dispatch a pinned version to your own daemon.
Safe BYOK gateway secrets — store provider keys encrypted at rest, used only server-side in the gateway forward path; the plaintext is write-only.
Executable deployment actions — approved Tier-2 controls (switch_model / throttle / disable_flag) are enforced on gateway-routed traffic; kill_run targets your connected daemon; the rest are honest external handoffs recorded via mark-executed.
Run-status panel — connection state, live dispatch stdout (SSE), and the active runtime controls.
OpenAI-compatible gateway base-URL override, plus E2E-test-driven fixes (schema-drift startup bridge, runner-token /runner/status auth, live-pill WS routing, footer hydration). Migrations 014–017.

[3.0.0] — Public-launch prep (Phase 7–9)

Full Edit + Resume for paused agents (feature-flagged, schema-hash guarded; Alembic 012) + heartbeat / ack / reaper lifecycle.
Local runner daemon — audittrail daemon ships with the Python SDK, connects over WebSocket with an sk-atd-… token, and runs allowlisted agents on your own machine (Alembic 013).
Chat history (Alembic 011), prompt canary with auto-rollback (Alembic 010), and the Python SDK AuditTrailClient + run_langgraph.
Security hardening: webhook SSRF guard, governor ReDoS backstop, JWT exp required, eval() removed in favor of an AST-allowlisted evaluator.
/pricing and /blog pages, NOTICE refresh, 7 SDK pre-publish CI workflows.
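Replacing eval() with an AST allowlist generally means parsing the expression and interpreting only the node types you explicitly permit. The sketch below is one such evaluator; the allowed operators, safe_eval, and the variables mapping are assumptions, since which expressions AuditTrail's evaluator accepts is not stated here.

```python
import ast
import operator
from typing import Any, Dict

# Only these operators are interpreted; every other node type is rejected.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY_OPS = {ast.USub: operator.neg, ast.Not: operator.not_}
_CMP_OPS = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
            ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne}


def safe_eval(expression: str, variables: Dict[str, Any]) -> Any:
    """Evaluate a small arithmetic/boolean expression without calling eval()."""
    return _eval(ast.parse(expression, mode="eval").body, variables)


def _eval(node: ast.AST, variables: Dict[str, Any]) -> Any:
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, bool)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in variables:
            raise ValueError(f"unknown name: {node.id}")
        return variables[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left, variables), _eval(node.right, variables))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand, variables))
    if isinstance(node, ast.BoolOp):
        # Evaluates every operand (no short-circuit), then combines them with
        # all()/any(), so the result is always a bool.
        values = [_eval(v, variables) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare):
        # Chained comparisons such as 0 < x <= 10 are evaluated pairwise.
        left = _eval(node.left, variables)
        for op, comparator in zip(node.ops, node.comparators):
            if type(op) not in _CMP_OPS:
                raise ValueError(f"operator not allowed: {type(op).__name__}")
            right = _eval(comparator, variables)
            if not _CMP_OPS[type(op)](left, right):
                return False
            left = right
        return True
    raise ValueError(f"expression not allowed: {type(node).__name__}")
```

Anything outside the allowlist (function calls, attribute access, subscripts) falls through to the final ValueError, so an input like __import__('os') is refused rather than executed.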

[2.x] — Platform expansion (v1.1 → v2.9)

v1.1 — TypeScript SDK, OpenAI-compatible gateway proxy, OTLP/JSON ingest, EU AI Act compliance surface, and the anomaly alert engine.
v1.2 — multi-tenant organizations + members + invites, datasets + eval runs, the prompt lab, projects, and SAML SSO / SCIM provisioning.
v1.3 — NL explanations, counterfactuals, the prompt optimizer, adaptive model routing, and the deployment-intelligence action queue.
v2.0 — SAE mechanistic XAI, REGO (Governance 2.0) policies, and the Stripe billing surface.
v2.2 → v2.9 — Live Fleet, the Operations Assistant (generative-UI tiles + BYOK chat), pause/inspect preview, per-user rule bindings, live cost/latency DAG overlay, the docs site, and the landing-page build-out.

[1.0.2] - 2026-04-08

Comprehensive PRD ↔ codebase sync sprint. Addresses every CRITICAL and HIGH finding from the 2026-04-07 audit (docs/research/audit-2026-04-07/), plus a follow-up wave that builds out every previously-deferred PRD line item and ships the cross-tenant /admin route group.

Sync sprint Phase 3 — previously-deferred PRD items now shipped

Adaptive ablation strategies (FR-017). causal.run_ablation now accepts strategy: "linear" | "adaptive" | "hierarchical" (default "adaptive") and runs_per_segment ∈ [1,5]. The adaptive path runs one cheap pass over every segment, then re-scores only segments whose attribution falls in [0.05, 0.30] with the full quota. The hierarchical path buckets prompts with >12 segments into 4 coarse groups, scores those, then drills into the top-2. Both apply early termination once the top-3 segments cover ≥90% of the cumulative attribution.
Sankey segment merge/split editor (FR-025). New segment-editor component lets users merge adjacent segments or split at a word boundary before running ablation. Backed by segments_override on /ablation/run and a new /ablation/segments/preview helper.
DAG semantic 3-level zoom (FR-011) + Dagre Web Worker (NFR-013) for 200+ node graphs.
Compliance trend deployment markers (US-025) sourced from audit_log rule events.
Email verification flow (/auth/verify-email, /auth/resend-verification).
Refresh token rotation with theft detection (migration 004, /auth/refresh, silent refresh in the api-client).
audittrail CLI (rules validate, rules list, bootstrap-superadmin).
Global filter wiring — useFilterStore carries phrase + tool, GlobalFilterBar exposes chips, /traces seeds the store from URL params.
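The adaptive strategy from FR-017 can be sketched as a cheap pass followed by a targeted re-score. This is a simplified illustration under stated assumptions: score is a hypothetical callable returning a non-negative attribution for one segment, the hierarchical path is not shown, and the exact point at which the real implementation checks the top-3 coverage condition may differ.

```python
from typing import Callable, Dict, List

LOW, HIGH = 0.05, 0.30     # re-score band from the entry above
COVERAGE_TARGET = 0.90     # early-termination threshold for the top-3 segments


def top3_coverage(scores: Dict[str, float]) -> float:
    """Share of total attribution held by the three highest-scoring segments."""
    total = sum(scores.values())
    if total <= 0:
        return 0.0
    return sum(sorted(scores.values(), reverse=True)[:3]) / total


def adaptive_ablation(
    segments: List[str],
    score: Callable[[str], float],
    runs_per_segment: int = 3,
) -> Dict[str, float]:
    if not 1 <= runs_per_segment <= 5:
        raise ValueError("runs_per_segment must be in [1, 5]")
    # Cheap pass: a single run for every segment.
    scores = {seg: score(seg) for seg in segments}
    # Full-quota pass, only for segments in the ambiguous middle band.
    for seg in [s for s in segments if LOW <= scores[s] <= HIGH]:
        if top3_coverage(scores) >= COVERAGE_TARGET:
            break  # the top 3 already explain >= 90%; stop early
        runs = [score(seg) for _ in range(runs_per_segment)]
        scores[seg] = sum(runs) / len(runs)
    return scores
```

The saving comes from the band: segments that are clearly dominant (above 0.30) or clearly negligible (below 0.05) never pay for extra runs.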

Wave 18 — RBAC `/admin` route group

New auth.require_superadmin dependency + routes/admin.py with 8 endpoints (users CRUD, audit-log, audit-log/export CSV stream, tenants list, instance/info). Self-protection prevents a superadmin from revoking their own flag or deactivating their own account.
Frontend app/(admin)/admin/ route group with layout (auth guard), overview, users, tenants, audit-log pages.
Sidebar gains a superadmin-only "Admin" entry.
audittrail bootstrap-superadmin CLI for the fresh-deployment promote-or-create flow.

New POST /api/v1/auth/demo-login (CSRF-exempt, 20/min). Finds or creates a demo@audittrail.dev user, seeds 10 sample traces with 30 spans on first call, and issues normal cookies. Landing page "View Demo" buttons hit this endpoint.
useCurrentUser exposes is_superadmin + email_verified. New formatRoleLabel helper renders "Admin" / "Viewer" / "Superadmin" consistently across the dashboard sidebar and profile page.
Sign-out button in the dashboard sidebar footer is now an explicit labelled button.
Verify-email page wrapped in <Suspense> so Next.js 16 static rendering accepts useSearchParams (CI build was failing).

CRITICAL security fixes (cross-tenant data exposure)

Fixed: WebSocket trace subscription bypass (#115). routes/ws.py now verifies Trace.user_id == authenticated_user.id before attaching the connection to the trace's ring buffer; runtime subscribe messages are also gated. Closes a live cross-tenant exfiltration vector where any authenticated user could replay the last 100 events of any other tenant's trace.
Fixed: DELETE /api/v1/traces/flush-all global wipe (#116). Now scoped to Trace.user_id == calling admin's id; tenant admins can no longer wipe other tenants' traces.
Fixed: Anonymous span ingestion (#117). Both POST /api/v1/ingest/spans and /ingest/traces switched from Depends(get_optional_user) to a new require_user_or_apikey dependency that rejects anonymous calls with 401. Adds explicit cross-tenant guard inside _ensure_trace/_ensure_agent.

HIGH security fixes

Fixed: Cross-tenant agents endpoints (#118). Added Agent.user_id via Alembic 003 + new _visible_to(user) clause used by every read/write in routes/agents.py. Two tenants can register agents with the same display name without colliding.
Fixed: Sparkline day_stmt user_id leak (#119) — routes/analytics.py:233-238 now joins/filters by user.
Fixed: /analytics/slowest-spans p95/p99 leak (#120) — duration query now joins Trace and filters by user.
Fixed: Global rules / settings (#121). Rules and settings split into instance defaults vs per-tenant overrides. Tenants can no longer disable safety rules platform-wide. See "Per-tenant scoping" below for the full migration.
Fixed: JWT non-revocable + 24h TTL (#122). Default TTL shortened to 1 hour (AUDITTRAIL_JWT_EXPIRY_HOURS). get_current_user already re-fetches the row each request so role demotion is reflected within the access-token TTL.
Hardened: seed.py production guard (#123). Now requires both AUDITTRAIL_DEBUG=true AND AUDITTRAIL_ALLOW_SEED=1. The opt-in flag is intentionally outside the Pydantic settings so a stale .env cannot accidentally re-seed prod.
Added: per-route rate limits on heavy endpoints (#124). /ablation/run, /ingest/spans, /ingest/traces, /reports/generate now carry explicit @limiter.limit("…/minute") decorators in addition to the global default. Concurrent ablation jobs per user are also capped via ablation_max_concurrent_per_user.
Hardened: ReDoS surface (#125). Constitutional rule regex patterns are screened against a deny-list of catastrophic-backtracking shapes ((a+)+, (a*)*, (a|a)+) before compilation. Pattern + input length caps from 002 are kept.
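A heuristic screen for the shapes listed above might look like the following. The two detection regexes and MAX_PATTERN_LENGTH are assumptions, not the actual deny-list or the existing length caps, and a heuristic like this catches common shapes such as (a+)+ without proving that a pattern is safe (bounded repeats like {1,10} are not covered).

```python
import re

# Heuristic shape 1: a quantified group that itself contains + or *,
# e.g. (a+)+, (a*)* or (\d+)+.
_NESTED_QUANTIFIER = re.compile(r"\((?:[^()\\]|\\.)*[+*](?:[^()\\]|\\.)*\)[+*]")
# Heuristic shape 2: a quantified group of two identical alternatives, e.g. (a|a)+.
_DUPLICATE_ALTERNATION = re.compile(r"\(([^()|]+)\|\1\)[+*]")

MAX_PATTERN_LENGTH = 500  # hypothetical cap


def screen_pattern(pattern: str) -> "re.Pattern[str]":
    """Reject known catastrophic-backtracking shapes, then compile."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("pattern too long")
    if _NESTED_QUANTIFIER.search(pattern) or _DUPLICATE_ALTERNATION.search(pattern):
        raise ValueError(f"rejected as a catastrophic-backtracking risk: {pattern!r}")
    return re.compile(pattern)
```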

MEDIUM security hardening

Added: SecurityHeadersMiddleware in main.py so HSTS, X-Frame-Options DENY, X-Content-Type-Options nosniff, Referrer-Policy, Permissions-Policy and COOP are present even when no fronting proxy adds them.
Added: BodySizeLimitMiddleware rejects requests larger than 5 MiB at the FastAPI layer (defence-in-depth complement to nginx).
Added: AuditLog table (NFR-023, #137). New audit_log table + audittrail/audit_log.py helper. Every admin mutation (rule.update, rule.create, rule.delete_override, rule.reload, agent.create, agent.delete, settings.patch, webhook.*, report.generate, password.reset_*, auth.logout) records who, when, payload, IP and user-agent.
Widened: PII redaction scope (#135). _upsert_span now redacts error_message, attributes, and tool_calls.{arguments,result} in addition to span.input/output.
Added: Pydantic extra="forbid" on the new ablation, webhook, and password-reset request schemas, so unknown fields are rejected with a validation error instead of being silently ignored and client-side typos cannot mask validation failures.

Per-tenant scoping (Wave 5)

Added: Alembic migration 003 (003_tenant_scoping_and_audit_log.py). Adds user_id to agents, rules; adds user_settings table for per-tenant overrides; adds users.is_superadmin, users.email_verified, users.reset_token, etc.; adds the audit_log and webhooks tables; adds traces.archived_at / traces.deleted_at for the lifecycle state machine.
Added: Tenant-scoped rule layer. app.state.rules is now the YAML-loaded instance default; per-tenant overrides live in app.state.tenant_rules[user_id] and shadow defaults at evaluation time. routes/ingest._evaluate_and_broadcast calls _effective_rules(request, user) so each tenant sees their own effective ruleset during constitutional evaluation.
Added: UserSetting model + tenant-scoped settings reads/writes. Reads merge per-user rows on top of instance defaults; writes always land in user_settings for the calling tenant.
Added: Trace cross-tenant guard in _ensure_trace. Spans cannot be appended to a trace owned by a different user; the call returns 403 instead of silently mutating someone else's row.
Added: WebSocket tenant filter on /ws/live. The ConnectionManager records (user_id, is_admin) on each live subscriber and _can_see_trace filters every broadcast.
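The shadowing described above amounts to a per-request merge keyed by rule id. A minimal sketch, assuming rules are plain dicts keyed by id; effective_rules here is a standalone, hypothetical stand-in for _effective_rules, whose real signature takes the request and the user.

```python
from typing import Any, Dict

Rule = Dict[str, Any]


def effective_rules(
    instance_rules: Dict[str, Rule],
    tenant_rules: Dict[str, Dict[str, Rule]],
    user_id: str,
) -> Dict[str, Rule]:
    """Instance defaults with the calling tenant's overrides shadowing them by id."""
    merged = dict(instance_rules)                  # shallow copy: defaults stay untouched
    merged.update(tenant_rules.get(user_id, {}))   # tenant entry replaces the default
    return merged
```

Because the defaults dict is copied rather than mutated, one tenant disabling a rule never changes what another tenant evaluates.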

Schema management

Removed: Runtime ALTER TABLE block in init_db() (#100). Schema management is now Alembic-only; the entrypoint runs alembic upgrade head and init_db only ensures fresh-install table creation + WAL mode.

Constitutional governor

Added: BR-002 duplicate rule-id detection. load_rules now cross-checks IDs across YAML files and skips duplicates with a clear error log naming both source files.
Added: BR-003 amber ≤ red validator. Pydantic model_validator on RuleDefinition rejects rules where the amber threshold sits past the red threshold for a given operator.
Fixed: WS event envelope key drift (#95). The dead-code path in governor.evaluate_trace_and_broadcast was emitting "type" instead of "event"; now matches the rest of the codebase.
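The BR-003 check can be sketched with a stdlib dataclass standing in for the Pydantic model_validator. The operator names (gt, gte, lt, lte) and the Thresholds shape are assumptions; the point is that "past the red threshold" flips direction with the comparison operator.

```python
from dataclasses import dataclass

HIGHER_IS_WORSE = {"gt", "gte"}   # e.g. cost > threshold is a violation
LOWER_IS_WORSE = {"lt", "lte"}    # e.g. score < threshold is a violation


@dataclass
class Thresholds:
    operator: str
    amber: float
    red: float

    def __post_init__(self) -> None:
        if self.operator in HIGHER_IS_WORSE:
            if self.amber > self.red:
                raise ValueError(f"amber ({self.amber}) must not exceed red ({self.red}) for {self.operator}")
        elif self.operator in LOWER_IS_WORSE:
            if self.amber < self.red:
                raise ValueError(f"amber ({self.amber}) must not be below red ({self.red}) for {self.operator}")
        else:
            raise ValueError(f"unknown operator: {self.operator}")
```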

Causal attribution & SHAP

Added: Real LLM ablation path (#32). causal._real_llm_tool_selection hits Anthropic or OpenAI when ablation_real_llm_enabled=true in settings, falling back to the heuristic on provider failure.
Added: 3× averaging for ablation runs (BR-009 / #36). run_ablation now accepts runs_per_segment (1..5) and runs the scorer in asyncio.gather parallel for each masked variant before averaging the normalised scores.
Added: Surrogate model warm-train on startup (#33). The lifespan task now pulls up to 500 recent tool spans, builds a feature matrix, and calls SurrogateModel.train() so SHAP responses can return real F1 scores instead of the hard-coded 0.85.
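The averaging step can be sketched with asyncio.gather as follows. averaged_tool_scores, the scorer signature (an async callable returning raw per-tool scores), and the per-tool normalisation are assumptions about the shape of the data, not the actual run_ablation code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Scores = Dict[str, float]


def _normalise(scores: Scores) -> Scores:
    """Scale raw per-tool scores so they sum to 1 (left as-is if they sum to 0)."""
    total = sum(scores.values())
    return {tool: value / total for tool, value in scores.items()} if total > 0 else dict(scores)


async def averaged_tool_scores(
    variant: str,
    scorer: Callable[[str], Awaitable[Scores]],
    runs_per_segment: int = 3,
) -> Scores:
    if not 1 <= runs_per_segment <= 5:
        raise ValueError("runs_per_segment must be between 1 and 5")
    # Run every repetition for this masked variant concurrently.
    runs: List[Scores] = await asyncio.gather(
        *(scorer(variant) for _ in range(runs_per_segment))
    )
    normalised = [_normalise(r) for r in runs]
    tools = set().union(*normalised)
    # Average each tool's normalised score across runs (a missing tool counts as 0).
    return {tool: sum(r.get(tool, 0.0) for r in normalised) / len(normalised) for tool in tools}
```

Normalising before averaging keeps one run with unusually large raw scores from dominating the mean.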

Auth & SDK

Added: /auth/forgot-password and /auth/reset-password (#34). v1.0 implementation logs the reset token to the server console per PRD §15.1; v2.0 will deliver via email.
Added: Frontend /forgot-password page wired to the backend.
Added: audittrail.init(frameworks=...) SDK shim (#35). Honors the public API documented in PRD §14.3; warns on unknown framework names; calls through to init_tracer.
Hardened: /auth/logout records the action in audit_log even for cookies that cannot be decoded.

Webhooks (#43)

Added: outbound webhook delivery service. New webhooks.py module + routes/webhooks.py CRUD endpoints. Supports Slack webhooks, PagerDuty Events v2, and generic HTTPS with HMAC-SHA256 request signing. Delivery happens off the request path via asyncio.create_task and is retried with exponential backoff. Failures are persisted to Webhook.last_error.
Wired: ingest pipeline fan-out. Constitutional violations now fan out to the trace owner's enabled webhooks at the same severity threshold the user configured.
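The signing and retry pieces can be sketched with the standard library. sign_payload, verify_signature, backoff_delays, and the base and cap values are hypothetical; the header that carries the signature and the real retry schedule are not stated in this entry.

```python
import hashlib
import hmac
import json
from typing import List


def sign_payload(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the exact request body bytes."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Receiver side: constant-time comparison against the sent signature."""
    return hmac.compare_digest(sign_payload(secret, body), received)


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at cap seconds."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


# Sign the serialized body once and send exactly the bytes that were signed.
body = json.dumps({"event": "violation", "severity": "red"}).encode()
signature = sign_payload(b"webhook-secret", body)
```

Signing the serialized bytes, not the dict, matters: re-serializing on the receiver can reorder keys and break verification.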

Frontend

Removed: Dead Continue with API Key button + Apache license footer link from the login page (#179) — replaced with a "Create an account" link and a working /forgot-password link.
Removed: components/dashboard/sidebar.tsx (285 LOC dead code).
Removed: components/trace/trace-table.tsx + trace-columns.tsx (640 LOC dead code).
Removed: nuqs dependency (declared, never imported).
Added: Sign-out button to the dashboard sidebar footer that hits POST /v1/auth/logout and routes to /login.
Added: /forgot-password page matching the login UI.
Moved: tests/capture-screenshots.spec.ts → scripts/ so it is not auto-run as part of the e2e suite.
Added: @audittrail/shared workspace dependency in apps/web/package.json so the orphaned shared types package can finally be imported by frontend code.

Architecture / deployment

Removed: in-container nginx service from docker-compose.yml (#88). Production already uses Caddy on the Lightsail host as the single TLS / proxy layer; the duplicate in-container nginx is now retired.

PRD sync (highlights — see SYNC_AUDIT_REPORT.md for the full diff)

Documented per-tenant scoping, AuditLog, Webhook, UserSetting in the ERD entity list.
Removed CrewAI adapter (FR-007b) from the roadmap entirely. The intended adapter never shipped, and PRD §23.8 now reflects that the v1.0 framework matrix is LangGraph + LangChain + AutoGen + raw OpenAI.
Removed Mermaid Sankey fallback (FR-027a) from the roadmap and from the response schema. The hand-rolled SVG Sankey covers every supported display surface and the Mermaid path was unimplemented prose.
Added § "Reference Production Topology" describing Lightsail + Caddy + cron-poll deploy.

[1.1.0] - 2026-04-01

Authentication & User Flow

Fixed: Register page styling mismatch. The register page used shadcn Card/Button/Input components while the login page used inline styles with #171717 card and #0a0a0a background. Restyled register to match login exactly (same radial gradient glow, card dimensions, input styling).
- File: apps/web/app/(auth)/register/page.tsx
Fixed: Default email pre-filled on login. Login form had defaultValues: { email: "priya@company.com" } hardcoded. Changed to empty string so users enter their own credentials.
- File: apps/web/app/(auth)/login/page.tsx
Fixed: Login error silently redirecting to demo. The login catch block was redirecting to /overview (demo dashboard) on ANY error, including invalid credentials. Now shows an inline error message instead.
- File: apps/web/app/(auth)/login/page.tsx
Fixed: Mock data leaking to real accounts. The useCurrentUser hook fell back to a hardcoded "Priya S." demo user whenever auth failed (isDemo = isError || !data), causing real registered users to see Priya's profile on page refresh. Rewrote the hook to only activate demo mode when explicitly requested via ?demo=true query parameter. Unauthenticated users are now redirected to login.
- Files: apps/web/hooks/use-current-user.ts, apps/web/app/(dashboard)/layout.tsx, apps/web/app/page.tsx

Error Handling

Added: Global error boundary. app/error.tsx catches unhandled runtime errors and shows a dark-themed retry page.
- File: apps/web/app/error.tsx
Added: 404 page. app/not-found.tsx shows a dark-themed "Page not found" page with a link back to home.
- File: apps/web/app/not-found.tsx

User Profile & Identity

Added: Dynamic user avatar via useCurrentUser hook. Created a React Query hook that fetches the authenticated user from GET /v1/auth/me. Wired into dashboard sidebar, layout header, and profile page to replace hardcoded "PS" / "Priya S." / "AI Engineer" values.
- Files: apps/web/hooks/use-current-user.ts, apps/web/components/dashboard/sidebar.tsx, apps/web/app/(dashboard)/layout.tsx, apps/web/app/(dashboard)/profile/page.tsx
Added: Platform-aware keyboard shortcut display. Sidebar now shows "Cmd+K" on macOS and "Ctrl K" on Windows/Linux, using a hydration-safe useEffect pattern.
- File: apps/web/components/dashboard/sidebar.tsx
Added: Keyboard shortcuts hint card on Overview page. Animated card showing 5 key shortcuts (Cmd/Ctrl+K, G->O, G->T, G->C, G->A) with platform-aware modifier key.
- File: apps/web/app/(dashboard)/overview/page.tsx

Testing

Added: Vitest unit test infrastructure. Installed vitest, happy-dom, @testing-library/react, @testing-library/jest-dom. Created vitest.config.ts with path aliases.
- Files: apps/web/vitest.config.ts, apps/web/package.json
Added: Unit tests. 21 tests across 2 files covering formatDuration, formatCost, formatTokens, truncateId, formatRelativeTime (utils) and ApiClient GET/POST/204/error/params handling (api-client).
- Files: apps/web/__tests__/utils.test.ts, apps/web/__tests__/api-client.test.ts

API & Schema

Fixed: ToolCallInSpan result field validation. The result field in ToolCallInSpan expects dict | None, not a plain string. Passing a string caused a 422 Unprocessable Entity error. Documented the correct format {"output": "..."} in API reference with a "Common mistake" callout.
- File: docs/api-reference.md
Documented: API response envelope pattern. All GET endpoints wrap responses in {"data": ...}. Added a note at the top of the API reference warning consumers to read resp.json()["data"], not resp.json() directly.
- File: docs/api-reference.md
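A minimal illustration of both points, assuming a Python client. tool_call and unwrap are hypothetical helpers, and the name field is an assumption; arguments and result are the fields this changelog names elsewhere.

```python
from typing import Any, Dict


def tool_call(name: str, arguments: Dict[str, Any], output: str) -> Dict[str, Any]:
    """One entry for a span's tool_calls list."""
    return {
        "name": name,                   # assumed field name
        "arguments": arguments,
        "result": {"output": output},   # dict | None; a bare string triggers a 422
    }


def unwrap(resp_json: Dict[str, Any]) -> Any:
    """Read the payload of a GET response from its {"data": ...} envelope."""
    if "data" not in resp_json:
        raise KeyError("expected a {'data': ...} envelope")
    return resp_json["data"]
```

With a requests-style client this becomes unwrap(resp.json()) rather than using resp.json() directly.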

DAG Visualization

Fixed: DAG vertical spacing. Dagre layout ranksep increased from 60 to 120 pixels, nodesep from 40 to 50, margins from 20 to 30. Nodes now have proper vertical breathing room.
- File: apps/web/components/dag/dag-viewer.tsx
Added: Live WebSocket DAG updates. DAG viewer now subscribes to useTraceWebSocket(traceId) and invalidates the React Query cache on span_start, span_end, and trace_complete events, so the DAG re-renders as spans arrive.
- File: apps/web/components/dag/dag-viewer.tsx

Fixed: Prompt segments overflow. The ablation cost dialog's prompt segment list overflowed the modal without scrolling. Replaced ScrollArea (which wasn't respecting max-h) with a native overflow-y-auto div with explicit height and thin scrollbar styling.
- File: apps/web/components/sankey/ablation-cost-dialog.tsx

Sankey Diagram

Fixed: Hardcoded tools replaced with dynamic extraction. The ablation engine hardcoded ["web_search", "calculator", "read_file"] as tool candidates. Now queries ToolCall records and tool-type span names from the actual trace to build the candidate list dynamically. Works with any tool set.
- Files: apps/api/src/audittrail/routes/ablation.py, apps/api/src/audittrail/causal.py, apps/api/src/audittrail/surrogate.py
Fixed: Single reasoning node replaced with multiple nodes. The Sankey middle column showed only one "Tool Selector" node. Now generates multiple reasoning steps (e.g., "Identify computation need", "Resolve data source", "Plan comparison logic") based on tool names, with a keyword-to-label mapping and a dynamic fallback for unknown tool names.
- File: apps/api/src/audittrail/causal.py
Fixed: Spider-web links replaced with targeted connections. Every prompt phrase was connecting to every reasoning node via the fallback connected_reasoning = set(reasoning_nodes.keys()). Replaced with text-affinity scoring that matches phrase content against tool names. Each phrase connects to 1-3 most relevant reasoning nodes, with weight proportional to affinity strength.
- File: apps/api/src/audittrail/causal.py
Fixed: Link thickness normalization. Sankey link glow width was Math.max(3, attr * 40) which produced uniform thin lines when all attribution values were small. Now normalizes against the maximum link value so the strongest link always fills the visual range.
- File: apps/web/components/sankey/sankey-viewer.tsx
Fixed: Hover tooltip on Sankey links. SVG container had pointerEvents: "none" which blocked all mouse events on hit-area paths. Split into two SVG layers: Layer 1 (z-index 1) for visual paths with pointerEvents: "none", Layer 2 (z-index 10) for invisible hit-area paths with pointerEvents: "stroke". Tooltip now appears on hover showing source, target, attribution score, and confidence level.
- File: apps/web/components/sankey/sankey-viewer.tsx
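The normalisation can be sketched in a few lines, here in Python for consistency with the other examples even though the viewer itself is TypeScript. link_widths is hypothetical, and reusing 3 and 40 as the minimum and maximum widths is an assumption borrowed from the old formula.

```python
from typing import List


def link_widths(attributions: List[float], min_width: float = 3.0, max_width: float = 40.0) -> List[float]:
    """Scale each link against the strongest one so widths span [min_width, max_width]."""
    peak = max(attributions, default=0.0)
    if peak <= 0:
        return [min_width for _ in attributions]
    return [min_width + (a / peak) * (max_width - min_width) for a in attributions]


small = [0.02, 0.05, 0.08]
old = [max(3, a * 40) for a in small]   # previous formula: thin, near-uniform lines
new = link_widths(small)                # strongest link now reaches max_width
```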

Live Tracing

Added: Incremental span ingestion. Test agent now sends spans during execution using send_running() (on tool/LLM start) and send_complete() (on end), instead of batching all spans at the end. The trace shows as "Running" during execution and transitions to "Complete" when done.
- File: test-agent/agent.py (now examples/langgraph-agent/agent.py)
Added: WebSocket broadcast for HTTP span ingestion. The POST /api/v1/ingest/spans endpoint now broadcasts span_start/span_end events via WebSocket after each span upsert. Previously only internal @traceable decorator spans triggered broadcasts.
- File: apps/api/src/audittrail/routes/ingest.py
Added: Hierarchical live spans. The test agent creates intermediate grouping spans (execute_tools chain, research_sub_agent agent) dynamically during execution, producing a multi-layered DAG instead of a flat tree.
- File: test-agent/agent.py (now examples/langgraph-agent/agent.py)
Fixed: Trace status stuck on "Running". The trace detail page did not subscribe to WebSocket events, so the status badge never updated after the trace completed. Added useTraceWebSocket subscription that invalidates ["trace", id] and ["trace-spans", id] queries on span events.
- File: apps/web/app/(dashboard)/traces/[id]/page.tsx

Security & User Isolation

Added: user_id column on Trace model. Traces are now owned by the user who created them. Each user only sees their own traces across all endpoints (traces, analytics, evaluations, ablation, spans). Column is nullable — existing traces with NULL user_id are invisible to all users (orphaned).
- Files: apps/api/src/audittrail/models.py, apps/api/src/audittrail/schemas.py
Added: Safe schema migration in init_db(). On startup, init_db() checks if the user_id column exists on the traces table. If missing, runs ALTER TABLE traces ADD COLUMN user_id with FK constraint and index. Safe to run repeatedly.
- File: apps/api/src/audittrail/database.py
Added: Authentication on all trace-related endpoints. 35+ endpoints across 7 route files now require Depends(get_current_user) and filter queries by Trace.user_id == user.id. Unauthenticated requests return 401.
- Files: routes/traces.py (9 endpoints), routes/analytics.py (7 endpoints), routes/evaluations.py (6 endpoints), routes/ablation.py (6 endpoints), routes/spans.py (2 endpoints)
Added: Optional auth on ingest endpoints. POST /ingest/spans and POST /ingest/traces use get_optional_user — authenticated users get traces assigned to their user_id; unauthenticated agents create traces with user_id=NULL.
- File: apps/api/src/audittrail/routes/ingest.py
Added: Admin-only flush endpoint. DELETE /api/v1/traces/flush-all deletes ALL traces in the database. Requires admin role. For clearing dev/seed data.
- File: apps/api/src/audittrail/routes/traces.py
Added: Frontend 401 redirect. API client now detects 401 responses and redirects the browser to /login. Prevents broken dashboard state when session expires.
- File: apps/web/lib/api-client.ts
Fixed: Demo mode isolation. useCurrentUser hook no longer falls back to "Priya S." demo user on auth errors. Demo mode only activates when ?demo=true is in the URL (triggered by "View Demo" buttons on landing page). Real users see empty state if they have no traces.
- Files: apps/web/hooks/use-current-user.ts, apps/web/app/(dashboard)/layout.tsx, apps/web/app/page.tsx
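The idempotent check-then-alter step from init_db() can be sketched with sqlite3 and PRAGMA table_info. The TEXT column type, the users(id) target, and the ix_traces_user_id index name are assumptions. Note that the 1.0.2 entry above later removed this runtime ALTER TABLE approach in favour of Alembic-only migrations.

```python
import sqlite3


def ensure_trace_user_id(conn: sqlite3.Connection) -> bool:
    """Add traces.user_id if it is missing; return True only when it was added."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(traces)")}
    if "user_id" in columns:
        return False  # already migrated: safe to call on every startup
    # Nullable FK column, so existing rows get NULL (the "orphaned" traces).
    conn.execute("ALTER TABLE traces ADD COLUMN user_id TEXT REFERENCES users(id)")
    conn.execute("CREATE INDEX IF NOT EXISTS ix_traces_user_id ON traces (user_id)")
    conn.commit()
    return True
```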

Documentation

Updated: Quickstart guide. Complete rewrite with zero-to-dashboard flow, agent integration guide (HTTP + Callback + Template), common pitfalls table, live tracing pattern, and corrected example paths.
- File: docs/quickstart.md
Added: Example LangGraph agent. A complete working agent with 7 tools (web_search, calculator, get_current_time, summarize_findings, database_lookup, read_config_file, compare_values) that demonstrates live tracing, hierarchical span grouping, and correct tool_calls format.
- Files: examples/langgraph-agent/agent.py, examples/langgraph-agent/requirements.txt
Added: Goal-agnostic agent template. A minimal copy-paste-ready template (template.py) with the AuditTrailIngestor class, placeholder tools, and clear TODO markers for customization.
- File: examples/langgraph-agent/template.py
Added: Examples README. Usage guide with architecture diagram, tool format documentation, and response envelope warning.
- File: examples/langgraph-agent/README.md
Added: Changelog. This file, documenting all session changes.
- File: docs/changelog.md