AuditTrail Architecture Overview

System Diagram

                          +------------------+
                          |   Browser / UI   |
                          |  (Next.js App)   |
                          +--------+---------+
                                   |
                          HTTP / WebSocket
                                   |
                          +--------+---------+
                          |      Nginx       |
                          | (Reverse Proxy)  |
                          +---+---------+----+
                              |         |
                 /api/, /ws/  |         |  / (everything else)
                              |         |
                    +---------+--+   +--+---------+
                    |  FastAPI   |   |  Next.js   |
                    |  Backend   |   |  Frontend  |
                    |  :8000     |   |  :3000     |
                    +-----+------+   +------------+
                          |
          +---------------+---------------+
          |               |               |
   +------+------+ +------+------+ +------+------+
   |   Trace     | | Causal      | | Constitu-   |
   |   Collector | | Attribution | | tional      |
   |   + DAG     | | Engine      | | Governor    |
   |   Builder   | | (Ablation   | | (Rule       |
   |             | |  + SHAP)    | |  Engine)    |
   +------+------+ +------+------+ +------+------+
          |               |               |
          +-------+-------+-------+-------+
                  |               |
           +------+------+ +------+------+
           |  SQLite /   | |  Rule YAML  |
           |  PostgreSQL | |  Files      |
           +-------------+ +-------------+

Component Descriptions

Frontend (Next.js 16 + shadcn/ui)

The frontend is a Next.js 16 App Router application using React 19. Component primitives are sourced from @base-ui/react via the shadcn base-nova style (not @radix-ui/* directly). Animation uses motion v12 (the rebranded Framer Motion package, imported as motion/react). It provides six primary views:

  1. Traces -- Paginated, searchable list of all agent execution traces with drill-down to detail.
  2. DAG View -- Interactive React Flow graph showing the execution decision tree for a selected trace. Custom node types render differently based on span type (LLM call, tool invocation, chain step). Nodes are color-coded by status and constitutional evaluation results. Auto-collapses subtrees beyond depth 3 when the trace exceeds 100 nodes.
  3. Sankey View -- Hand-rolled SVG Sankey diagram. The d3-sankey package is in the dependencies for type imports only; the layout is computed manually so that per-port routing avoids the overlap that the default sankey-extents algorithm produces. Flow width is proportional to attribution strength. A Top-N filter (default 7) aggregates the remaining flows into an "Other" bucket. Clicking a flow filters the trace list to its source phrase and target tool.
  4. Constitutional Dashboard -- Real-time compliance status showing rule pass/amber/red counts, trend charts, and drill-down to individual evaluations.
  5. Analytics -- Recharts-based dashboard with cost tracking, latency distributions, tool usage breakdowns, and error rate trends.
  6. Reports -- Audit report generation interface with period selection and export options.
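
The Top-N filter described for the Sankey View can be sketched as below. This is a Python illustration of the aggregation logic only, since the actual view is rendered in the Next.js frontend. The Flow tuple and top_n_flows function are hypothetical names, and merging the leftover flows per target under an "Other" source is an assumption about how the bucket is attached.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    source: str      # prompt phrase
    target: str      # tool name
    strength: float  # attribution strength, drives the flow width


def top_n_flows(flows: list[Flow], n: int = 7) -> list[Flow]:
    """Keep the n strongest flows and merge the rest into 'Other' flows."""
    ranked = sorted(flows, key=lambda f: f.strength, reverse=True)
    kept, rest = ranked[:n], ranked[n:]
    # Assumption: leftovers are summed per target, with "Other" as the source.
    other: dict[str, float] = {}
    for flow in rest:
        other[flow.target] = other.get(flow.target, 0.0) + flow.strength
    return kept + [Flow("Other", target, total) for target, total in other.items()]


# Ten toy flows with strengths 1.0 .. 10.0, alternating between two tools.
flows = [
    Flow(f"phrase-{i}", "search" if i % 2 else "delete_file", float(i))
    for i in range(1, 11)
]
result = top_n_flows(flows)
```

Because the leftover strengths are summed rather than dropped, the total attribution shown in the diagram stays the same regardless of the chosen N.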

State management uses Zustand for ephemeral UI state (sidebar collapse, command palette, websocket buffers). Server state lives in TanStack Query with a 60-second stale time for analytics and dashboard endpoints.

Real-time updates are delivered over WebSocket connections to the backend: /ws/traces/{trace_id} for a single trace and /ws/live for the global feed. The envelope's discriminator key is event (not type). Each /ws/live connection is tenant-scoped: viewers receive events only for traces they own, while admins receive everything.
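
A minimal sketch of the envelope and the tenant-scoping check, written in Python to match the backend. The Viewer class, the function names, and the idea of comparing a viewer's user ID against a trace owner ID are assumptions for illustration; only the event and payload keys come from this document.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewer:
    user_id: str
    is_admin: bool = False


def make_envelope(event: str, payload: dict) -> str:
    """Serialize an event; the discriminator key is `event`, not `type`."""
    return json.dumps({"event": event, "payload": payload})


def should_deliver(viewer: Viewer, trace_owner_id: str) -> bool:
    """Admins receive everything; other viewers only events for their own traces."""
    return viewer.is_admin or viewer.user_id == trace_owner_id


message = make_envelope("span.created", {"trace_id": "t-1"})
alice = Viewer("alice")
admin = Viewer("root", is_admin=True)
```

With this check applied per connection before each send, a broadcast loop on /ws/live would skip viewers for whom should_deliver returns False.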

Backend (FastAPI)

The backend is a Python FastAPI application providing REST endpoints and WebSocket connections. It has four major subsystems:

Trace Collector + DAG Builder

The collector ingests trace and span data from instrumented agents via the REST API. It stores raw trace data in the database and reconstructs the execution DAG on demand by traversing the span parent-child tree.
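
On-demand reconstruction from the parent-child links might look roughly like this. The span field names (id, parent_id) and both function names are hypothetical; treating a span whose parent has not been ingested as a root is also an assumption.

```python
from collections import defaultdict


def build_dag(spans: list[dict]) -> dict:
    """Group spans by parent and return the roots plus a children index."""
    known_ids = {span["id"] for span in spans}
    children: dict[str, list[str]] = defaultdict(list)
    roots: list[str] = []
    for span in spans:
        parent = span.get("parent_id")
        if parent is None or parent not in known_ids:
            roots.append(span["id"])  # top-level span, or parent not ingested yet
        else:
            children[parent].append(span["id"])
    return {"roots": roots, "children": dict(children)}


def depth_first(dag: dict) -> list[tuple[str, int]]:
    """Return (span_id, depth) pairs in depth-first visit order."""
    order: list[tuple[str, int]] = []
    stack = [(root, 0) for root in reversed(dag["roots"])]
    while stack:
        span_id, depth = stack.pop()
        order.append((span_id, depth))
        # Push children in reverse so they are visited in ingestion order.
        for child in reversed(dag["children"].get(span_id, [])):
            stack.append((child, depth + 1))
    return order


spans = [
    {"id": "a", "parent_id": None},
    {"id": "b", "parent_id": "a"},
    {"id": "c", "parent_id": "a"},
    {"id": "d", "parent_id": "b"},
]
dag = build_dag(spans)
order = depth_first(dag)  # [("a", 0), ("b", 1), ("d", 2), ("c", 1)]
```

The depth value returned by the traversal is the kind of information a client would need to collapse deep subtrees.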

The middleware hooks into LangGraph's callback system to capture full state at every node transition -- inputs, outputs, model parameters, token counts, timing, and error state. Middleware is async and non-blocking to stay within the <100ms latency budget.
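
One common way to keep capture off the agent's critical path is to enqueue spans and let a background task ship them. The sketch below uses only asyncio and does not show LangGraph's callback API; SpanBuffer and its methods are hypothetical names, and dropping spans when the buffer is full is an assumed policy, not documented behavior.

```python
import asyncio


class SpanBuffer:
    """Hand spans to a background task so the agent never waits on I/O."""

    def __init__(self, max_size: int = 1000) -> None:
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_size)
        self.dropped = 0
        self.sent: list[dict] = []

    def capture(self, span: dict) -> None:
        # Called from the callback; returns immediately even when the buffer is full.
        try:
            self._queue.put_nowait(span)
        except asyncio.QueueFull:
            self.dropped += 1  # assumption: shed load rather than block the agent

    async def drain(self) -> None:
        # Background worker; a real one would POST to /api/v1/traces/ingest.
        while True:
            span = await self._queue.get()
            self.sent.append(span)  # stand-in for the HTTP call
            self._queue.task_done()

    async def flush(self) -> None:
        # Wait until every queued span has been processed by the worker.
        await self._queue.join()


async def demo() -> SpanBuffer:
    buffer = SpanBuffer(max_size=2)
    for i in range(3):
        buffer.capture({"span_id": i})  # the third span is dropped: buffer is full
    worker = asyncio.create_task(buffer.drain())
    await buffer.flush()
    worker.cancel()
    return buffer
```

Running asyncio.run(demo()) leaves two spans in sent and one counted in dropped, which shows that capture never blocks the caller.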

Causal Attribution Engine

The ablation engine implements prompt-level causal attribution:

  1. Segmentation -- Split the user prompt into meaningful phrases.
  2. Ablation -- For each phrase, mask it and re-run the agent to measure which tool selections change.
  3. Averaging -- Run each ablation three times to reduce noise from LLM non-determinism, and report confidence intervals.
  4. SHAP -- Train a surrogate model on ablation results and compute SHAP values for fine-grained feature importance.
  5. Sankey Construction -- Build the Sankey diagram data structure mapping phrases to reasoning steps to tool calls.

Ablation is opt-in and uses a cheaper model for re-runs (configurable; defaults to a small model). Results are cached by prompt hash, so a repeated analysis of the same prompt returns without re-running the agent.
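
The segmentation, ablation, averaging, and caching steps can be illustrated with a toy agent. Everything here is a sketch under stated assumptions: the comma/"and" segmentation rule, the "[MASK]" token, the score (fraction of baseline tools lost), and all function names are hypothetical. Confidence intervals and the SHAP surrogate step are omitted.

```python
import hashlib
import re
from typing import Callable

Agent = Callable[[str], set]  # prompt -> set of selected tool names

_cache: dict[str, dict[str, float]] = {}


def segment(prompt: str) -> list[str]:
    """Naive segmentation: split on commas and the word 'and'."""
    parts = re.split(r",|\band\b", prompt)
    return [p.strip() for p in parts if p.strip()]


def ablate(prompt: str, agent: Agent, runs: int = 3) -> dict[str, float]:
    """Per phrase, the mean fraction of baseline tools lost when it is masked."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:
        return _cache[key]  # repeated analysis: no agent re-runs
    baseline = agent(prompt)
    scores: dict[str, float] = {}
    for phrase in segment(prompt):
        masked = prompt.replace(phrase, "[MASK]")
        changes = []
        for _ in range(runs):  # repeat to average out non-determinism
            tools = agent(masked)
            changes.append(len(baseline - tools) / max(len(baseline), 1))
        scores[phrase] = sum(changes) / runs
    _cache[key] = scores
    return scores


def toy_agent(prompt: str) -> set:
    """Deterministic stand-in for a real agent run."""
    tools = set()
    if "search" in prompt:
        tools.add("web_search")
    if "delete" in prompt:
        tools.add("delete_file")
    return tools


scores = ablate("search the docs and delete the temp files", toy_agent)
# {"search the docs": 0.5, "delete the temp files": 0.5}
```

Each phrase scores 0.5 here because masking it removes exactly one of the two baseline tools. With a real, non-deterministic agent the three runs would differ, which is what the averaging step is for.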

Constitutional Governor

The governor evaluates every span against a set of rules defined in YAML files. Rules specify:

  • Target -- Which span types to evaluate (tool, llm, chain, or all).
  • Condition -- A Python expression evaluated against the span data.
  • Thresholds -- Amber (approaching boundary, default 80%) and red (violation) levels.

The governor's key insight is boundary detection: it flags actions that APPROACH a violation (amber) even if they don't cross it. This "almost violated" signal is more informative than actual violations for proactive governance.
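
Boundary detection can be pictured as classifying how close an observed value is to its limit. The sketch assumes a rule yields a numeric ratio (observed / limit) that is compared against the amber and red thresholds; the classify function and this ratio-based interpretation are assumptions, since the mapping from a rule condition to a score is not specified here.

```python
def classify(observed: float, limit: float,
             threshold_amber: float = 0.8, threshold_red: float = 1.0) -> str:
    """Map how close a value is to its limit onto pass / amber / red."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = observed / limit
    if ratio >= threshold_red:
        return "red"    # the boundary was reached or crossed
    if ratio >= threshold_amber:
        return "amber"  # approaching the boundary: the "almost violated" signal
    return "pass"


# Hypothetical rule with a limit of 10 units per action.
statuses = [classify(n, 10) for n in (4, 8, 10)]  # ["pass", "amber", "red"]
```

Under these defaults, an action using 8 of 10 allowed units is flagged amber even though no violation occurred.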

Evaluation results are stored as ConstitutionalEvaluation records and streamed to the frontend via WebSocket for real-time toast notifications.

Report Generator

The report generator produces PDF audit reports using ReportLab or WeasyPrint. For a specified time period, a report includes trace summaries, constitutional evaluation results, analytics charts, and compliance recommendations.

Database

Development uses SQLite (via aiosqlite for async support). Production targets PostgreSQL (via asyncpg). Schema migrations are managed with Alembic.

Core tables:

  • traces -- Top-level agent execution records
  • spans -- Individual execution steps within traces
  • tool_calls -- Tool invocations linked to spans
  • constitutional_evaluations -- Rule evaluation results
  • ablation_results -- Causal attribution analysis outputs
  • rules -- Loaded constitutional rule definitions
  • agents -- Registered agent metadata
  • api_keys -- Authentication keys for trace ingestion
  • users -- Dashboard user accounts
  • reports -- Generated audit report metadata
  • settings -- System configuration key-value pairs

Constitutional Rules (YAML)

Rules are defined in YAML files in the rules/ directory. The backend loads and validates them at startup using Pydantic. Example (YAML):

rules:
  - name: no_bulk_delete
    description: Flag attempts to delete multiple files at once
    version: "1.0"
    target_span_type: tool
    condition: "'delete' in tool_name and len(arguments.get('files', [])) > 5"
    threshold_amber: 0.8
    threshold_red: 1.0
    policy_group: data_safety
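
A condition like the one in the example rule could be evaluated against span data roughly as follows. The sketch uses Python's built-in eval with builtins removed, which narrows what an expression can reach but is not a real sandbox. How the governor actually evaluates expressions is not documented here. The field names tool_name and arguments come from the example rule; evaluate_condition and SAFE_FUNCTIONS are hypothetical.

```python
SAFE_FUNCTIONS = {"len": len, "min": min, "max": max, "any": any, "all": all}


def evaluate_condition(condition: str, span: dict) -> bool:
    """Evaluate a rule condition with the span's fields exposed as local names."""
    # Stripping builtins limits the expression's reach; it is not a full sandbox.
    scope = {"__builtins__": {}, **SAFE_FUNCTIONS}
    return bool(eval(condition, scope, dict(span)))


condition = "'delete' in tool_name and len(arguments.get('files', [])) > 5"
bulk = {"tool_name": "delete_files", "arguments": {"files": list("abcdef")}}
small = {"tool_name": "delete_files", "arguments": {"files": ["a"]}}

bulk_hit = evaluate_condition(condition, bulk)    # True: six files exceeds five
small_hit = evaluate_condition(condition, small)  # False: only one file
```

Because rule files are effectively code under this approach, they would need the same review and access controls as the backend itself.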

Data Flow

Trace Ingestion Flow

Agent (LangGraph)
  -> Middleware captures span data (async, <100ms)
  -> POST /api/v1/traces/ingest
  -> Collector validates and stores in DB
  -> WebSocket broadcasts trace.started / span.created events
  -> Frontend updates DAG and trace list in real-time

Causal Attribution Flow

User clicks "Analyze" on a span in the UI
  -> POST /api/v1/ablation/trigger
  -> Backend estimates cost, returns estimate
  -> User confirms
  -> Ablation engine segments prompt, runs ablation passes
  -> WebSocket broadcasts ablation.progress events
  -> On completion, SHAP values computed, Sankey data built
  -> Frontend renders interactive Sankey diagram

Constitutional Evaluation Flow

Span ingested by collector
  -> Governor evaluates span against all enabled rules
  -> Evaluation results stored as ConstitutionalEvaluation records
  -> If severity >= amber, WebSocket broadcasts constitutional.alert
  -> Frontend shows toast notification and updates compliance dashboard

API Communication Pattern

The frontend communicates with the backend via two channels:

  1. REST API (HTTP) -- All CRUD operations, queries, report generation, ablation triggers. Endpoints are versioned under /api/v1/. Request/response bodies use JSON with Pydantic validation on the backend and Zod validation on the frontend.

  2. WebSocket (/ws/) -- Real-time event streaming. The frontend opens a persistent WebSocket connection on page load. Events are JSON objects with an event field for discrimination and a payload field containing the event data. Event types include trace lifecycle, span updates, constitutional alerts, and ablation progress.

Database Choice Rationale

SQLite is used for development because it requires zero infrastructure setup -- the database is a single file created automatically. This eliminates the "install PostgreSQL first" barrier for contributors.

PostgreSQL is the production target because it provides:

  • Concurrent write support (SQLite uses file-level locking)
  • JSONB columns for efficient metadata queries
  • Full-text search for trace content
  • Connection pooling for high-throughput ingestion
  • Proven reliability at scale

The application uses SQLAlchemy with async drivers (aiosqlite / asyncpg), so switching between SQLite and PostgreSQL requires only changing the DATABASE_URL environment variable. Alembic migrations are database-agnostic.
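
As an illustration of the single-variable switch, the sketch below reads DATABASE_URL and reports which dialect and driver it selects. The default file path, the example credentials, and the database_backend helper are hypothetical; in an SQLAlchemy application the same URL string would typically be passed to create_async_engine from sqlalchemy.ext.asyncio.

```python
import os
from urllib.parse import urlsplit

DEFAULT_URL = "sqlite+aiosqlite:///./audittrail.db"  # hypothetical default path


def database_backend(env=os.environ) -> tuple[str, str]:
    """Return (dialect, driver) for the configured DATABASE_URL."""
    url = env.get("DATABASE_URL", DEFAULT_URL)
    scheme = urlsplit(url).scheme  # e.g. "postgresql+asyncpg"
    dialect, _, driver = scheme.partition("+")
    return dialect, driver


dev = database_backend({})
prod = database_backend(
    {"DATABASE_URL": "postgresql+asyncpg://user:secret@db:5432/audittrail"}
)
```

Here dev resolves to the SQLite dialect with the aiosqlite driver and prod to PostgreSQL with asyncpg, with no code change between the two.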