AuditTrail Architecture Overview

System Diagram

                          +------------------+
                          |   Browser / UI   |
                          |  (Next.js App)   |
                          +--------+---------+
                                   |
                          HTTP / WebSocket
                                   |
                          +--------+---------+
                          |      Nginx       |
                          | (Reverse Proxy)  |
                          +---+---------+----+
                              |         |
                 /api/, /ws/  |         |  / (everything else)
                              |         |
                    +---------+--+   +--+---------+
                    |  FastAPI   |   |  Next.js   |
                    |  Backend   |   |  Frontend  |
                    |  :8000     |   |  :3000     |
                    +-----+------+   +------------+
                          |
          +---------------+---------------+
          |               |               |
   +------+------+ +------+------+ +------+------+
   |   Trace     | | Causal      | | Constitu-   |
   |   Collector | | Attribution | | tional      |
   |   + DAG     | | Engine      | | Governor    |
   |   Builder   | | (Ablation   | | (Rule       |
   |             | |  + SHAP)    | |  Engine)    |
   +------+------+ +------+------+ +------+------+
          |               |               |
          +-------+-------+-------+-------+
                  |               |
           +------+------+ +------+------+
           |  SQLite /   | |  Rule YAML  |
           |  PostgreSQL | |  Files      |
           +-------------+ +-------------+

Component Descriptions

Frontend (Next.js + shadcn/ui)

The frontend is a Next.js 15+ App Router application scaffolded from a shadcn dashboard starter. It provides six primary views:

  1. Traces -- Paginated, searchable list of all agent execution traces with drill-down to detail.
  2. DAG View -- Interactive React Flow graph showing the execution decision tree for a selected trace. Custom node types render differently based on span type (LLM call, tool invocation, chain step). Nodes are color-coded by status and constitutional evaluation results.
  3. Sankey View -- Plotly Sankey diagram showing causal attribution from prompt phrases through reasoning steps to tool calls. Flow width is proportional to attribution strength.
  4. Constitutional Dashboard -- Real-time compliance status showing rule pass/amber/red counts, trend charts, and drill-down to individual evaluations.
  5. Analytics -- Recharts-based dashboard with cost tracking, latency distributions, tool usage breakdowns, and error rate trends.
  6. Reports -- Audit report generation interface with period selection and export options.

State management uses Zustand for client-side state (selected trace, filter state, theme) and nuqs for URL search parameter synchronization.

Real-time updates are delivered over a WebSocket connection to the backend. The frontend subscribes to trace and span events, updating the DAG and trace list in real time as agents execute.

Backend (FastAPI)

The backend is a Python FastAPI application providing REST endpoints and WebSocket connections. It has four major subsystems:

Trace Collector + DAG Builder

The collector ingests trace and span data from instrumented agents via the REST API. It stores raw trace data in the database and reconstructs the execution DAG on demand by traversing the span parent-child tree.

An instrumentation middleware hooks into LangGraph's callback system to capture full state at every node transition -- inputs, outputs, model parameters, token counts, timing, and error state. The middleware is async and non-blocking to stay within the <100ms latency budget.
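DAG reconstruction from the flat span table can be sketched as a parent-child traversal. The field names (`id`, `parent_id`) are illustrative assumptions, not the backend's actual schema:

```python
from collections import defaultdict


def build_dag(spans):
    """Reconstruct the execution DAG from flat span records.

    Assumes each span is a dict with 'id' and 'parent_id' keys,
    where 'parent_id' is None for root spans (hypothetical field
    names for illustration).
    """
    children = defaultdict(list)
    roots = []
    for span in spans:
        if span["parent_id"] is None:
            roots.append(span)
        else:
            children[span["parent_id"]].append(span)

    def attach(node):
        # Recursively nest each span's children under it.
        return {**node, "children": [attach(c) for c in children[node["id"]]]}

    return [attach(root) for root in roots]
```

Because spans arrive as independent records, building the tree on demand like this avoids storing redundant structure in the database.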

Causal Attribution Engine

The ablation engine implements prompt-level causal attribution:

  1. Segmentation -- Split the user prompt into meaningful phrases.
  2. Ablation -- For each phrase, mask it and re-run the agent to measure which tool selections change.
  3. Averaging -- Run each ablation 3x to reduce noise from LLM non-determinism. Report confidence intervals.
  4. SHAP -- Train a surrogate model on ablation results and compute SHAP values for fine-grained feature importance.
  5. Sankey Construction -- Build the Sankey diagram data structure mapping phrases to reasoning steps to tool calls.

Ablation is opt-in and uses cheaper models (configurable, defaults to a small model) for re-runs. Results are cached by prompt hash so repeated analyses are instant.
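The segmentation/ablation/averaging steps above can be sketched as follows. The `run_agent` callable, the masking token, and the change metric are all simplifying assumptions for illustration -- the real engine works at the tool-selection level described above:

```python
import hashlib
import statistics

_cache = {}  # prompt hash -> attribution results (cheap cache sketch)


def ablation_attribution(prompt, phrases, run_agent, n_runs=3):
    """Mask each phrase, re-run the agent, and average tool-choice changes.

    `run_agent` is a hypothetical callable returning the set of tools
    the agent selected for a given prompt.
    """
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:
        return _cache[key]  # repeated analyses are instant

    baseline = run_agent(prompt)
    results = {}
    for phrase in phrases:
        ablated = prompt.replace(phrase, "[MASKED]")
        change_rates = []
        for _ in range(n_runs):  # repeat to smooth LLM non-determinism
            tools = run_agent(ablated)
            changed = len(baseline ^ tools)  # tools that appeared/disappeared
            change_rates.append(changed / max(len(baseline | tools), 1))
        results[phrase] = {
            "mean": statistics.mean(change_rates),
            "stdev": statistics.stdev(change_rates) if n_runs > 1 else 0.0,
        }
    _cache[key] = results
    return results
```

The per-phrase mean is the attribution strength that later feeds the Sankey flow widths; the standard deviation feeds the reported confidence intervals.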

Constitutional Governor

The governor evaluates every span against a set of rules defined in YAML files. Rules specify:

  • Target -- Which span types to evaluate (tool, llm, chain, or all).
  • Condition -- A Python expression evaluated against the span data.
  • Thresholds -- Amber (approaching boundary, default 80%) and red (violation) levels.

The governor's key insight is boundary detection: it flags actions that APPROACH a violation (amber) even if they don't cross it. This "almost violated" signal is more informative than actual violations for proactive governance.
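A minimal sketch of this three-band classification, assuming the rule's condition yields a proximity score in [0, 1] when evaluated against the span's fields (boolean conditions can be treated as 0.0 / 1.0):

```python
def evaluate_rule(rule, span):
    """Evaluate one rule against a span and classify it green/amber/red.

    The rule and span are plain dicts here for illustration; the real
    governor uses typed models.
    """
    # Expose only span fields (plus len) to the expression; no builtins.
    namespace = {"__builtins__": {}, "len": len, **span}
    score = float(eval(rule["condition"], namespace))

    if score >= rule["threshold_red"]:
        status = "red"      # violation
    elif score >= rule["threshold_amber"]:
        status = "amber"    # approaching the boundary
    else:
        status = "green"
    return {"rule": rule["name"], "score": score, "status": status}
```

The amber band is exactly the "almost violated" signal: a score of 0.9 against thresholds of 0.8/1.0 never crosses the red line but still surfaces in the compliance dashboard.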

Evaluation results are stored as ConstitutionalEvaluation records and streamed to the frontend via WebSocket for real-time toast notifications.

Report Generator

Generates PDF audit reports using ReportLab or WeasyPrint. Reports include trace summaries, constitutional evaluation results, analytics charts, and compliance recommendations for a specified time period.

Database

Development uses SQLite (via aiosqlite for async support). Production targets PostgreSQL (via asyncpg). Schema migrations are managed with Alembic.

Core tables:

  • traces -- Top-level agent execution records
  • spans -- Individual execution steps within traces
  • tool_calls -- Tool invocations linked to spans
  • constitutional_evaluations -- Rule evaluation results
  • ablation_results -- Causal attribution analysis outputs
  • rules -- Loaded constitutional rule definitions
  • agents -- Registered agent metadata
  • api_keys -- Authentication keys for trace ingestion
  • users -- Dashboard user accounts
  • reports -- Generated audit report metadata
  • settings -- System configuration key-value pairs

Constitutional Rules (YAML)

Rules are defined in YAML files in the rules/ directory. The backend loads and validates them at startup using Pydantic. Example:

rules:
  - name: no_bulk_delete
    description: Flag attempts to delete multiple files at once
    version: "1.0"
    target_span_type: tool
    condition: "'delete' in tool_name and len(arguments.get('files', [])) > 5"
    threshold_amber: 0.8
    threshold_red: 1.0
    policy_group: data_safety

Data Flow

Trace Ingestion Flow

Agent (LangGraph)
  -> Middleware captures span data (async, <100ms)
  -> POST /api/v1/traces/ingest
  -> Collector validates and stores in DB
  -> WebSocket broadcasts trace.started / span.created events
  -> Frontend updates DAG and trace list in real-time
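From the agent side, the first two steps of this flow amount to building a JSON body and POSTing it. The payload field names and the `Authorization` header format are illustrative assumptions (authentication via API keys is real; the exact header is not specified here):

```python
import json
import urllib.request


def make_ingest_payload(trace_id, spans):
    """Build the JSON body for POST /api/v1/traces/ingest.

    Field names here are illustrative -- the real schema is defined
    by the backend's Pydantic models.
    """
    return {
        "trace_id": trace_id,
        "spans": [
            {
                "id": s["id"],
                "parent_id": s.get("parent_id"),
                "span_type": s["span_type"],
                "status": s.get("status", "ok"),
            }
            for s in spans
        ],
    }


def ingest(payload, api_key, base_url="http://localhost:8000"):
    """Send one trace to the collector over the REST API."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/traces/ingest",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed header format
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```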

Causal Attribution Flow

User clicks "Analyze" on a span in the UI
  -> POST /api/v1/ablation/trigger
  -> Backend estimates cost, returns estimate
  -> User confirms
  -> Ablation engine segments prompt, runs ablation passes
  -> WebSocket broadcasts ablation.progress events
  -> On completion, SHAP values computed, Sankey data built
  -> Frontend renders interactive Sankey diagram

Constitutional Evaluation Flow

Span ingested by collector
  -> Governor evaluates span against all enabled rules
  -> Evaluation results stored as ConstitutionalEvaluation records
  -> If severity >= amber, WebSocket broadcasts constitutional.alert
  -> Frontend shows toast notification and updates compliance dashboard

API Communication Pattern

The frontend communicates with the backend via two channels:

  1. REST API (HTTP) -- All CRUD operations, queries, report generation, ablation triggers. Endpoints are versioned under /api/v1/. Request/response bodies use JSON with Pydantic validation on the backend and Zod validation on the frontend.

  2. WebSocket (/ws/) -- Real-time event streaming. The frontend opens a single persistent WebSocket connection on page load. Events are JSON objects with a type field for discrimination and a payload field containing the event data. Event types include trace lifecycle, span updates, constitutional alerts, and ablation progress.
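Discrimination on the `type` field can be sketched as a small handler registry. The event type `constitutional.alert` appears in the flows above; the payload field names are assumptions:

```python
import json

HANDLERS = {}


def on(event_type):
    """Register a handler for one WebSocket event type."""
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register


def dispatch(raw: str):
    """Route a raw JSON event frame to its registered handler."""
    event = json.loads(raw)
    handler = HANDLERS.get(event["type"])
    if handler is None:
        return None  # unknown event types are ignored, not fatal
    return handler(event["payload"])


@on("constitutional.alert")
def handle_alert(payload):
    # Payload fields are hypothetical; a real handler would raise a toast.
    return f"{payload['rule']}: {payload['severity']}"
```

Keeping unknown types non-fatal lets the backend add new event types without breaking older clients.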

Database Choice Rationale

SQLite is used for development because it requires zero infrastructure setup -- the database is a single file created automatically. This eliminates the "install PostgreSQL first" barrier for contributors.

PostgreSQL is the production target because it provides:

  • Concurrent write support (SQLite uses file-level locking)
  • JSONB columns for efficient metadata queries
  • Full-text search for trace content
  • Connection pooling for high-throughput ingestion
  • Proven reliability at scale

The application uses SQLAlchemy with async drivers (aiosqlite / asyncpg), so switching between SQLite and PostgreSQL requires only changing the DATABASE_URL environment variable. Alembic migrations are database-agnostic.