AuditTrail

Don't just trace what happened.
Understand why.

Open-source explainable observability for AI agent systems. Causal attribution + constitutional governance + visual debugging.

  • 57% of orgs have AI agents in production
  • 45 → 5 min: debugging time reduction
  • <5 ms: middleware overhead
  • Aug 2026: EU AI Act enforcement deadline

Integrated with the frameworks you already use

LangChain · LangGraph · OpenAI · Anthropic · OpenTelemetry · Langfuse · AutoGen · CrewAI · DSPy · LlamaIndex · Haystack · Semantic Kernel

The gap · Two lenses · One platform

“What happened” is easy. “Why” is the bar.

Same trace data, two perspectives. Hover either lens to focus it; the components below are the real /traces and /sankey views, not screenshots.

WHAT

Traces tell you what the agent did.

Ordered chain of LLM + tool calls with timing, cost, and tokens. The diary of the run — the same story every observability tool is happy to tell.

Shown in: /traces list, flame graphs

WHY

AuditTrail explains why it did it.

Causal attribution over the prompt. Every decision has a trail back to the words that drove it, the rule that caught it, and the edit that would have flipped it.

Shown in: /sankey, /traces/[id] counterfactual panel
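To make "causal attribution over the prompt" concrete, here is a minimal leave-one-out ablation sketch. It is not AuditTrail's implementation: the function names, the keyword weights, and the toy scorer are all hypothetical, and the numbers it produces are illustrative rather than the product's. The idea is simply to remove one phrase at a time and measure how much a score drops.

```python
from typing import Callable, Dict, List


def ablation_attribution(
    phrases: List[str],
    score: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Leave-one-out attribution, normalised so the shares sum to 1.

    Assumes phrases are unique, since they are used as dictionary keys.
    """
    baseline = score(phrases)
    drops: Dict[str, float] = {}
    for i, phrase in enumerate(phrases):
        ablated = phrases[:i] + phrases[i + 1:]
        # Only count phrases whose removal lowers the score.
        drops[phrase] = max(baseline - score(ablated), 0.0)
    total = sum(drops.values())
    if total == 0:
        return {p: 0.0 for p in phrases}
    return {p: d / total for p, d in drops.items()}


# Hypothetical stand-in for "how strongly does the agent favour web_search?"
# A real system would query the model instead of counting keywords.
KEYWORD_WEIGHTS = {"marketing": 0.4, "campaign": 0.1, "latest": 0.3, "data": 0.2}


def toy_web_search_score(phrases: List[str]) -> float:
    words = " ".join(phrases).lower().split()
    return sum(KEYWORD_WEIGHTS.get(w, 0.0) for w in words)


phrases = ["Calculate the ROI", "of our marketing", "campaign using", "the latest data"]
attribution = ablation_attribution(phrases, toy_web_search_score)
for phrase, share in attribution.items():
    print(f"{phrase!r}: {share:.2f}")
```

Leave-one-out is the simplest form of ablation; it ignores interactions between phrases, which is where Shapley-style methods come in.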

Side-by-side

Debugging, before and after.

Same failure mode. Same agent. Drag to compare the investigation loop without AuditTrail (left) vs. with it (right).

[ERROR] 2026-04-19T14:22:08Z customer-support.agent step=synth
[INFO ] parent_run_id=a3f2… span_id=b1… tokens_in=1284
[INFO ] child_run a3f2…b2 type=tool name=search
[INFO ] child_run a3f2…b3 type=tool name=calculator
[WARN ] confidence=0.42 below_threshold=0.6 skip_chain
[INFO ] child_run a3f2…b4 type=llm name=synth model=gpt-4o
[INFO ] retries: 0 1 2 3 …
Exception: confidence_gate_failed — agent bailed after 3 retries
[DEBUG] full trace 6.2 MB — grep if you can
3 terminals open · you're diffing JSON by eye
without AuditTrail
Trace b1c8… · customer-support · sankey · warning · 1 rule breach

Prompt Phrases (attribution)
  • "Calculate the ROI" · 0.12
  • "of our marketing" · 0.45
  • "campaign using" · 0.28
  • "the latest data" · 0.15

Reasoning
  • Intent classify · Information retrieval · 0.57
  • Tool match · Keyword→corpus · 0.43

Tool Calls (attribution)
  • web_search · 0.55 · selected
  • calculator · 0.22 · expected
  • read_file · 0.06 · low conf

with AuditTrail · 5 min
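In the trace above, web_search was selected where calculator was expected. A counterfactual search asks which small prompt edit would have flipped that choice. The sketch below is an assumption-laden illustration, not the product's algorithm: `smallest_flip`, `toy_choose_tool`, and the keyword weights are hypothetical, and the tool chooser is a keyword counter standing in for a real agent.

```python
from itertools import combinations
from typing import Callable, List, Optional, Tuple

# Hypothetical keyword weights per tool; a real agent would decide via the model.
TOOL_KEYWORDS = {
    "web_search": {"latest": 0.5, "data": 0.3, "marketing": 0.2},
    "calculator": {"calculate": 0.6, "roi": 0.3},
}


def toy_choose_tool(phrases: List[str]) -> str:
    words = " ".join(phrases).lower().split()
    scores = {
        tool: sum(weights.get(w, 0.0) for w in words)
        for tool, weights in TOOL_KEYWORDS.items()
    }
    return max(scores, key=scores.get)


def smallest_flip(
    phrases: List[str],
    choose: Callable[[List[str]], str],
    max_removed: int = 2,
) -> Optional[Tuple[Tuple[int, ...], str]]:
    """Find the smallest set of phrase indices whose removal changes the chosen tool."""
    original = choose(phrases)
    for size in range(1, max_removed + 1):
        for idxs in combinations(range(len(phrases)), size):
            kept = [p for i, p in enumerate(phrases) if i not in idxs]
            new_choice = choose(kept)
            if new_choice != original:
                return idxs, new_choice
    return None  # no flip found within the edit budget


phrases = ["Calculate the ROI", "of our marketing", "campaign using", "the latest data"]
flip = smallest_flip(phrases, toy_choose_tool)
print(toy_choose_tool(phrases), "->", flip)
```

Searching by increasing edit size means the first hit is also a minimal one, which keeps the explanation short enough to read in a side panel.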


Compare platforms.

Feature-by-feature against the OSS + SaaS LLM observability pack. Uncertain cells are shown as partial; every claim is cross-checked against published docs. Hover any cell icon for footnotes.

Columns: Feature, AuditTrail, LangSmith, Langfuse, OPIK, Helicone

Observability · 4 features
  • Trace capture
  • Interactive DAG viewer
  • Sankey flow attribution
  • Real-time streaming

Explainability (XAI) · 4 features
  • Causal attribution (SHAP + ablation)
  • Counterfactual explanations
  • Mechanistic XAI (SAE features)
  • Natural language explanations

Governance · 4 features
  • Constitutional rule engine
  • EU AI Act compliance mode
  • REGO / OPA policy engine
  • PDF audit export

Operations · 4 features
  • Live Fleet dashboard
  • Operations Assistant chatbot
  • 3-tier deployment actions
  • OpenAI-compatible gateway

Infrastructure · 4 features
  • Self-hostable
  • Open-source license
  • 6-language first-party SDK
  • OTel OTLP ingest

Legend: supported · partial · TBD (research pending) · not supported. Hover a group to peek, click to pin, and hover any row or footnote icon for details.

The full tour.

Every surface the product ships — observe, explain, operate, govern, integrate. Screenshots captured against the live dashboard.

See every agent, every span, live.
[Screenshot 01/04: Dashboard Overview (Observe), auditrail.imaginaerium.in/overview]

Mechanistic XAI · v2.0+

See the concepts the model was actually thinking about.

When your agent runs on a supported open-source model, we attach a sparse autoencoder trained with SAELens and surface the top-activated features per span. Behavioural features, safety features, chain-of-thought cues: the interpretable units modern mech-interp research has learned to find.

  • Works with Llama 3.x, Gemma 2/3, Mistral Small.
  • API models (GPT, Claude) fall back to ablation + counterfactuals.
  • Feature labels come from Neuronpedia dictionaries when available.
SAE · layer 15 · top-7
Preview
  • f_15_4109 · refusal / policy-adjacent · 0.92

    Active near SYSTEM block discussing allowed topics.

  • f_15_2774 · user wants calculation · 0.78

    Fires on the literal numeric tokens in the user turn.

  • f_15_9210 · recency / time-bounded query · 0.63

    Fires on phrases like "today" and "latest".

  • f_15_1044 · chain-of-thought cue · 0.55

    Rises inside `<thinking>` style wrappers.

  • f_15_7702 · JSON output scaffolding · 0.47

    Precedes tool-argument emission.

  • f_15_3388 · cite / attribute source · 0.34

    Fires after retrieved-context block.

  • f_15_5821 · instructions override · 0.22

    Partial: the model is weighing a user-prompt nudge.

This is a visual preview. Live activations appear in /traces/[id] → SAE tab when the caller's model is a supported open-source one and AUDITTRAIL_SAE_MODEL_KEY is configured.
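The selection logic described above can be sketched roughly as follows. Everything here is an assumption for illustration: the model-name prefixes, the function names, and the choice to fall back to ablation when the environment variable is unset are not taken from AuditTrail's code. Only the supported model families and the AUDITTRAIL_SAE_MODEL_KEY variable name come from this page.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical name prefixes for the families listed above
# (Llama 3.x, Gemma 2/3, Mistral Small).
OPEN_MODEL_PREFIXES = ("llama-3", "gemma-2", "gemma-3", "mistral-small")


def explanation_mode(model_name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick SAE features for supported open models, otherwise fall back."""
    env = os.environ if env is None else env
    is_supported = model_name.lower().startswith(OPEN_MODEL_PREFIXES)
    if is_supported and env.get("AUDITTRAIL_SAE_MODEL_KEY"):
        return "sae"
    # Assumed fallback, mirroring what the page says for API models.
    return "ablation+counterfactual"


def top_features(activations: Dict[str, float], k: int = 7) -> List[Tuple[str, float]]:
    """Return the k most strongly activated features, highest first."""
    return sorted(activations.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Activation values copied from the preview panel above.
preview = {"f_15_2774": 0.78, "f_15_4109": 0.92, "f_15_9210": 0.63}
print(explanation_mode("llama-3.1-8b", {"AUDITTRAIL_SAE_MODEL_KEY": "example"}))
print(top_features(preview, k=2))
```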

Pricing

Run it where it lives best.

Apache 2.0 self-host has every feature we ship. Cloud saves your ops team the DB + retention + scaling work. Enterprise layers SSO, SCIM, and private-deploy ceremony on top.

Self-host

Your infra, your rules. Apache 2.0.

Free

Forever. No seat limit, no feature gates.

  • Docker Compose or Helm install
  • Unlimited traces + spans, ad-hoc retention
  • All evaluators, governor rules, Sankey/DAG views
  • Community support via GitHub Discussions
Most teams start here

Cloud

Hosted by us. You bring the agents.

$49

per org / month · billed annually · 100k traces / mo

  • Everything in self-host
  • Managed Postgres + retention up to 180 days
  • Gateway proxy + BYOK provider key pool
  • Email + Slack support, 1-business-day SLO

Enterprise

SSO, SCIM, SOC 2 docs, private deploy.

Talk to us

Custom terms · multi-year available

  • Everything in Cloud
  • SAML 2.0 SSO + SCIM v2 provisioning
  • Private VPC deploy, region pinning, DPA on file
  • Dedicated CSM, 1-hour SLO, security review package

Full feature matrix + annual-prepay discount on /pricing.

Ship agents that explain themselves. Self-host or cloud. Your call.

agent.py
import audittrail

# Initialize: one line, zero config
audittrail.init(frameworks=["langgraph"])

@audittrail.traceable
async def run_agent(prompt: str):
    # `graph` is your own compiled LangGraph graph, built elsewhere in your app
    result = await graph.ainvoke({"input": prompt})
    return result

# Full traces, DAG, Sankey: automatic

Terminal
$ docker compose up -d
$ open http://localhost:3000
# Dashboard ready. Start tracing.

1. Install the SDK

pip install audittrail — works with LangGraph, LangChain, AutoGen, OpenAI Agents SDK, and the raw OpenAI / Anthropic SDKs.

2. Launch the dashboard

One Docker command. Full observability UI on localhost:3000. No cloud, no config, no call-home.
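If you script the setup, it can help to wait until the dashboard answers before sending traces. The helper below is a small sketch using only the Python standard library; the function names are hypothetical, and it assumes the dashboard returns a successful HTTP response at the root of localhost:3000, which this page implies but does not spell out.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_dashboard(
    url: str = "http://localhost:3000",
    attempts: int = 30,
    delay: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the dashboard until it responds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Usage, after `docker compose up -d`:
#     if not wait_for_dashboard():
#         raise SystemExit("dashboard did not come up on localhost:3000")
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a running container.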

install-get-started.sh

3. Instrument

# Instrument: one decorator, zero config
from audittrail import traceable

@traceable(name="research-agent")
async def run(query: str) -> str:
    ...

Spans flow automatically: LLM calls, tool invocations, outputs, token counts, costs.

python-quickstart.py
# Install first (shell): pip install audittrail
from audittrail import traceable

# `llm` and `search_tool` stand for your own LLM client and search tool
@traceable(name="research-agent")
async def deep_search(query: str) -> str:
    plan = await llm.complete(f"Plan for: {query}")
    docs = await search_tool(plan)
    return await llm.complete(f"Synthesize: {docs}")
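For readers wondering what a tracing decorator does under the hood, here is a self-contained stand-in. It is not the audittrail SDK: `traceable_sketch` and the in-memory `SPANS` list are hypothetical, and a real SDK would also capture LLM calls, tool invocations, token counts, and costs, then export spans instead of keeping them in a list.

```python
import asyncio
import functools
import time
from typing import Any, Dict, List

SPANS: List[Dict[str, Any]] = []  # in-memory stand-in for a span exporter


def traceable_sketch(name: str):
    """Wrap an async function and record one span per call."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            span: Dict[str, Any] = {"name": name, "status": "ok"}
            try:
                return await fn(*args, **kwargs)
            except Exception as exc:
                span["status"] = "error"
                span["error"] = repr(exc)
                raise
            finally:
                # Runs on success and failure, so every call leaves a span.
                span["duration_ms"] = (time.perf_counter() - start) * 1000
                SPANS.append(span)
        return wrapper
    return decorator


@traceable_sketch(name="research-agent")
async def deep_search(query: str) -> str:
    await asyncio.sleep(0)  # placeholder for LLM and tool calls
    return f"answer for {query}"


result = asyncio.run(deep_search("agent observability"))
print(result, SPANS[0]["name"], SPANS[0]["status"])
```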

One contract. Every language.

Three lines of code. Full explainability. No configuration.

Drop the SDK in, point traces at AuditTrail, and start observing — and operating — your fleet in minutes. Helm chart ships with the repo. Docker Compose for dev. Cloud tier coming soon. Either way — same data model, same SDKs, same dashboard.

View on GitHub
Python · TypeScript · Go · Java · Rust · .NET
EU AI Act · SOC 2 · ISO 42001 · GDPR · Apache 2.0

FAQ

Your operator questions, answered.

  • Yes — the full platform is Apache 2.0 and runs on a single Docker Compose node (or a k8s cluster via the Helm chart in /deploy). No feature gating. A managed cloud tier is coming in v3.0 for teams that don't want to run their own infra.