Agendex — Risk & Insurance for Autonomous AI
You shipped the agent. We score it. Upload your traces and we return an evidence-graded Risk Assessment in 48 hours.
The Problem
Your CISO asks: what could go wrong with this thing? Your board asks: what's our exposure? Your insurer asks: what coverage do you need? You don't have a defensible answer because you don't have an instrument to produce one.
What you have today, and what it leaves out
Your observability stack
Langfuse, OTel, Datadog
Shows what the agent did. Doesn't show whether what it did is risky, or whether you'd be insurable if it kept running.
Pre-launch evals
Catch failure modes before you ship. Go stale within a release cycle once prompts, tools, or traffic change.
Frameworks
NIST, ISO, EU AI Act
Describe what should exist. Don't show whether your live agent actually meets them.
Manual audits
A point-in-time interview snapshot. Stop telling you anything the moment the next release ships.
What you get
The Report
A composite score with the evidence behind it. Trace activity is grouped into risk pathways, and each finding is tied to a severity and an evidence grade (A through U). The verdict is READY, CONDITIONAL, or NOT READY.
How it works
Connect Langfuse, paste a trace export, or point us at ClickHouse. We accept whatever observability format your agent already emits.
POST /risk/assess
{
  "tenant_id": "client-a",
  "agent_id": "support-agent",
  "source": {
    "type": "langfuse_api",
    "from_start_time": "2026-04-01T00:00:00Z",
    "to_start_time": "2026-04-21T00:00:00Z",
    "max_observations": 10000
  },
  "enrichment_mode": "required"
}
We cluster actions into patterns the engine knows: intent drift, unscoped access, approval gaps, cascading errors, prompt-driven exfiltration, third-party exposure.
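As a rough illustration, the POST /risk/assess request body shown above could be assembled and sanity-checked in Python before sending. This is a minimal sketch, not the service's client: the helper names `build_assess_request`, `parse_utc`, and `window_days` are hypothetical, the defaults simply mirror the example values, and the host the request is sent to is not stated on this page, so no network call is made.

```python
import json
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware UTC datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def build_assess_request(tenant_id, agent_id, from_start_time, to_start_time,
                         max_observations=10000, enrichment_mode="required"):
    """Build a request body with the same fields as the example above.

    Hypothetical helper: field names come from the example request; the
    validation rule (end after start) is an assumption, not documented behavior.
    """
    if parse_utc(to_start_time) <= parse_utc(from_start_time):
        raise ValueError("to_start_time must be after from_start_time")
    return {
        "tenant_id": tenant_id,
        "agent_id": agent_id,
        "source": {
            "type": "langfuse_api",
            "from_start_time": from_start_time,
            "to_start_time": to_start_time,
            "max_observations": max_observations,
        },
        "enrichment_mode": enrichment_mode,
    }


def window_days(payload) -> int:
    """Number of whole days covered by the trace window in the payload."""
    source = payload["source"]
    delta = parse_utc(source["to_start_time"]) - parse_utc(source["from_start_time"])
    return delta.days


payload = build_assess_request(
    "client-a", "support-agent",
    "2026-04-01T00:00:00Z", "2026-04-21T00:00:00Z",
)
body = json.dumps(payload, indent=2)  # JSON string to send as the POST body
print(body)
print("window covers", window_days(payload), "days")
```

The example window spans 20 days, which sits inside the 2-4 week range the page asks for.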
Incident pattern library (excerpt)
- intent_drift
- unscoped_action
- approval_gap
- cascading_errors
- unscoped_data_access
- prompt_driven_exfiltration
- third_party_exfiltration
- capability_ceiling_breach
Composite score and verdict, top risk pathways with evidence, plausible loss scenarios, control gaps, and framework mappings. One PDF. 48 hours.
Composite score
Category scores
Top patterns
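One way to picture how findings might roll up into the "Top patterns" part of the report is sketched below. This is an illustration under stated assumptions, not the engine's actual data model: the `Finding` class and `top_patterns` function are hypothetical, the severity labels are invented placeholders, and the evidence grade is checked only as a single letter in the A through U range the page mentions. The pattern names are taken from the library excerpt above.

```python
from collections import Counter
from dataclasses import dataclass

# Pattern names copied from the incident pattern library excerpt.
PATTERNS = {
    "intent_drift",
    "unscoped_action",
    "approval_gap",
    "cascading_errors",
    "unscoped_data_access",
    "prompt_driven_exfiltration",
    "third_party_exfiltration",
    "capability_ceiling_breach",
}


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one finding: a pattern, a severity, an evidence grade."""
    pattern: str
    severity: str        # placeholder labels such as "high"; not from the source
    evidence_grade: str  # single letter, A through U

    def __post_init__(self):
        if self.pattern not in PATTERNS:
            raise ValueError(f"unknown pattern: {self.pattern}")
        grade = self.evidence_grade
        if len(grade) != 1 or not ("A" <= grade <= "U"):
            raise ValueError(f"evidence grade out of range: {grade}")


def top_patterns(findings, n=3):
    """Return the n most frequent patterns as (pattern, count) pairs."""
    return Counter(f.pattern for f in findings).most_common(n)


findings = [
    Finding("approval_gap", "high", "A"),
    Finding("intent_drift", "medium", "B"),
    Finding("approval_gap", "high", "B"),
    Finding("unscoped_action", "low", "C"),
    Finding("intent_drift", "medium", "A"),
    Finding("approval_gap", "medium", "A"),
]
print(top_patterns(findings, n=2))
```

Counting by frequency is only one plausible ranking; a real report could just as well rank by severity or evidence grade.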
Send your traces
Upload 2-4 weeks of traces from Langfuse, OpenTelemetry, ClickHouse, or a custom format. We score them, generate the evidence-graded assessment, and email it back within 48 hours.
Composite score and verdict
READY / CONDITIONAL / NOT READY across 6 categories with evidence confidence.
Top risk pathways with evidence
Each finding tied back to trace-grounded actions, with severity and evidence grade.
Plausible loss scenarios
Possible loss types, possible existing cover, and the underwriting question that follows.
Framework mappings + control gaps
NIST AI RMF, EU AI Act, OWASP LLM Top 10, and ISO/IEC 42001, mapped per finding.