The Problem
LLM confidence is theater. Calibration isn't.
Ask an LLM how sure it is and it'll say 95%. Ask again — still 95%. Ask after it's wrong — still 95%. Self-reported confidence is uncorrelated with truth. OHI's 0.85 [0.78, 0.91] at 90% coverage is a guarantee, not a vibe.
[Live stats widget: average hallucination rate, detection accuracy, verification latency]
[Interactive demo: a claim such as "The Eiffel Tower was built in 1920" is analyzed and assigned a trust score]
Architecture
The v2 verification pipeline
Each layer is independently scored and cached. Degradation is surfaced per-claim via fallback_used.
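A per-claim result carrying the `fallback_used` flag might look like the sketch below. Only `fallback_used` is named in this page; every other field name, and all the numbers, are illustrative assumptions about what a layered, cached pipeline would surface.

```python
from dataclasses import dataclass, field

@dataclass
class ClaimResult:
    # Hypothetical shape: only `fallback_used` is documented above;
    # the other field names are illustrative assumptions.
    claim: str
    score: float                    # calibrated probability the claim is true
    interval: tuple[float, float]   # conformal interval at the target coverage
    fallback_used: bool             # True when a degraded/cached layer served this claim
    layers: dict = field(default_factory=dict)  # per-layer scores, independently cached

result = ClaimResult(
    claim="The Eiffel Tower was built in 1920",
    score=0.04,
    interval=(0.01, 0.09),
    fallback_used=False,
    layers={"retrieval": 0.03, "entailment": 0.05},
)
```

Surfacing degradation per claim, rather than per request, lets a consumer trust most of a response while flagging exactly the claims that took a fallback path.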
Features
Not another confidence score.
Calibrated probabilities, a probabilistic claim graph, and a public audit trail.
🎯
Calibrated probabilities
Not black-box confidence. Per-domain split conformal prediction gives you intervals with empirical coverage you can audit: 0.85 [0.78, 0.91] at a 90% target is a guarantee you can check, not a vibe.
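The auditability claim rests on a standard property of split conformal prediction. This is a generic sketch with absolute-residual scores on synthetic data, not OHI's per-domain implementation; the function name and data are illustrative.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_true, new_pred, alpha=0.10):
    # Nonconformity score: absolute residual on a held-out calibration set.
    scores = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    n = len(scores)
    # Finite-sample corrected quantile gives >= 1 - alpha marginal
    # coverage under exchangeability.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return new_pred - q, new_pred + q

# Calibrate on one half of synthetic data, audit coverage on the other half.
rng = np.random.default_rng(0)
true = rng.uniform(size=1000)
pred = true + rng.normal(scale=0.05, size=1000)
lo, hi = split_conformal_interval(pred[:500], true[:500], new_pred=0.85)
half_width = (hi - lo) / 2
coverage = np.mean(np.abs(true[500:] - pred[500:]) <= half_width)
```

The audit step is the point: anyone holding the daily calibration data can recompute `coverage` and check it against the stated 90% target.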
🕸️
Probabilistic Claim Graph
Entailment and contradiction edges between claims propagate evidence through a loopy graph (TRW-BP). A refuted claim drags down everything that depends on it, and two claims joined by a contradiction edge can't both end up at 0.9.
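Why a contradiction pair can't both stay at 0.9 can be shown on the smallest possible graph. The sketch below brute-forces exact marginals for two binary claims joined by one contradiction factor; on a two-node graph this coincides with what belief propagation computes, standing in for the TRW-BP pass OHI runs on larger loopy graphs. The penalty value 0.01 is an assumed factor strength for illustration.

```python
import itertools

def marginals_with_contradiction(p_a, p_b, strength=0.01):
    # Unary potentials come from each claim's own score; the pairwise
    # contradiction factor multiplies the weight of the (True, True)
    # state by `strength`, penalizing "both claims hold".
    weights = {}
    for a, b in itertools.product([0, 1], repeat=2):
        w = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        if a and b:
            w *= strength  # jointly-true is heavily down-weighted
        weights[(a, b)] = w
    z = sum(weights.values())  # normalizer over all four joint states
    m_a = (weights[(1, 0)] + weights[(1, 1)]) / z
    m_b = (weights[(0, 1)] + weights[(1, 1)]) / z
    return m_a, m_b

# Two claims that each start at 0.9 but contradict one another:
m_a, m_b = marginals_with_contradiction(0.9, 0.9)
```

After propagation the symmetric pair settles near 0.5 each: the graph has no reason to prefer one claim over the other, but it refuses to let both stay high.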
🌅
Open, auditable, rest-respecting
The daily calibration report is public, and the methodology lives in a single open spec. When the PC is off, we say so, not 'temporarily unavailable'.