The Problem
LLM confidence is theater. Calibration isn't.
Ask an LLM how sure it is and it'll say 95%. Ask again — still 95%. Ask after it's wrong — still 95%. Self-reported confidence is uncorrelated with truth. OHI's 0.85 [0.78, 0.91] at 90% coverage is a guarantee, not a vibe.
[Live stats widget: average hallucination rate, detection accuracy, verification latency]
[Interactive demo: a claim such as "The Eiffel Tower was built in 1920" is analyzed and assigned a trust score]
Architecture
The v2 verification pipeline
Each layer is independently scored and cached. Degradation is surfaced per-claim via fallback_used.
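A per-claim result carrying the `fallback_used` flag might look like the sketch below. Only `fallback_used` is named in this page; every other field name, and all the numbers, are illustrative assumptions about what a layered, cached pipeline would surface.

```python
from dataclasses import dataclass, field

@dataclass
class ClaimResult:
    # Hypothetical shape: only `fallback_used` is documented above;
    # the other field names are illustrative assumptions.
    claim: str
    score: float                    # calibrated probability the claim is true
    interval: tuple[float, float]   # conformal interval at the target coverage
    fallback_used: bool             # True when a degraded/cached layer served this claim
    layers: dict = field(default_factory=dict)  # per-layer scores, independently cached

result = ClaimResult(
    claim="The Eiffel Tower was built in 1920",
    score=0.04,
    interval=(0.01, 0.09),
    fallback_used=False,
    layers={"retrieval": 0.03, "entailment": 0.05},
)
```

Surfacing degradation per claim, rather than per request, lets a consumer trust most of a response while flagging exactly the claims that took a fallback path.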
Features
Not another confidence score.
Calibrated probabilities, a probabilistic claim graph, and a public audit trail.
🎯
Calibrated probabilities
Not black-box confidence. Per-domain split conformal prediction gives you intervals with empirical coverage you can audit: 0.85 [0.78, 0.91] at a 90% target is a guarantee you can check, not a vibe.
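The auditability claim rests on a standard property of split conformal prediction. This is a generic sketch with absolute-residual scores on synthetic data, not OHI's per-domain implementation; the function name and data are illustrative.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_true, new_pred, alpha=0.10):
    # Nonconformity score: absolute residual on a held-out calibration set.
    scores = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    n = len(scores)
    # Finite-sample corrected quantile gives >= 1 - alpha marginal
    # coverage under exchangeability.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return new_pred - q, new_pred + q

# Calibrate on one half of synthetic data, audit coverage on the other half.
rng = np.random.default_rng(0)
true = rng.uniform(size=1000)
pred = true + rng.normal(scale=0.05, size=1000)
lo, hi = split_conformal_interval(pred[:500], true[:500], new_pred=0.85)
half_width = (hi - lo) / 2
coverage = np.mean(np.abs(true[500:] - pred[500:]) <= half_width)
```

The audit step is the point: anyone holding the daily calibration data can recompute `coverage` and check it against the stated 90% target.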
🕸️
Probabilistic Claim Graph
Entailment and contradiction edges between claims propagate evidence through a loopy graph (TRW-BP). A refuted claim drags down everything that depends on it, and two claims joined by a contradiction edge can't both end up at 0.9.
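Why a contradiction pair can't both stay at 0.9 can be shown on the smallest possible graph. The sketch below brute-forces exact marginals for two binary claims joined by one contradiction factor; on a two-node graph this coincides with what belief propagation computes, standing in for the TRW-BP pass OHI runs on larger loopy graphs. The penalty value 0.01 is an assumed factor strength for illustration.

```python
import itertools

def marginals_with_contradiction(p_a, p_b, strength=0.01):
    # Unary potentials come from each claim's own score; the pairwise
    # contradiction factor multiplies the weight of the (True, True)
    # state by `strength`, penalizing "both claims hold".
    weights = {}
    for a, b in itertools.product([0, 1], repeat=2):
        w = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        if a and b:
            w *= strength  # jointly-true is heavily down-weighted
        weights[(a, b)] = w
    z = sum(weights.values())  # normalizer over all four joint states
    m_a = (weights[(1, 0)] + weights[(1, 1)]) / z
    m_b = (weights[(0, 1)] + weights[(1, 1)]) / z
    return m_a, m_b

# Two claims that each start at 0.9 but contradict one another:
m_a, m_b = marginals_with_contradiction(0.9, 0.9)
```

After propagation the symmetric pair settles near 0.5 each: the graph has no reason to prefer one claim over the other, but it refuses to let both stay high.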
🌅
Open, auditable, rest-respecting
The daily calibration report is public, and the methodology lives in a single open spec. When the PC is off, we say so, not 'temporarily unavailable'.