Seeing Inside the AI’s Mind

A correct answer is not proof of understanding. AIDA measures what language models actually know — and what they’re guessing.

We are the first to reconstruct and diagnose the internal trajectory of every inference across the full depth of a transformer. The result: a measured, geometric picture of whether a model reasons cleanly or arrives at the right answer by chance. This is epistemic measurement — the science of what is known versus what is merely outputted — and it changes everything about how AI systems are evaluated, certified, and governed.


Live: Layer-by-layer inference — Qwen-7B answering a medical question

Explore the full interactive demo →

The Epistemic Gap

A model that scores 79.8% on medical licensing questions has just 38.6% structural correctness. Over half of its correct answers are epistemically hollow.

79.8% Reported Accuracy
38.6% Structural Correctness
41.2 pp Epistemic Gap
449 / 869 Hollow Correct Answers

Ministral-14B Instruct · 1,089 medical licensing questions · 45,738 layer probes · AIDA v1 protocol
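The figures above are internally consistent and follow from two counts: total questions and correct answers. A minimal sketch, using only the numbers reported in this section:

```python
# Reported figures from the AIDA v1 run (Ministral-14B Instruct, 1,089 questions).
total_questions = 1089
correct_answers = 869        # 79.8% reported accuracy
structurally_sound = 420     # correct answers backed by genuine structure

# Hollow correct answers: right token, brittle internal process.
hollow_correct = correct_answers - structurally_sound

reported_accuracy = correct_answers / total_questions          # ~0.798
structural_correctness = structurally_sound / total_questions  # ~0.386
epistemic_gap_pp = (reported_accuracy - structural_correctness) * 100

print(f"Reported accuracy:      {reported_accuracy:.1%}")
print(f"Structural correctness: {structural_correctness:.1%}")
print(f"Epistemic gap:          {epistemic_gap_pp:.1f} pp")
print(f"Hollow correct answers: {hollow_correct} / {correct_answers}")
```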


The Problem

What benchmarks tell you

A model answered 79.8% of questions correctly. You cannot tell which answers are reliable. You cannot tell why it got the rest wrong. You have a number and nothing else.

What AIDA tells you

Of those correct answers, 420 arose from genuine structural knowledge and 449 from brittle internal processes that happened to land on the right token. We can identify which is which — for every question, at every layer.


Six Ways a Model Can Answer

Every inference is classified into one of six epistemic regimes, based on a joint geometric and logit analysis of the full layer stack.

Differentiated Correct

Genuine structural knowledge. Distinct representations built across layers. Safe for deployment.

Late Crystallisation

Correct but shallow. Answer emerges only in final layers. Sensitive to prompt variation.

Differentiated Wrong

Confident and structurally consistent — but converged on the wrong answer.

Correct Overridden

The model had the right answer at intermediate layers. Later processing suppressed it.

Fused Wrong

No differentiation between options at any layer. The model cannot distinguish alternatives.

Fused Gold

No differentiation, but the output is correct. Right by chance. Inflates benchmarks.
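The six regimes above can be pictured as a decision over two per-layer signals: how strongly the model separates the answer options at each layer, and whether each layer's leading option is the gold answer. The sketch below is illustrative only; the signal names, thresholds, and decision order are assumptions, not AIDA's actual protocol.

```python
def classify_regime(differentiation, layer_correct, output_correct,
                    diff_threshold=0.5, late_fraction=0.75):
    """Illustrative regime classifier over per-layer probe signals.

    differentiation : per-layer floats -- how separated the answer-option
                      representations are at that layer (hypothetical scale).
    layer_correct   : per-layer bools -- whether the layer's leading option
                      is the gold answer.
    output_correct  : bool -- whether the final output token is correct.
    Thresholds are placeholder values, not discovered constants.
    """
    n = len(differentiation)
    differentiated = [d >= diff_threshold for d in differentiation]

    if not any(differentiated):
        # No layer ever separates the options: the "fused" regimes.
        return "Fused Gold" if output_correct else "Fused Wrong"

    first_diff = differentiated.index(True)
    if output_correct:
        if first_diff >= late_fraction * n:
            return "Late Crystallisation"   # answer emerges only at the end
        return "Differentiated Correct"

    # Wrong output: did a differentiated intermediate layer hold the answer?
    if any(c and d for c, d in zip(layer_correct, differentiated)):
        return "Correct Overridden"
    return "Differentiated Wrong"
```

For example, a 32-layer model that only separates the options in its last four layers yet answers correctly would land in Late Crystallisation, while one that separates them early and holds the gold answer throughout would be Differentiated Correct.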


Natural Constants

We have identified threshold values that arise from the geometry of ensemble correlation structures — not from tuning or optimisation. These constants predict correctness with 96–97% reliability across independent medical benchmarks. They are discovered, not chosen. They are structural features of the epistemic manifold itself.
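The claim above is that a single fixed threshold on a structural score separates correct from incorrect inferences at high reliability. A toy sketch of that idea on synthetic data; the score distribution and the threshold value are invented for illustration and are not AIDA's discovered constants:

```python
import random

random.seed(0)

THRESHOLD = 0.62  # placeholder value, not a discovered constant

def predict_correct(structural_score, threshold=THRESHOLD):
    """Predict output correctness from a single per-inference score."""
    return structural_score >= threshold

# Synthetic population: structurally sound inferences score high,
# brittle ones score low (Gaussian spread around each mode).
samples = [(random.gauss(0.8, 0.1), True) for _ in range(500)] + \
          [(random.gauss(0.4, 0.1), False) for _ in range(500)]

hits = sum(predict_correct(score) == label for score, label in samples)
print(f"Agreement with ground truth: {hits / len(samples):.1%}")
```

The point of the sketch is the shape of the claim, not the numbers: a fixed, untuned cut on one measured quantity acting as a reliable predictor of correctness.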

Correctness Is Not Cognition

The internal epistemic state of language models is measurable, structured, and essential for governance.

Read the Research