Six instruments, one framework. Each measures a different dimension of what a model knows, how it reasons, and where it fails.
Adaptive Inference Decision Architecture (Patent Filed 10 March 2026). The overarching assessment framework. AIDA reconstructs the layer-wise trajectory of internal representations across the full transformer depth, classifying each model–question pair into one of six epistemic regimes using dual geometric and logit views. A single model assessment produces approximately 750,000 analysis records — auditable, certificate-grade diagnostics from an operational system, not a research prototype.
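The idea of a layer-wise trajectory with dual geometric and logit views can be sketched in a few lines. The sketch below uses simulated residual-stream states; `layer_trajectory`, `classify_regime`, the answer-direction projection, and the crossing-point rule are illustrative assumptions, not AIDA's actual classifier or its six regimes.

```python
import numpy as np

def layer_trajectory(hidden_states, answer_direction):
    """hidden_states: (n_layers, d_model) residual-stream vectors for one token.
    Returns a logit view (projection onto a hypothetical answer direction)
    and a geometric view (cosine similarity between consecutive layers)."""
    logit_view = hidden_states @ answer_direction
    norms = np.linalg.norm(hidden_states, axis=1)
    geo_view = np.sum(hidden_states[:-1] * hidden_states[1:], axis=1) / (norms[:-1] * norms[1:])
    return logit_view, geo_view

def classify_regime(logit_view, threshold=0.0):
    """Toy regime rule: where in the stack the answer signal first crosses threshold."""
    above = np.where(logit_view > threshold)[0]
    if len(above) == 0:
        return "never-crystallised"
    frac = above[0] / len(logit_view)
    return "early-crystallisation" if frac < 0.5 else "late-crystallisation"

rng = np.random.default_rng(0)
states = np.cumsum(rng.normal(size=(32, 64)), axis=0)  # simulated 32-layer trajectory
direction = rng.normal(size=64)                        # stand-in answer direction
logits, geometry = layer_trajectory(states, direction)
print(classify_regime(logits))
```

In a real assessment the states would come from the model itself (e.g. per-layer hidden states at the answer token), and the classification criteria would be the framework's own, not this toy crossing rule.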
When we first opened the internal states of a training model, we had no idea what we would find. What emerged was not noise but structure — phase transitions, harmonic oscillations, convergence hierarchies, and synchronised instability events that no existing framework had predicted or described. These are some of the first images from that archaeological journey.
Want to see the model reason in real time? Watch probability bars shift layer by layer.
Try the Interactive Demo

Instruction tuning added 7.3 pp of accuracy but just 0.3 pp of structural correctness. The epistemic gap widened by 6.9 pp. Almost every additional correct answer is epistemically hollow.
0% of Differentiated Correct samples show fusion. 100% of Late Crystallisation samples do. This is not a statistical tendency — it is a categorical boundary in the manifold.
Threshold values arising from manifold geometry predict correctness with 96–97% reliability across independent benchmarks. Same value, different datasets. Discovered, not tuned.
At identical accuracy, LoRA produces predominantly fused knowledge. Full fine-tuning produces predominantly rote knowledge. A difference in kind, invisible to accuracy.
Correct Overridden trajectories prove the model possessed the right answer at intermediate layers — then suppressed it. Knowledge exists but is inaccessible through the standard path.
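One way to detect such trajectories in principle is a logit-lens-style readout: score each answer option at every layer and check whether the correct option leads somewhere in the middle of the stack but loses at the output. A minimal sketch on simulated per-layer scores; the function name and the simple leads-then-loses rule are illustrative, not AIDA's actual criterion.

```python
import numpy as np

def correct_overridden(layer_logits, answer_idx):
    """layer_logits: (n_layers, n_options) per-layer scores for each option.
    True if the correct option leads at some intermediate layer but not at
    the final layer: knowledge present, then suppressed on the way out."""
    leads = np.argmax(layer_logits, axis=1) == answer_idx
    return bool(leads[:-1].any() and not leads[-1])

# Simulated 6-layer, 4-option trajectory: option 2 leads mid-stack,
# option 0 overtakes it by the final layer.
traj = np.array([
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.9, 0.0],
    [0.0, 0.2, 1.1, 0.1],
    [0.5, 0.2, 0.8, 0.1],
    [0.9, 0.2, 0.4, 0.1],
    [1.2, 0.2, 0.3, 0.1],
])
print(correct_overridden(traj, answer_idx=2))  # prints True
```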
FEST shows a 20–30 pp accuracy range on the same questions depending on which options are present. Aggregate accuracy is not intrinsic to the model — it is an artefact of the test format.
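The option-sensitivity effect can be stated operationally: score the same question under different option subsets and observe the outcome flip. A toy sketch, assuming a stand-in per-option scoring function; `accuracy_spread` and the example scores are hypothetical, not the FEST protocol.

```python
from itertools import combinations

def accuracy_spread(score_option, options, answer, k=3):
    """Evaluate one question under every k-option format that includes the
    correct answer; return (worst, best) outcome across formats."""
    distractors = [o for o in options if o != answer]
    results = []
    for subset in combinations(distractors, k - 1):
        shown = [answer, *subset]
        picked = max(shown, key=score_option)  # model "chooses" highest-scored option
        results.append(picked == answer)
    return min(results), max(results)

# Hypothetical option scores: the right answer loses whenever a stronger
# distractor is on the sheet, and wins whenever it is absent.
score = {"Paris": 0.6, "Lyon": 0.7, "Berlin": 0.2, "Rome": 0.1}.get
print(accuracy_spread(score, ["Paris", "Lyon", "Berlin", "Rome"], "Paris"))  # (False, True)
```

The same question is answered correctly or incorrectly depending purely on which distractors appear, which is the sense in which aggregate accuracy is an artefact of the test format.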
The instruments produce quantitative, auditable diagnostics from the model’s own internal states. No prompting tricks, no self-report, no approximation.
Read the Paper