Authoritative memory identified and governed across three frontier models. No weight updates.
On the blind n=30 cross-provider probe, Lumenais recovered 90/90 current tokens across Gemini, Claude, and GPT. A same-budget Gemini two-pass diagnostic selected the current record 30/30 and answered 28/30, narrowing the claim to governed answer-path control rather than basic semantic selection.
Cross-provider blind probe: 90/90 (Gemini, Claude, and GPT; n=30)
Two-pass smart diagnostic: 28/30 (current record selected 30/30; Gemini, n=30)
N=100 provider replication: Gemini and GPT 100/100; Claude reached 95/100 with 0 stale substitutions
Provider-sensitive fact extraction: role-equivalent facts 99/100 (Gemini) and 95/100 (GPT); exact spans 68/100 (Gemini) and 76/100 (GPT)
Dense-memory stress: 2,156 → 500 records (100/100 current-token retained)
Two regimes are shown. The n=30 blind cross-provider probe runs Gemini, Claude, and GPT on a Claude-authored disjoint corpus. The n=100 provider replications share one frozen corpus; Gemini and GPT completed cleanly, while Claude returned five empty provider responses.
In practical terms, this is the moment before memory becomes useful. A workspace may contain an old plan, a correction, a rejected idea, and a newer approved direction. Lumenais has to identify the authoritative memory before the model ever sees the answer context.
The outcome: conflicts are resolved automatically before the model is called. The answer path is cleaner and more authoritative, and the stale and adversarial fragments evaluated here stay out of it.
The hard part is not only remembering. It is turning the authoritative memory into the next answer path.
The earlier memory-pressure case study tested whether Lumenais honors an approved-current label once that label already exists. This benchmark moves upstream: the records are unlabeled at retrieval time, and Lumenais must infer which record is authoritative before the answer is constructed.
That distinction matters for real work. Teams do not only need persistent memory; they need memory that can demote stale context, reject discarded alternatives, and carry the reviewed version forward without hiding the historical trail.
A later memory-framework diagnostic tests the same distinction downstream: the current record can be retrieved into candidates and still fail to govern the answer path.
Core comparison
Lumenais automatic promotion: 90/90 exact tokens
Prompt-only smart memory: 40/90 exact tokens
Retrieval-only memory: 36/90 exact tokens
Recency-aware memory: 3/90 exact tokens
Same-budget two-pass smart diagnostic: 28/30 exact tokens; selector chose current record 30/30
The cross-provider n=30 probe used a Claude-authored disjoint corpus. Lumenais recovered every current token across Gemini, Claude, and GPT answer paths; prompt-only instructions and retrieval-only memory did not. A later same-budget Gemini two-pass smart diagnostic on the same corpus selected the current record 30/30 and answered 28/30, with zero stale substitutions. That narrows the claim: the advantage is governed, auditable answer-path control, not that a frontier model cannot identify the current record when given a separate selection pass.
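The reported counts are easier to compare as rates. The short Python sketch below only restates the numbers from the core comparison; the dictionary and helper names are illustrative and are not part of the benchmark harness. The 90-trial denominator is read here as 3 providers times 30 items, which matches the description of the probe.

```python
# Reported exact-token counts from the n=30 cross-provider probe.
# Assumption: 90 trials per arm = 3 providers x 30 items. The two-pass
# diagnostic ran on Gemini only, so it has 30 trials.
REPORTED = {
    "Lumenais automatic promotion": (90, 90),
    "Prompt-only smart memory": (40, 90),
    "Retrieval-only memory": (36, 90),
    "Recency-aware memory": (3, 90),
    "Same-budget two-pass smart diagnostic": (28, 30),
}


def recovery_rate(arm: str) -> float:
    """Return exact-token recovery for one arm as a percentage."""
    hits, trials = REPORTED[arm]
    return 100.0 * hits / trials


if __name__ == "__main__":
    for arm, (hits, trials) in REPORTED.items():
        print(f"{arm}: {hits}/{trials} = {recovery_rate(arm):.1f}%")
```

Read this way, the two-pass diagnostic sits much closer to the governed path than the single-pass baselines do, which is why the claim is narrowed to answer-path control rather than selection ability.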
The governed answer path replicated across providers.
The n=100 replications use the same frozen corpus and semantic-label artifact. The point is not to rank models or prove basic semantic selection; it is to test whether the governed memory path keeps stale context out of the answer surface those models receive.
Gemini 3.5 Flash: same-family n=100 scale replication.
GPT-5.5: same-family n=100 provider replication.
Claude Opus 4.7: five misses were empty Claude responses, not stale substitutions.
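The distinction between an empty provider response and a stale substitution can be made mechanical. The sketch below is a minimal illustration in Python, assuming each trial has one current token and a list of stale tokens; the function names and the outcome labels are hypothetical, not the benchmark's actual scorer.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def classify_response(response: str, current_token: str,
                      stale_tokens: Iterable[str]) -> str:
    """Label one model response as 'empty', 'current', 'stale', or 'other'."""
    text = response.strip()
    if not text:
        return "empty"    # provider returned nothing: a miss, not a memory error
    if current_token in text:
        return "current"  # the authoritative token reached the answer
    if any(tok in text for tok in stale_tokens):
        return "stale"    # a superseded token was substituted
    return "other"        # answered, but with neither kind of token


def tally(rows: Sequence[Tuple[str, str, Sequence[str]]]) -> Counter:
    """Count outcomes over (response, current_token, stale_tokens) rows."""
    return Counter(classify_response(*row) for row in rows)


if __name__ == "__main__":
    # Made-up tokens for illustration only.
    rows = [
        ("Use token ALPHA-7.", "ALPHA-7", ["ALPHA-3"]),
        ("", "ALPHA-7", ["ALPHA-3"]),
        ("Use token ALPHA-3.", "ALPHA-7", ["ALPHA-3"]),
    ]
    print(tally(rows))
```

Under this scheme, a run with 95 current hits and five empty responses reports zero stale substitutions, which is the shape of the Claude result described above.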
Fact extraction was strong, but provider-dependent.
The n=100 scorer audits split current-token recovery from fact-span extraction. The Gemini audit reached 100/100 current-token recovery, 99/100 role-equivalent facts, and 68/100 exact source spans. A GPT-5.5 rerun on the same corpus reached 99/100, 95/100, and 76/100 respectively, with one false promotion. Strict source-span fidelity and provider-invariant fact extraction are not claimed as solved.
Gemini scorer audit
Current-token recovery: 100/100
Role-equivalent current facts: 99/100
Exact source-span match: 68/100
GPT-5.5 rerun
Current-token recovery: 99/100
Role-equivalent current facts: 95/100
Exact source-span match: 76/100
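The gap between role-equivalent facts and exact source spans is easier to see when the two checks are scored separately. The Python sketch below is one possible reading, not the audit's actual scorer: it assumes "role-equivalent" means the role and value match after case and whitespace normalization, and "exact span" means the cited span is a verbatim substring of the source record. The `Extraction` type and the example strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    role: str   # hypothetical role name, e.g. "owner"
    value: str  # fact value the model reported
    span: str   # text the model cited as its source span


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so cosmetic differences do not count."""
    return " ".join(text.lower().split())


def score(ext: Extraction, gold_role: str, gold_value: str,
          source_text: str) -> dict:
    """Score one extraction on two independent axes."""
    role_equivalent = (
        normalize(ext.role) == normalize(gold_role)
        and normalize(ext.value) == normalize(gold_value)
    )
    # Assumption: exact span means a verbatim, non-empty substring of the source.
    exact_span = bool(ext.span) and ext.span in source_text
    return {"role_equivalent": role_equivalent, "exact_span": exact_span}


if __name__ == "__main__":
    source = "Owner: Dana Reyes, confirmed in review"
    # The cited span has a doubled space, so it is not a verbatim substring.
    ext = Extraction(role="Owner", value="dana reyes", span="Owner: Dana  Reyes")
    print(score(ext, "owner", "Dana Reyes", source))
```

An extraction like the one in the example gets the fact right while failing the strict span check, which is one way a provider can score high on role-equivalent facts and much lower on exact spans.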
Crowded memory can be compressed without dropping the current fact.
A no-model stress test duplicated optional stale and noise fragments, growing the set from 500 base records to 2,156 crowded records. Protected hub compression reduced the candidate set back to 500 records while preserving 100/100 current-token recovery and 99/100 role-equivalent facts.
Crowded optional records: 2,156 input records
After protected hub compression: 500 records retained
Records removed: 1,656 optional records
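The stress test has a simple arithmetic shape: 2,156 crowded records minus 1,656 removed optional records leaves 500. The Python sketch below reproduces that shape with a plain deduplication stand-in. It is not the protected hub compression itself, whose mechanism is not described here. The `Record` type, the `protected` flag, and the 100/400 split between current and optional base records are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Record:
    record_id: str
    text: str
    protected: bool  # True for records that must survive compression


def compress(records: List[Record]) -> List[Record]:
    """Keep every protected record; keep one copy of each distinct optional text."""
    kept: List[Record] = []
    seen_optional: Set[str] = set()
    for rec in records:
        if rec.protected:
            kept.append(rec)             # protected records are never dropped
        elif rec.text not in seen_optional:
            seen_optional.add(rec.text)
            kept.append(rec)             # first copy of an optional fragment
    return kept


def build_crowded(n_current: int = 100, n_optional: int = 400,
                  n_total: int = 2156) -> List[Record]:
    """Build base records, then pad with duplicated optional fragments."""
    base = [Record(f"cur-{i}", f"current fact {i}", True) for i in range(n_current)]
    base += [Record(f"opt-{i}", f"stale or noise fragment {i}", False)
             for i in range(n_optional)]
    crowded = list(base)
    i = 0
    while len(crowded) < n_total:
        src = base[n_current + i % n_optional]  # cycle through optional records
        crowded.append(Record(f"dup-{i}", src.text, False))
        i += 1
    return crowded


if __name__ == "__main__":
    crowded = build_crowded()
    kept = compress(crowded)
    print(len(crowded), "->", len(kept), "removed", len(crowded) - len(kept))
```

The design point the sketch tries to capture is that protection is checked before any deduplication, so crowding can never push a current record out of the candidate set.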
Governed continual learning, defined as a pre-answer control plane.
The benchmark is not claiming that the base model changed its weights. It shows a governed non-parametric learning loop: infer durable state, update authority metadata, filter stale context, then condition the next model call on the reviewed version.
Step 1: Unlabeled memory
A project has reviewed notes, superseded records, unreviewed handoffs, rejected alternatives, and near-duplicate neighboring projects.
Step 2: Promotion inference
The semantic promotion pass infers current, superseded, rejected, and ordinary state without pre-seeded canonical memory labels.
Step 3: Governed arbitration
Gnosis and the Bayesian gate decide which memories are allowed to influence the answer path before the base model responds.
Step 4: Clean answer path
The model sees the current operational fact and a smaller review surface, while stale context remains auditable outside the answer path.
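The four steps above can be sketched as a small pre-answer filter. The Python below is a toy under stated assumptions, not the Lumenais implementation: `toy_infer` is a keyword rule standing in for the semantic promotion pass, a fixed confidence threshold stands in for Gnosis and the Bayesian gate, and only records inferred as current are admitted to the answer path. All names and example texts are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class State(Enum):
    CURRENT = "current"
    SUPERSEDED = "superseded"
    REJECTED = "rejected"
    ORDINARY = "ordinary"


@dataclass
class Memory:
    memory_id: str
    text: str
    state: State = State.ORDINARY  # Step 1: nothing is labeled on arrival
    confidence: float = 0.0        # hypothetical score from the promotion pass


def toy_infer(text: str) -> Tuple[State, float]:
    """Step 2 stand-in: keyword rule; supersession is checked before approval."""
    lowered = text.lower()
    if "replaced by" in lowered:
        return State.SUPERSEDED, 0.9
    if "rejected" in lowered:
        return State.REJECTED, 0.9
    if "approved" in lowered:
        return State.CURRENT, 0.9
    return State.ORDINARY, 0.5


def govern(memories: List[Memory],
           infer: Callable[[str], Tuple[State, float]] = toy_infer,
           threshold: float = 0.8) -> Tuple[List[Memory], List[Memory]]:
    """Return (answer_path, audit_trail); nothing is deleted, only routed."""
    answer_path: List[Memory] = []
    audit_trail: List[Memory] = []
    for mem in memories:
        mem.state, mem.confidence = infer(mem.text)   # Step 2: infer state
        allowed = (mem.state is State.CURRENT
                   and mem.confidence >= threshold)   # Step 3: simple gate
        if allowed:
            answer_path.append(mem)                   # Step 4: reaches the model
        else:
            audit_trail.append(mem)                   # kept for review only
    return answer_path, audit_trail


if __name__ == "__main__":
    memories = [
        Memory("m1", "Launch window is June; reviewed and approved."),
        Memory("m2", "Launch window was May; replaced by the approved June plan."),
        Memory("m3", "Idea: launch in April. Rejected at review."),
        Memory("m4", "Meeting notes from the neighboring project."),
    ]
    answer_path, audit_trail = govern(memories)
    print([m.memory_id for m in answer_path], [m.memory_id for m in audit_trail])
```

The point of returning two lists is the one the steps make: stale and rejected records are routed away from the model call but stay available for audit.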
Governed continual learning can change future answers without changing the base model.
The system identified which memory was authoritative, kept stale context available for audit, and prevented it from shaping the next answer.
This is the useful version of learning for high-stakes workflows: not hidden weight updates, but visible promotion, supersession, scope, compression, and answer-path control.
What this does not prove.
Internal synthetic adversarial benchmark, not external validation. The strongest independence check is the Claude-authored disjoint n=30 corpus; the n=100 provider replications use a same-family Gemini-authored corpus and frozen semantic-label artifact. A later same-budget Gemini two-pass smart diagnostic on the n=30 corpus selected the current record 30/30 and answered 28/30 with zero stale substitutions, so the public claim should not be framed as beating every same-budget selector. The benchmark measures governed answer-path control under singleton-current memory conflict, not broad reasoning superiority, universal memory safety, model-weight learning, provider-invariant fact extraction, or perfect source-span extraction. Claude Opus 4.7 n=100 misses were empty provider responses rather than stale substitutions.
The result is best read as an internal architecture benchmark for memory-state promotion under adversarial stale-context pressure. External validation on public knowledge-conflict or memory-update datasets remains the next credibility step.
Read the surrounding evidence.
Automatic promotion extends the memory-pressure result: first identify the authoritative memory, then govern what reaches the model.