Brain Critical Analysis — From Cache to Cognition
Status: Analysis complete, implementation priorities defined
Date: March 2, 2026
Prerequisites: fractal-brain.md, digital-brain-vision.md
Context: Benchmark results from 150-problem experiment (3 phases × 50 problems)
Table of Contents
- Executive Summary
- Design Principles — What We're Building and Why
- Benchmark Reality Check
- The Fundamental Misdiagnosis
- What Nature Actually Does
- The Revised Architecture — Brain as Attention System
- Space = Substrat — Unifying the User Model and the Brain
- The Dream Cycle — Compression Is Intelligence
- Wild Ideas From Nature
- Critical Assessment — Can We Actually Do This?
- What Works, What Doesn't, What's Missing
- Implementation Priorities
- Open Questions
- Metrics That Matter
Executive Summary
The Fractal Brain (Phases 1-6) built real infrastructure: NeuronTrees, structural matching, 3-mode recall, confidence tracking, dream consolidation, and an organism architecture. The Biological Learning Loop (bench-cli) added code-first storage, self-verification, dual-path verification, and dream replay.
After benchmarking 150 math problems across 3 phases, the honest results:
| Metric | Result | Assessment |
|---|---|---|
| LLM call savings | ~7% | Not meaningful |
| Recall accuracy (brain alone) | 16.7% (1/6 correct) | Broken |
| Prompt augmentation accuracy | 59.1% (26/44 correct) | This works |
| Overall accuracy improvement | 22% → 54% on re-seen problems | Real, but LLM does the work |
| Code substitution | 4 hits, 3 wrong (empty {}) | Fundamentally broken |
The brain is working as a prompt augmentation engine — and that's genuinely valuable. But it's not working as a recall engine. The biological metaphor is ahead of the mechanism.
This document diagnoses why, proposes a revised architecture based on how biological brains actually work, and defines a critical path toward a brain that learns, compresses, and eventually runs autonomously.
Design Principles
What we're building
A brain that runs on-device, learns from experience, and gets smarter over time. Not a cloud service. Not a chatbot wrapper. A private, local intelligence that belongs to the user.
Priority order
- Fast — the brain must add <100ms to response time, not seconds
- Private — everything runs locally. No data leaves the device. Period.
- Cost-saving — LLM inference (even local) has a compute cost. The brain should reduce it over time.
- Learning — the brain gets better with experience. Day 100 is noticeably smarter than day 1.
The role of the LLM
The LLM is not the intelligence. It's a tool the brain uses — a "whispered advisor." A big cloud LLM knows everything but nothing perfectly. A small local model (1.5B-7B) is fast and private but limited. The brain's job is to compensate for the small model's weaknesses by:
- Providing perfect context (so the small model performs like a big one)
- Handling known problems without the model (0 inference cost)
- Compressing experience into procedures the model never needs to re-derive
The brain doesn't replace the LLM. It makes each LLM call maximally effective, and over time, needs fewer of them.
What "learning from experience" actually means
A brain that has solved 50 GCD problems should:
- Recognize a new GCD problem instantly (<10ms, no LLM)
- Know the Euclidean algorithm works (from 50 verified experiences in the same substrat)
- Execute it directly (parameterized code, or WASM for speed)
- Know what doesn't work (anti-patterns from the substrat's failures)
This is not cache lookup. It's genuine competence — the difference between a student who memorized answers and one who understands the subject.
Benchmark Reality Check
The experiment
| Phase | Problems | Condition | Accuracy |
|---|---|---|---|
| Phase 1 (TRAIN) | 50 shuffled | + dream consolidation | 22% |
| Phase 2 (TEST) | 50 new, shuffled | brain has 46 trees | 24% |
| Phase 3 (RE-TEST) | 50 same as P2 | brain has 90 trees | 54% |
Where the accuracy actually comes from
Phase 3 breakdown (the "good" result):
| Source | Problems | Correct | Accuracy | LLM calls |
|---|---|---|---|---|
| Brain recalled (Exact + Variation) | 6 | 1 | 16.7% | 0 |
| LLM with brain-augmented prompt | 44 | 26 | 59.1% | 44+ |
| Total | 50 | 27 | 54% | ~70 |
The brain's recall is essentially broken. 5 out of 6 recalls gave wrong answers. The 54% accuracy comes from the LLM getting better prompts (few-shot examples + anti-recall hints injected by the brain).
LLM call accounting
| Scenario | Total LLM calls (150 problems) | Savings |
|---|---|---|
| No brain, single_shot | 150 | baseline |
| No brain, code_execution | ~225 (retries + fallbacks) | -50% (worse!) |
| Brain + code_execution | ~209 | 7% vs code_execution |
| Brain + code_execution vs single_shot | ~209 vs 150 | 39% MORE calls |
The code_execution strategy itself adds 50% more LLM calls. The brain saves back 7% of those. Net result: we're spending more, not less.
What actually worked
- Prompt augmentation — injecting correct examples + anti-patterns into the LLM prompt. This is the brain's real contribution: 24% → 59% accuracy on problems where it provided context.
- Code verification — 52/52 stored code verified deterministic. No broken code persists. The self-verification loop works.
- Anti-recall — quarantining bad trees. 3 trees quarantined, preventing the brain from confidently replaying wrong answers.
What failed
- Structural matching — the trajectory segmenter creates trees based on embedding direction changes, not semantic structure. "Find the GCD of 48 and 18" and "Find the GCD of 360 and 240" don't match as Exact (even though they're the same problem type) because the numbers change the embedding trajectory.
- Code substitution — variation hits produce empty substitutions {}. The brain detects "these trees are similar" but can't identify which leaf neurons are parameters vs. operations. The code runs with original hardcoded values → wrong answer.
- Code generation — the 1.5B model generates valid Python 38% of the time. Code-first strategy forces the model into its worst mode.
- Recall rate — 5/50 recalls on problems the brain had already seen (10%). The other 45 were classified Novel again. The brain doesn't recognize problems it already solved.
The Fundamental Misdiagnosis
Both the fractal-brain.md and digital-brain-vision.md share an assumption:
"The brain's goal is to eliminate LLM calls by recalling stored answers."
This frames the brain as a cache. Cache hit → replay answer (0 LLM calls). Cache miss → full LLM call. The metric is hit rate.
This is wrong for three reasons:
1. Brains are not caches
A human who has solved 50 GCD problems doesn't recall "GCD(48,18) = 6." They recognize the category ("this is a GCD problem"), activate the relevant method ("use the Euclidean algorithm"), and apply it. The memory doesn't replace computation — it guides computation.
The benchmark data proves this: prompt augmentation (guiding the LLM) works at 59% accuracy. Direct recall works at 16.7%. The brain is already better at guiding than replaying.
2. Perfect recall of wrong answers is worse than no recall
5 out of 6 recalls were wrong. The brain confidently replayed incorrect code. This is worse than asking the LLM fresh — at least the LLM has a 24% chance of getting it right.
A cache that replays stale data is a liability. A brain that guides fresh computation is an asset.
3. The substitution problem is architectural, not a bug
Code substitution requires knowing what's a parameter and what's a method. gcd(48, 18) — the gcd is the method, 48 and 18 are parameters. But the trajectory segmenter doesn't know this. It segments by embedding direction changes, not semantic roles.
You can't fix this by lowering thresholds or improving fingerprints. The information isn't in the embeddings. It requires either:
- The LLM to label roles at storage time ("this is a GCD problem with arguments 48 and 18")
- Storing parameterized code (def solve(a, b): return gcd(a, b)) instead of scripts (print(gcd(48, 18)))
- Both
What Nature Actually Does
The hippocampus stores POINTERS, not memories
Neuroscience shows the hippocampus stores sparse activation patterns — indices into distributed cortical representations. Not full memories. When replayed during sleep, these indices re-activate the relevant neocortical areas.
Current implementation: NeuronTrees store full solutions. Recall tries to replay.
Biological reality: Store a location in the brain's spatial map (substrat centroid proximity, method used, confidence). When recalled, this location activates relevant context that guides new computation.
The cerebellum stores PROCEDURES, not scripts
The cerebellum stores how to throw a ball — a parameterized motor program that adapts to different distances and weights. Not a recording of one specific throw.
Current implementation: Stored code has hardcoded values.
# This is a SCRIPT (one throw at one distance)
from math import gcd
print(gcd(48, 18))
Biological reality: Store a parameterized procedure.
# This is a PROCEDURE (throwing at any distance)
def solve(a: int, b: int) -> int:
from math import gcd
return gcd(a, b)
The prefrontal cortex is an ATTENTION CONTROLLER
The PFC doesn't execute solutions. It holds a few relevant items in working memory (Miller's 7±2) and orchestrates retrieval. It's a curator.
When facing a novel problem, the PFC:
- Recognizes the broad category ("number theory")
- Retrieves relevant methods ("Euclidean algorithm, prime factoring")
- Selects the most promising one
- Monitors execution and switches if it fails
This is routing with learned preferences, not pattern matching.
The amygdala tags WHAT MATTERS
The amygdala doesn't just say "good/bad." It creates emotional tags: urgency, novelty, social relevance, frustration. These tags determine what gets consolidated during sleep (important → keep, irrelevant → prune).
Current implementation: Binary reward (correct/incorrect).
Biological reality: Rich tagging — was this surprising? Was the user satisfied? Was this a new substrat? Did the brain predict correctly?
The thalamus FILTERS what reaches consciousness
90% of sensory input is filtered out before it reaches cortex. The thalamus gates what's relevant based on current context and goals.
Current implementation: Every query searches all stored trees equally.
Biological reality: Context-aware filtering. In a math conversation, only math neurons are active. In a cooking conversation, math neurons are dormant.
The Revised Architecture
Core shift: Brain as Attention System, not Cache
OLD: Query → match tree → replay stored answer (0 LLM calls)
or fallback to LLM (1+ LLM calls)
NEW: Query → recognize substrat → curate context → execute
(perception) (hippocampus) (thalamus) (cerebellum OR LLM with context)
The brain's primary job is making every LLM call maximally effective by providing perfect context. Its secondary job is eliminating LLM calls for truly procedural knowledge. The old architecture had these backwards.
Layer 1: Perception (keep as-is)
Query text
→ embed_tokens() → per-token hidden states
→ TrajectorySegmenter → NeuronTree
→ SparseFingerprint for fast candidate lookup
The structural decomposition is useful. The fingerprint index provides O(1) candidate retrieval. Keep it.
Layer 2: Recognition (REDESIGN)
Current: Compare query fingerprint against stored trees. Classify as Exact/Variation/Novel.
Proposed: Two-stage recognition.
Stage A — Substrat match (fast, <10ms)
Instead of discrete category labels, the brain organizes knowledge into substrats — continuous regions of embedding space defined by a centroid and a scope. This mirrors how biological place cells work: each fires maximally at one location (centroid) and decreases with distance (Gaussian falloff over scope). Multiple substrats overlap, creating a continuous map of problem-space.
A substrat is NOT a label. It's a spectrum — a center point with fuzzy boundaries. A problem about "geometric sequences" naturally lives partly in the "sequences" substrat and partly in the "geometry" substrat. No forced choice.
Experimental Validation
We validated this with a 100-problem experiment using the same all-MiniLM-L6-v2 model (384-dim, fastembed) that Morphee uses in production. 10 problem types, 10 problems each:
| Metric | Result |
|---|---|
| Nearest-centroid classification | 97/100 = 97% |
| GCD pair distance ("GCD of 48,18" vs "GCD of 360,240") | 0.38 (close — same substrat) |
| Cross-type distance (Geometry vs Quadratic) | 0.66 (far — different substrats) |
| Cross-domain distance (GCD vs Cooking) | 1.00 (maximally distant) |
The document's original concern — that "Find the GCD of 48 and 18" vs "Find the GCD of 360 and 240" wouldn't cluster — was wrong for sentence embeddings. They have distance 0.38, well within a substrat's zone. The trajectory segmenter (per-token) diverges on different numbers, but the sentence-level embedding is dominated by the shared semantic content.
Important nuances from the experiment:
- Separation ratios are WEAK (1.0-1.4 for most math types). Types cluster correctly, but the clusters are large and close together. GCD and LCM centroids are only 0.33 apart while intra-type variance is 0.45-0.52.
- GCD and LCM naturally overlap — they SHOULD be one substrat (both are "number theory: divisibility"). The brain would merge them, which is correct.
- Cooking vs ANY math type = 0.77-0.91 distance. Cross-domain separation is excellent.
- Only 3 confusions out of 100: one combinatorics→geometry, two euler_totient→{lcm, prime}. All were edge cases with unusual phrasing.
Implication for substrat design: Sentence embeddings work for substrat clustering. BUT the membership function must handle the weak inter-type separation within the same domain. A Gaussian falloff (not linear) is essential — see Membership Function section below.
Critical: use sentence-level embeddings, NOT per-token trajectories. The TrajectorySegmenter produces per-token hidden states (384-dim per token), not sentence embeddings. Substrat centroids must use the sentence embedding (mean pool of all token states, or CLS token). The production model is all-MiniLM-L6-v2 via fastembed (ONNX, 384-dim). This is already available in morphee-core/src/providers/embeddings.rs.
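As a concrete illustration of the pooling step, here is a minimal Python sketch using numpy (the `sentence_embedding` function name is illustrative, not the morphee-core API; in production this happens inside the ONNX embedding pipeline):

```python
# Sketch: derive one sentence-level embedding from per-token hidden states by
# mean pooling, then L2-normalize so cosine similarity reduces to a dot product.
import numpy as np

def sentence_embedding(token_states: np.ndarray) -> np.ndarray:
    """Mean-pool a (num_tokens x dim) matrix of hidden states into one vector."""
    pooled = token_states.mean(axis=0)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled

# Toy example: 5 "tokens" of dimension 384 (same dim as all-MiniLM-L6-v2)
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 384)).astype(np.float32)
emb = sentence_embedding(states)
print(emb.shape)  # (384,)
```

The key point: substrat centroids compare against this pooled vector, never against individual token states.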
pub struct Substrat {
id: SubstratId,
/// Center of this knowledge region (384-dim sentence embedding)
centroid: Vec<f32>,
/// Scope — how broad this substrat covers in embedding space (Gaussian sigma)
/// "Mathematics" has large scope, "GCD" has small scope.
/// Starts with a bootstrap estimate, refined by dream cycles.
scope: f32,
/// Confidence — how reliable this substrat's knowledge is (0.0 to 1.0)
/// Separate from scope. A new "Cooking" substrat has large scope + LOW confidence.
/// A mature "Mathematics" substrat has large scope + HIGH confidence.
confidence: f32,
/// Temperature — recency-based activation warmth (0.0 cold to 1.0 hot)
/// Warm substrats match more easily (effective scope expands).
/// Enables conversational context routing.
temperature: f32,
last_activated: std::time::Instant,
/// Neurons that belong to this substrat
neuron_ids: Vec<NeuronId>,
/// How many experiences shaped this substrat
exemplar_count: u32,
/// The generalized method, once born
method: Option<MethodNeuron>,
}
pub struct SubstratIndex {
substrats: Vec<Substrat>,
}
impl SubstratIndex {
/// Find substrats for a query. Returns ALL substrats where the query
/// falls within their confidence zone, with membership strength.
/// A problem can belong to multiple substrats simultaneously.
fn find_substrats(&self, query_embedding: &[f32]) -> Vec<SubstratMatch> {
// For each substrat:
// distance = cosine_distance(query, centroid)
// effective_scope = scope * (1.0 + 0.3 * temperature)
// membership = exp(-distance² / (2 * effective_scope²)) // Gaussian
// Return all with membership > threshold, sorted by strength
}
/// A new substrat is born when a problem is far from ALL existing centroids
fn maybe_birth_substrat(&mut self, embedding: &[f32], neuron_id: NeuronId) -> Option<SubstratId>;
/// Existing substrats update their centroid as new examples arrive
/// (running average, weighted by success)
fn update_centroid(&mut self, id: SubstratId, new_embedding: &[f32], success: bool);
/// Decay temperature for all substrats (called periodically)
fn decay_temperatures(&mut self, elapsed: Duration);
}
On-device cost: one cosine comparison against ~N substrat centroids. With 100 substrats: ~100 dot products on a 384-dim vector — microseconds. No LLM needed for recognition.
Membership Function: Gaussian, Not Linear
The experiment showed separation ratios of 1.0-1.4 for related math types. A linear decay (1 - d/r) gives 50% membership at half the scope, which means edge-of-scope queries get strong signal — pulling in noise from adjacent substrats.
A Gaussian falloff is more selective and matches how biological place cells fire:
Linear: membership = max(0, 1.0 - distance / scope) // too permissive
Gaussian: membership = exp(-distance² / (2 * scope²)) // sharp center, soft edges
At distance = scope, Gaussian gives ~60% membership (one sigma). At distance = 2*scope, it gives ~13%. This means queries near the center get strong signal, edge cases get weak signal, and distant queries get near-zero — exactly what we need for closely-spaced math substrats.
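A minimal sketch of both membership functions, reproducing the numbers above (the 0.5 scope value is an arbitrary example):

```python
# Compare the linear and Gaussian membership functions from the text.
# At d = scope the Gaussian gives exp(-1/2) ~ 0.61 (one sigma);
# at d = 2*scope it gives exp(-2) ~ 0.14.
import math

def linear_membership(distance: float, scope: float) -> float:
    return max(0.0, 1.0 - distance / scope)

def gaussian_membership(distance: float, scope: float) -> float:
    return math.exp(-distance**2 / (2 * scope**2))

scope = 0.5
for d in (0.0, 0.25, 0.5, 1.0):
    print(d, round(linear_membership(d, scope), 2),
             round(gaussian_membership(d, scope), 2))
```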
Temperature: Conversational Context Routing
Substrats have a temperature that decays over time. Recently activated substrats are "warm" and easier to match:
fn effective_scope(&self) -> f32 {
self.scope * (1.0 + 0.3 * self.temperature)
}
If Sophie is doing math homework and asks "what about this one?", the math substrat is warm (recently activated) and captures the ambiguous query. Without temperature, the brain starts from scratch every time.
Temperature decays exponentially: temp *= 0.95 per minute. After an hour of inactivity, a substrat is effectively cold (0.95^60 ≈ 0.05). This is cheap (one float, one timestamp per substrat) and solves conversational context for free.
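A minimal sketch of the decay and warming mechanics, assuming the 0.95-per-minute factor and the 0.3 warming coefficient from the text (the `Substrat` stand-in here is a simplification, not the production struct):

```python
# Temperature: recently activated substrats get an expanded effective scope
# (up to +30%), and temperature decays exponentially with inactivity.
from dataclasses import dataclass

@dataclass
class Substrat:
    scope: float
    temperature: float = 0.0  # 0.0 cold .. 1.0 hot

    def effective_scope(self) -> float:
        # Warm substrats match more easily: scope expands up to 30%.
        return self.scope * (1.0 + 0.3 * self.temperature)

    def decay(self, minutes: float) -> None:
        self.temperature *= 0.95 ** minutes

math_sub = Substrat(scope=0.5, temperature=1.0)  # just activated
print(round(math_sub.effective_scope(), 3))      # 0.65
math_sub.decay(minutes=60)
print(round(math_sub.temperature, 3))            # 0.046 -- effectively cold
```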
Scope vs Confidence: Separate Concepts
A naive design would use a single "radius" for everything. But that conflates two things:
- Scope — how broad the substrat covers in embedding space. "Mathematics" has large scope, "GCD" has small scope.
- Confidence — how reliable the substrat's knowledge is. "Mathematics" can be high-confidence AND large-scope. A brand new "Cooking" substrat is large-scope (uncertain about boundaries) AND low-confidence (few experiences).
These MUST be separate fields. Behavior differs:
- Large scope + high confidence: attracts queries confidently, provides strong exemplars
- Large scope + low confidence: accepts queries tentatively, marks them as exploratory
- Small scope + high confidence: narrow expert, very reliable within its domain
- Small scope + low confidence: shouldn't happen (would have been pruned or merged)
How substrats form and evolve:
- Birth — a query lands far from all centroids → new substrat born with scope=large (uncertain boundaries), confidence=low
- Growth — more queries land nearby → centroid shifts toward the mean, scope adjusts to cover members
- Maturation — after 10+ examples, the substrat has a stable center. Scope reflects actual coverage. Confidence grows with success rate.
- Splitting — if a substrat's members become bicoherent (two clusters), it splits (mitosis)
- Merging — if two substrats overlap >80%, they merge into one
This is neurogenesis — the brain literally grows new regions as it encounters new domains.
Stage B — Structural match within substrat (when needed)
Only compare fingerprints against trees in the matched substrat(s). Problems that live in multiple substrats get neurons from all of them (proportional to membership strength). This dramatically reduces false matches while preserving cross-domain knowledge.
pub struct SubstratMatch {
substrat_id: SubstratId,
membership: f32, // 0.0 to 1.0 — how strongly this query belongs here
}
pub enum RecognitionResult {
/// Known substrat(s), confident method → curate context + execute
Recognized {
primary: SubstratMatch,
secondary: Vec<SubstratMatch>, // other substrats that contribute
method_neuron: Option<NeuronId>,
confidence: f32,
},
/// Partially known — on the edge of a substrat's confidence zone
Familiar {
substrats: Vec<SubstratMatch>,
related_neurons: Vec<NeuronId>,
confidence: f32,
},
/// Completely novel — far from all substrats → new substrat born
Novel,
}
Layer 3: Working Memory / Attention (NEW — the real brain)
Once the substrat(s) are recognized, the thalamus curates what the executor needs:
pub struct WorkingMemory {
/// Primary substrat and membership strength
primary_substrat: SubstratMatch,
confidence: f32, // "I've solved 47/50 of these correctly"
/// Best 3 solved examples from this substrat (few-shot exemplars)
exemplars: Vec<StoredExample>,
/// Known wrong approaches (anti-patterns)
anti_patterns: Vec<AntiPattern>,
/// The generalized method, if one exists
method: Option<MethodNeuron>,
/// Secondary substrats that contribute (cross-domain knowledge)
/// e.g., "geometric sequences" pulls from both sequence + geometry substrats
secondary: Vec<SubstratMatch>,
}
The current prompt augmentation does a crude version of this (inject correct examples + anti-recall hints). This formalizes it as the primary mechanism.
Key design: WorkingMemory is small — 3 exemplars, 2 anti-patterns, 1 method. Like Miller's 7±2, the brain doesn't dump everything it knows into the prompt. It selects what's most relevant. This keeps prompts compact for small models. When multiple substrats match, exemplars are drawn proportionally from each.
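One way the proportional draw could work is largest-remainder allocation over the 3 exemplar slots. This is a sketch under that assumption (`allocate_slots` and its data shapes are hypothetical, not the production API):

```python
# Split a fixed number of exemplar slots across matched substrats,
# proportionally to membership strength, so slots always sum to the budget.
def allocate_slots(matches: list[tuple[str, float]], total_slots: int = 3) -> dict[str, int]:
    """matches: (substrat_id, membership). Returns exemplar slots per substrat."""
    total = sum(m for _, m in matches)
    raw = [(sid, total_slots * m / total) for sid, m in matches]
    slots = {sid: int(r) for sid, r in raw}
    # Largest-remainder: hand leftover slots to the biggest fractional parts.
    leftover = total_slots - sum(slots.values())
    for sid, r in sorted(raw, key=lambda x: x[1] - int(x[1]), reverse=True)[:leftover]:
        slots[sid] += 1
    return slots

# "geometric sequences": 2 exemplars from sequences, 1 from geometry
print(allocate_slots([("sequences", 0.8), ("geometry", 0.4)]))
# {'sequences': 2, 'geometry': 1}
```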
Layer 4: Execution (three paths)
Path A — Cerebellum (compiled WASM, 0 LLM calls, <1ms)
Condition: method_neuron.stage == Cerebellum
When: procedure verified 100+ times, compiled to WASM
Cost: near-zero
Path B — Neocortex (parameterized code, 0 LLM calls, ~200ms)
Condition: method_neuron has parameterized code + parameters extracted
When: method verified 10+ times, code is deterministic
Cost: one Python subprocess
Path C — Guided LLM (1 LLM call with curated context)
Condition: everything else
When: novel problems, uncertain methods, no code
Cost: 1 local LLM call, but with perfect context
This IS the default path. No shame in it.
Path C is where most queries go early in the brain's life. Over time, paths B and A take over as methods mature. This is biological: a student starts by solving problems consciously (PFC + LLM), then builds intuition (neocortex), then automates (cerebellum).
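The three-path dispatch can be sketched as a simple maturity check (the 100+ and 10+ verification thresholds come from the path descriptions above; the `Method` stand-in and return strings are illustrative):

```python
# Route execution to cerebellum (WASM), neocortex (parameterized code),
# or guided LLM, based on method maturity. Path C is the default.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Method:
    verified_count: int
    has_wasm: bool
    has_parameterized_code: bool

def choose_path(method: Optional[Method]) -> str:
    if method and method.has_wasm and method.verified_count >= 100:
        return "A: cerebellum (WASM, 0 LLM calls, <1ms)"
    if method and method.has_parameterized_code and method.verified_count >= 10:
        return "B: neocortex (parameterized code, 0 LLM calls)"
    return "C: guided LLM (1 call, curated context)"

print(choose_path(None))                    # path C -- the default
print(choose_path(Method(12, False, True))) # path B
print(choose_path(Method(150, True, True))) # path A
```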
Layer 5: Learning (richer signals)
After execution, three learning signals:
pub struct LearningSignal {
/// Did it work? (amygdala — reward)
reward: f32, // -1.0 to 1.0
/// Was I surprised? (prediction error)
surprise: f32, // |predicted_confidence - actual_outcome|
/// Which substrat(s) were activated? (hippocampal tag)
substrat_matches: Vec<SubstratMatch>,
/// What method was used? (procedural tag)
method_used: String,
/// Was code generated? Was it verified?
code_quality: CodeQuality,
}
Reward (already implemented) — correct/incorrect, updates confidence.
Surprise (new) — "I predicted 80% confidence but failed." This tells the brain its model of this substrat is wrong. High surprise → strong learning signal. Low surprise → small confirmation.
Tagging (new) — every experience gets labeled. This is the raw material for dream compression.
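A sketch of the surprise computation and how it might scale a confidence update (only the surprise formula comes from the text; the `confidence_update` helper and its `base_rate` constant are illustrative assumptions):

```python
# Surprise = |predicted confidence - actual outcome|. High surprise means the
# brain's model of this substrat is wrong and should update strongly.
def surprise(predicted_confidence: float, correct: bool) -> float:
    outcome = 1.0 if correct else 0.0
    return abs(predicted_confidence - outcome)

def confidence_update(predicted: float, correct: bool, base_rate: float = 0.1) -> float:
    """Move confidence toward the outcome, scaled by how surprising it was."""
    outcome = 1.0 if correct else 0.0
    return predicted + base_rate * surprise(predicted, correct) * (outcome - predicted)

print(surprise(0.8, False))                  # 0.8 -- confident and wrong: big signal
print(round(surprise(0.8, True), 2))         # 0.2 -- expected success: small confirmation
print(round(confidence_update(0.8, False), 3))  # 0.736 -- confidence drops
```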
Space = Substrat
The Unification
The user-facing concept of Space and the brain's internal concept of Substrat are the same thing viewed from different angles:
| User sees | Brain sees |
|---|---|
| "Sophie's Math Homework" (a Space) | A substrat centered on math-homework embeddings |
| "Cooking" Space | A substrat centered on cooking-related embeddings |
| Spaces can overlap ("Chemistry" relates to both "Cooking" and "Health") | Substrats overlap when their confidence zones intersect |
| A Space grows as you use it | A substrat's centroid drifts, scope refines, confidence grows with experience |
The user says "create a Space for science fair." The brain creates a substrat. As Sophie works on the science fair, the substrat's centroid drifts toward whatever she's actually doing (maybe it's 60% biology, 30% presentation skills, 10% scheduling). The brain knows this from the embedding topology — no labels needed.
Why the Current Tree Model Is Wrong
The current Space model uses parent_id — a tree:
School
├── Math Homework
├── Science Fair
└── French Class
This forces artificial choices:
- Where does "Science Fair Presentation" go? Under "Science Fair" or under a hypothetical "Presentations" space?
- A child's "Reading" relates to both "School" and "Bedtime Routine" — but it can only have one parent.
- Reparenting a Space is a manual operation that breaks the hierarchy.
Brains don't organize knowledge in trees. They organize in overlapping, nested, interconnected regions.
The Substrat Graph Model
Replace the tree with an oriented graph where relationships emerge from geometry:
pub struct Substrat {
id: SubstratId,
/// How this substrat was created
origin: SubstratOrigin,
/// Center of this knowledge region (384-dim sentence embedding)
centroid: Vec<f32>,
/// Scope — how broad this substrat covers in embedding space
scope: f32,
/// Confidence — how reliable this substrat's knowledge is (0.0 to 1.0)
confidence: f32,
/// Temperature — recency-based activation warmth (0.0 to 1.0)
temperature: f32,
last_activated: Instant,
/// Neurons that belong here
neuron_ids: Vec<NeuronId>,
/// Explicit edges to other substrats (for non-geometric relationships)
edges: Vec<SubstratEdge>,
}
pub struct SubstratEdge {
target: SubstratId,
edge_type: EdgeType,
weight: f32, // strength, evolves with usage
}
pub enum EdgeType {
/// This substrat is geometrically inside another (auto-detected)
ContainedBy,
/// Two substrats' zones overlap (auto-detected from embedding proximity)
Overlaps,
/// User or brain explicitly linked them ("Cooking feeds into Nutrition")
Feeds,
/// Inhibitory: activating one suppresses the other
Inhibits,
}
How Relationships Emerge
Most edges are NOT declared — they emerge from geometry:
Containment: substrat A's centroid is within B's zone
AND A's scope < B's scope
→ A is inside B (like "GCD" inside "Number Theory")
Overlap: distance(A.centroid, B.centroid) < A.scope + B.scope
→ A and B share territory (like "Biochemistry" between "Bio" and "Chem")
Adjacency: distance ≈ A.scope + B.scope (barely touching)
→ related but distinct (like "Cooking" and "Nutrition")
Independence: distance >> A.scope + B.scope
→ no relationship (like "Cooking" and "Car Maintenance")
The brain computes these relationships during dream cycles — no user intervention needed. But the user CAN create explicit edges ("link Science Fair to Presentations") which add Feeds edges.
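The geometric rules above can be sketched as a small classifier (the 2x cutoff separating "adjacent" from "independent" is an assumed threshold; the text only says distance >> combined scopes):

```python
# Classify the relationship between substrats A and B from centroid distance
# and scopes alone, following the containment/overlap/adjacency rules.
def relationship(dist: float, scope_a: float, scope_b: float) -> str:
    """dist: cosine distance between centroids; scopes as Gaussian sigmas."""
    touch = scope_a + scope_b
    # A inside B: A's centroid within B's zone AND A is the narrower region.
    if dist < scope_b and scope_a < scope_b:
        return "A contained by B"
    if dist < touch:
        return "overlaps"
    if dist < 2.0 * touch:  # assumed cutoff for "barely related"
        return "adjacent"
    return "independent"

print(relationship(0.10, 0.15, 0.45))  # "GCD" inside "Number Theory"
print(relationship(0.33, 0.25, 0.25))  # GCD vs LCM: shared territory
print(relationship(2.00, 0.30, 0.30))  # Cooking vs Car Maintenance
```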
Organisms and Sub-Organisms
This maps exactly to the biological organism model:
Organism (Group-level brain)
├── Substrat "Mathematics" (scope: large, conf: 0.85, many neurons)
│ ├── Sub-substrat "GCD" (scope: tight, conf: 0.94, method neuron, mature)
│ ├── Sub-substrat "Quadratic" (scope: medium, conf: 0.63, growing)
│ └── Sub-substrat "Geometry" (scope: large, conf: 0.20, few neurons, young)
│ ↕ overlap with "Art" substrat
├── Substrat "Cooking" (scope: medium, conf: 0.40)
│ ↕ overlap with "Chemistry" and "Nutrition"
└── Substrat "Daily Routine" (scope: large, conf: 0.70, loose)
├── Sub-substrat "Morning" (tight)
└── Sub-substrat "Bedtime" (tight)
↕ overlap with "Reading"
Each substrat IS a sub-organism:
- It has its own neurons (memory)
- Its own method neurons (compiled knowledge)
- Its own anti-patterns (what doesn't work)
- Its own maturity level (scope = breadth, confidence = reliability)
- Its own dream cycle contribution (consolidation within the substrat)
The Group-level organism IS the collection of all substrats + their edges. Intelligence isn't in any single substrat — it's in the topology of the graph.
What Changes for the User
Nothing visible changes. Users still see "Spaces." They create, name, and navigate them. The difference:
| Before (tree) | After (substrat graph) |
|---|---|
| "Create Space" → choose a parent | "Create Space" → brain finds where it fits in the topology |
| Moving a Space = reparenting | Spaces drift naturally as usage evolves |
| One parent only | Multiple relationships (containment, overlap, feeds) |
| User organizes | Brain organizes, user overrides when needed |
| "Where does this go?" | "It goes where it naturally belongs" |
| Every query starts from scratch | Recently used Spaces are "warm" — better at catching ambiguous queries |
| Empty Spaces are dead weight | Empty Spaces have a name-bootstrapped centroid, ready to learn |
Example flow:
- User creates "Science Fair" Space
- Brain creates substrat with centroid from "science fair" embedding
- Sophie asks about "photosynthesis for the poster" → embedding lands in Science Fair zone
- She also asks about "how to make the poster look good" → embedding is between Science Fair and a hypothetical "Art" substrat
- Dream cycle detects overlap between Science Fair and Art → auto-creates edge
- Next time Sophie's in Art space and asks about visual layout, the brain pulls knowledge from both Art AND Science Fair substrats
The brain becomes a navigator of the user's knowledge topology, not a filing cabinet.
Lifecycle Duality: Explicit vs Emergent Substrats
User-created Spaces and brain-discovered substrats have different lifecycles. This must be handled explicitly:
Explicit substrats — user creates a Space ("Photography"). The brain creates a substrat with:
- Name from the user
- Centroid bootstrapped from the name's embedding ("photography" → 384-dim)
- Large scope (uncertain, user hasn't defined boundaries yet)
- Low confidence (no experiences yet)
- Persistent even if empty — the user wants this Space to exist
Emergent substrats — the brain discovers a cluster from usage. No name. No user intent. Examples:
- Sophie keeps asking about meal prep → a "meal prep" substrat forms naturally
- The brain can suggest: "You've been asking a lot about meal prep — should I create a Space for this?"
- If promoted, it becomes explicit (gets a name, becomes navigable in the UI)
- If not promoted, it stays as brain-internal knowledge organization
What happens on deletion:
- User deletes a Space → the substrat is archived, not destroyed
- Neurons, method neurons, and anti-patterns are preserved (the brain doesn't forget learned knowledge)
- The substrat becomes "dormant" — no longer matches queries, no longer navigable in UI
- Can be reactivated if the user creates a similar Space later
What happens with empty Spaces:
- A Space with no interactions has only a name-bootstrapped centroid — it's a placeholder
- As the user interacts, the centroid drifts toward actual usage patterns
- The name "Photography" might drift toward "Photo editing on iPhone" as that's what the user actually does
- The name doesn't change (it's the user's label), but the substrat's scope tightens around real usage
pub enum SubstratOrigin {
/// User-created Space — persists even if empty, name is fixed
Explicit { name: String, created_by: UserId },
/// Brain-discovered cluster — can be promoted or pruned
Emergent { suggested_name: Option<String>, discovered_at: DateTime },
/// Was explicit, user deleted — archived, knowledge preserved
Archived { original_name: String, archived_at: DateTime },
}
Migration Path
The parent_id field doesn't need to disappear immediately. It becomes one source of explicit edges:
// During migration: parent_id → ContainedBy edge
if let Some(parent) = space.parent_id {
substrat.edges.push(SubstratEdge {
target: parent.into(),
edge_type: EdgeType::ContainedBy,
weight: 1.0, // explicit user choice = strong
});
}
New spaces created after the migration use the substrat model natively. Old spaces with parent_id keep working through the edge translation.
The Dream Cycle
Compression IS intelligence
Storing 134 individual trees is memory. Compressing them into 10 methods is intelligence. The dream cycle's primary job is compression, not cleanup.
Phase 0: Substrat Formation (no LLM needed)
For every new neuron, compute its distance to existing substrat centroids. If it falls within a substrat's confidence zone → assign it there. If it's far from all substrats → birth a new substrat centered on this embedding.
```
New neuron with embedding E:
  For each substrat S:
    distance   = cosine_distance(E, S.centroid)
    membership = exp(-distance² / (2 * S.scope²))
    if membership > threshold → assign to S, update S.centroid (running mean)
  If no substrat matched → birth new substrat(centroid=E, scope=0.50, confidence=0.0)
```
Cost: zero LLM calls. Pure vector math. The brain discovers its own topology from the embedding space — no labels, no classification, no LLM dependency.
Optionally, the local LLM can name substrats for human readability ("this cluster seems to be about GCD problems"), but the brain doesn't need names to function. Names are UI, not intelligence.
This is the hippocampus placing new memories into its spatial map — the memory's location IS its meaning.
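The Phase 0 loop is pure vector math, so it can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions: the `Substrat` struct and `assign` function names here are hypothetical, not the real codebase API.

```rust
/// Cosine distance between two embeddings (1 - cosine similarity).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

struct Substrat {
    centroid: Vec<f32>,
    scope: f32, // Gaussian sigma
    members: usize,
}

/// Soft membership: exp(-d² / (2 · σ²)).
fn membership(distance: f32, scope: f32) -> f32 {
    (-distance * distance / (2.0 * scope * scope)).exp()
}

/// Assign the embedding to the best-matching substrat, or birth a new one.
fn assign(substrats: &mut Vec<Substrat>, embedding: &[f32], threshold: f32) -> usize {
    let best = substrats
        .iter()
        .enumerate()
        .map(|(i, s)| (i, membership(cosine_distance(embedding, &s.centroid), s.scope)))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    match best {
        Some((i, m)) if m > threshold => {
            // Running-mean centroid update.
            let n = substrats[i].members as f32;
            for (c, e) in substrats[i].centroid.iter_mut().zip(embedding) {
                *c = (*c * n + e) / (n + 1.0);
            }
            substrats[i].members += 1;
            i
        }
        _ => {
            // Far from everything → birth a new substrat centered here.
            substrats.push(Substrat { centroid: embedding.to_vec(), scope: 0.50, members: 1 });
            substrats.len() - 1
        }
    }
}

fn main() {
    let mut index: Vec<Substrat> = Vec::new();
    assign(&mut index, &[1.0, 0.0], 0.6); // first neuron → new substrat
    assign(&mut index, &[0.0, 1.0], 0.6); // orthogonal → far → second substrat
    println!("substrats: {}", index.len()); // prints "substrats: 2"
}
```

Note that the threshold and the initial scope of 0.50 are the tunables discussed in the Open Questions section.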
Phase 1: Cluster Refinement
Substrats naturally cluster neurons by embedding proximity. During dream, refine:
- Coherence check — compute internal agreement within each substrat. If most members got the same answer for similar inputs, the substrat is coherent and ready for method birth.
- Bicoherence detection — if a substrat has two internal clusters (bimodal), split it (mitosis). This is how "algebra" naturally splits into "linear algebra" and "quadratic" as the brain sees more examples.
- Overlap detection — if two substrats' confidence zones overlap >80%, merge them.
- Centroid recalculation — weighted by success (correct solutions pull the centroid more).
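As a rough illustration of the coherence check, internal agreement can be computed as the fraction of members sharing the modal answer. Representing each member by a single answer string is a simplification for this sketch, not the real data model.

```rust
use std::collections::HashMap;

/// Fraction of members that share the most common answer.
/// 1.0 = perfectly coherent; 1/n = every member disagrees.
fn coherence(answers: &[&str]) -> f32 {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &a in answers {
        *counts.entry(a).or_insert(0) += 1;
    }
    let modal = counts.values().copied().max().unwrap_or(0);
    modal as f32 / answers.len().max(1) as f32
}

fn main() {
    // 4 of 5 members agree → 0.8, above the 60% birth threshold in Phase 2.
    println!("{}", coherence(&["6", "6", "6", "6", "4"]));
}
```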
Phase 2: Birth (method neurons)
For each substrat with 5+ members and >60% internal agreement:
```rust
struct MethodNeuron {
    substrat_id: SubstratId,
    /// The generalized procedure
    /// Option A: parameterized code (if 3+ members had working code)
    /// Option B: solution template (natural language method description)
    /// Option C: best exemplar's code (fallback)
    procedure: Procedure,
    /// 3 best solved examples (for few-shot when LLM is needed)
    exemplars: Vec<StoredExample>,
    /// Known failure modes (from incorrect members)
    anti_patterns: Vec<String>,
    /// Confidence = weighted accuracy across cluster members
    confidence: f32,
    /// Children: the experience neurons that birthed this method
    children: Vec<NeuronId>,
}
```
The method neuron replaces the experience neurons for recall purposes. It matches any query that falls within the substrat's confidence zone (broad), not just queries similar to one specific past query (narrow).
For code generalization, try to extract parameterized code:
- Find the experience with the best code
- Ask the local LLM: "Rewrite this code as a function with parameters: [code]"
- Verify the parameterized version against 3 stored examples
- If it passes, store as the method's procedure
Cost: 1 LLM call per method birth + 3 Python verifications. Amortized across all future recalls within this substrat.
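The verification step can be sketched as replaying stored examples against the candidate. Here `gcd` stands in for an extracted parameterized procedure, and the example triples are hypothetical; the real pipeline would execute stored Python, not a Rust closure.

```rust
/// Accept the candidate only if it reproduces every stored (input, expected) pair.
fn verify<F: Fn(u64, u64) -> u64>(candidate: F, examples: &[((u64, u64), u64)]) -> bool {
    examples.iter().all(|((a, b), want)| candidate(*a, *b) == *want)
}

/// Stand-in for a parameterized procedure extracted by the LLM.
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

fn main() {
    // Three stored examples with different parameters, as the text suggests.
    let examples = [((48, 18), 6), ((12, 8), 4), ((7, 5), 1)];
    println!("accepted: {}", verify(gcd, &examples)); // prints "accepted: true"
}
```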
Phase 3: Prune
- Experience neurons covered by a method neuron → archive (keep 3 best as exemplars, delete rest)
- Method neurons with <30% accuracy after 10+ uses → demote, re-collect experiences
- Neurons not recalled in 30+ days with low confidence → delete
The brain gets leaner and faster over time, not just bigger.
After the 150-problem benchmark: maybe 134 trees compress to 15 method neurons + 45 exemplars. That's 60 trees instead of 134. Each method matches everything within its substrat's confidence zone.
Phase 4: Code Robustness (existing)
Re-execute all stored code. Boost working code, remove broken code.
Phase 5: Compile (myelination)
Method neurons with confidence > 0.95, verified 100+ times → compile to WASM.
This connects to the existing WASM extension SDK. The brain literally grows new extensions as it learns. The cerebellum IS the extension ecosystem.
Phase 6: Creative Recombination (REM sleep)
The most speculative but potentially most powerful phase.
Biological REM sleep creates novel combinations — "what if I combined the recipe method with the scheduling method?" Applied:
- Take method A from substrat X and method B from substrat Y
- Generate a hypothesis: "could method A's approach apply to substrat Y?"
- Test against stored examples from substrat Y
- If it works better than method B → new method born
This is how biological brains discover that the same algorithm applies to seemingly different domains (e.g., "shortest path" applies to both map navigation and network routing).
Cost: a few LLM calls per dream cycle. Only attempt when the brain is mature (50+ method neurons). This is optional and future-looking.
Wild Ideas From Nature
1. Immune System — Negative Selection
The immune system knows what NOT to react to (self-tolerance) as much as what to react to. Central tolerance deletes T-cells that attack self; peripheral tolerance suppresses remaining self-reactive cells.
Applied: Per-substrat anti-pattern libraries.
```rust
struct SubstratKnowledge {
    /// What works
    method: Procedure,
    exemplars: Vec<Example>,
    /// What DOESN'T work (equally valuable)
    anti_patterns: Vec<AntiPattern>,
    // "Don't try to expand the modulus first"
    // "The naive area formula fails for inscribed polygons"
    // "Brute force times out for n > 10000"
}
```
Anti-patterns are currently per-tree. Make them per-substrat. A substrat's failures are as valuable as its successes — they prevent the brain from repeating mistakes.
When providing context to the LLM (Path C), include 1-2 anti-patterns alongside exemplars. "Here's how to solve this, and here's what NOT to do."
2. Sleep Stages — Different Dream Phases
Real sleep has distinct stages with different cognitive functions:
| Sleep Stage | Biological Function | Brain Equivalent |
|---|---|---|
| N1/N2 (light) | Sort and tag recent memories | Phase 0: Label new neurons |
| N3 (deep) | Consolidate to long-term, prune weak | Phases 1-3: Cluster, birth, prune |
| REM | Creative recombination, test hypotheses | Phase 6: Cross-substrat method transfer |
Run light sleep frequently (every 5 minutes — tag new experiences). Run deep sleep less often (every hour — compress and prune). Run REM rarely (daily — creative exploration).
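A minimal scheduler sketch for these cadences, with hypothetical names; only the intervals (5 minutes, 1 hour, daily) come from the text.

```rust
#[derive(Debug, PartialEq)]
enum DreamStage {
    Light, // Phase 0: tag new neurons
    Deep,  // Phases 1-3: cluster, birth, prune
    Rem,   // Phase 6: creative recombination
}

/// Return which stages are due, given seconds since each stage last ran.
fn due_stages(since_light: u64, since_deep: u64, since_rem: u64) -> Vec<DreamStage> {
    let mut due = Vec::new();
    if since_light >= 5 * 60 { due.push(DreamStage::Light); }
    if since_deep >= 60 * 60 { due.push(DreamStage::Deep); }
    if since_rem >= 24 * 60 * 60 { due.push(DreamStage::Rem); }
    due
}

fn main() {
    // After 10 idle minutes, only light sleep is due.
    println!("{:?}", due_stages(600, 600, 600));
}
```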
3. Critical Periods — Neuroplasticity Windows
Young brains have high plasticity (learn fast, unstable). Mature brains have low plasticity (stable, learn slowly). This is controlled by inhibitory neurons that gate plasticity.
Applied: Per-substrat learning rate.
```
First 10 problems in a substrat:
  → HIGH plasticity: store everything, accept contradictions, don't prune
  → Large scope (exploring boundaries), low confidence (few data points)
  → Centroid shifts easily with each new example

10-50 problems:
  → MEDIUM plasticity: start consolidating, birth methods, moderate pruning
  → Scope stabilizing, confidence growing with success rate
  → Centroid shift weighted by success (good results pull harder)

50+ problems:
  → LOW plasticity: only update on surprises, aggressive pruning, compile to WASM
  → Tight scope, high confidence, stable centroid, resists perturbation
  → Only high-surprise events (prediction errors) cause significant updates
```
This prevents a mature substrat from being destabilized by one outlier, while keeping new substrats open to rapid learning. Scope and confidence together encode maturity — new substrats have large scope + low confidence (uncertain), mature ones have refined scope + high confidence (expert).
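One way to encode the tiers is a maturity-to-learning-rate mapping. The rate values below are illustrative assumptions; only the tier boundaries (10, 50) come from the text.

```rust
/// Centroid learning rate as a function of substrat maturity (member count).
fn centroid_learning_rate(members: usize) -> f32 {
    match members {
        0..=9 => 0.5,   // HIGH plasticity: centroid shifts easily
        10..=49 => 0.1, // MEDIUM: consolidating, success-weighted updates
        _ => 0.02,      // LOW: only high-surprise events move the centroid
    }
}

fn main() {
    for n in [3, 25, 120] {
        println!("{n} members → rate {}", centroid_learning_rate(n));
    }
}
```

A surprise-weighted variant could multiply this base rate by the prediction error, which would implement the "only update on surprises" behavior for mature substrats.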
4. Mirror Neurons — Learn by Watching the LLM
Mirror neurons fire both when performing an action AND when observing someone else perform it. They're the basis of imitation learning.
Applied: When the LLM solves a problem (Path C), the brain doesn't just store the answer. It observes the LLM's reasoning and extracts structure:
```
LLM output: "To find GCD(48,18), I'll use the Euclidean algorithm:
             48 = 2×18 + 12, 18 = 1×12 + 6, 12 = 2×6 + 0. GCD = 6."

Brain observes:
  substrat:   closest to "gcd" centroid (distance: 0.08)
  method:     "euclidean_algorithm"
  pattern:    "repeated division with remainder"
  parameters: {a: 48, b: 18}
  answer:     6
```
The LLM already explains its reasoning (chain-of-thought). The brain should parse this to extract method labels and patterns. One extra LLM call: "What method did you just use? What are the parameters?"
Over time, the brain learns the LLM's vocabulary for methods. It can then provide context in the LLM's own language.
5. Embodied Cognition — The Knowledge Graph IS the Brain
Biological organisms use the environment as external memory. Ants leave pheromone trails. Humans write notes. The environment structures cognition.
Applied: The SubstratIndex should be navigable, not flat.
```
Substrat space (384-dim, visualized):

  ● "GCD cluster"       (scope: 0.12, conf: 0.94, 47 neurons, method: euclidean)
  ● "LCM cluster"       (scope: 0.15, conf: 0.80, 12 neurons, method: via_gcd)
    ↕ overlap zone — LCM/GCD share neurons (experiment: centroid distance 0.33)
  ● "Primality cluster" (scope: 0.18, conf: 0.80,  8 neurons, method: trial_division)
  ● "Quadratic cluster" (scope: 0.25, conf: 0.63,  5 neurons, method: quadratic_formula)
  ○ "Geometry cluster"  (scope: 0.40, conf: 0.20,  3 neurons, no method yet — too young)
```
Navigation through this embedding space IS cognition. A query's embedding lands near a substrat centroid → the brain activates that region (and warms its temperature). Scope encodes breadth, confidence encodes expertise. Overlap zones naturally capture cross-domain problems.
Note from experiment: GCD and LCM centroids are only 0.33 apart, while their intra-type means are 0.45 and 0.52. In practice, they'd naturally merge into a single "divisibility" substrat — which is correct. The brain discovers that GCD and LCM are the same domain, even though humans label them separately.
This graph structure emerges naturally from the dream cycle's substrat formation + method births. No manual design needed.
6. Neurogenesis — The Brain Grows New Capabilities
Adult neurogenesis (new neuron birth) happens in the hippocampus throughout life. New neurons help distinguish between similar memories (pattern separation).
Applied: When the brain encounters a genuinely new domain (far from all substrat centroids), it should:
- Birth a new substrat (new region of the brain) centered on this query's embedding
- Set plasticity to HIGH (large scope, low confidence, critical period)
- Allocate attention to this domain (salience boost)
- After 10+ experiences, attempt first method birth
The brain literally grows new regions as it encounters new domains. A family AI that starts with math knowledge and encounters cooking for the first time grows a "cooking" substrat from scratch. No labels needed — the brain discovers the new domain purely from the embedding topology.
Critical Assessment
Can we actually build this?
Honest answer: yes, with constraints.
What's achievable (high confidence)
Substrat-based recognition — Pure embedding math. No LLM calls needed for recognition at all. A problem's embedding is compared to substrat centroids — if it falls within a confidence zone, the brain knows what kind of problem this is. Expected accuracy: 80%+ (embeddings from models like E5 or fastembed already cluster semantically similar problems naturally).
Prompt augmentation as primary mode — Already working at 59% accuracy. Formalizing it as WorkingMemory with structured exemplar selection from the best-matching substrat(s) will improve it further.
Dream compression — Substrats form automatically from embedding proximity. Method neuron births are algorithmic. Zero LLM calls needed for the entire recognition + compression pipeline. LLM is only used optionally for: naming substrats (cosmetic), parameterizing code (one-time), and guiding execution (the whispered advisor role).
Parameterized code extraction — Asking "rewrite this code as a function with parameters" is a simple LLM task. Even a 1.5B model can do this for math code. Verification is pure Python execution.
Anti-pattern tracking — Engineering work, no AI needed. Track what fails per substrat.
What's hard (medium confidence)
Code generation quality — 1.5B models generate valid Python 38% of the time. This limits how many method neurons can have executable code. Mitigation: store solution templates (natural language method descriptions) as the primary representation, code as an optimization.
Parameterized code reliability — Even if we extract parameterized code, verifying it across different parameter ranges is non-trivial. A function that works for gcd(48, 18) might fail for edge cases. Mitigation: verify against 5+ stored examples with different parameters.
Cross-substrat transfer (REM sleep) — Discovering that the same method applies to different domains requires genuine reasoning. Small models may not produce useful hypotheses. Mitigation: defer this to phase 4, only attempt when brain is mature.
What's fundamentally limited
Small model ceiling — A 1.5B model has a quality ceiling. No amount of context curation will make it solve problems that require 70B-level reasoning. The brain can't create intelligence that doesn't exist in the model.
But — the brain's goal isn't to exceed the model. It's to:
- Consistently reach the model's ceiling (via perfect context)
- Handle known problems without the model (via compiled procedures)
- Accumulate knowledge that transfers to better models later
When the user upgrades from 1.5B to 7B, all the brain's compressed knowledge (substrats, methods, anti-patterns, WASM modules) immediately makes the 7B model perform like a domain-expert 70B.
This is the product insight: the brain is an amplifier. A small model + rich brain outperforms a big model + no brain. And the brain is portable, private, and grows with the user.
The honest risk
The risk is: we build infrastructure that doesn't produce measurably better results. The current brain has 21 files, 7,500 lines, 167 tests — and saves 7% of LLM calls with 16.7% recall accuracy.
The mitigation is: measure relentlessly, ship incrementally, cut what doesn't work.
- Substrat formation should show improvement in 1 week (measurable: substrat recognition rate goes from ~10% to 60%+)
- Method neurons should show improvement in 2 weeks (measurable: LLM savings go from 7% to 30%+)
- If either doesn't move the needle, stop and reconsider
The vision document (digital-brain-vision.md) is architecturally sound. But the implementation must be metrics-driven. Every feature earns its place by moving the numbers. If NeuronStages don't improve accuracy, cut them. If prediction tracking doesn't improve learning speed, cut it.
What I think about the vision
The vision is right about the destination: a brain that learns, compresses, and eventually runs autonomously. The architecture — neuron maturation, dream-driven compression, recursive decomposition, myelination — maps biological mechanisms to concrete implementations.
But it underestimates the foundation problem. The current structural matching doesn't reliably recognize the same problem type. Code substitution doesn't work. These aren't features to add later — they're prerequisites for everything else. Method neurons can't be born from clusters that don't form because matching doesn't work.
The revised architecture (brain as attention system, substrat-based recognition) fixes the foundation. The vision doc's higher-level features (stages, decomposition, prediction) build on top once the foundation works.
The sequence matters: foundation first, then intelligence, then optimization.
What Works, What Doesn't, What's Missing
Keep (working and valuable)
| Component | Why it works |
|---|---|
| TrajectorySegmenter | Structural decomposition is genuinely better than flat cosine |
| SparseFingerprint + FingerprintIndex | O(1) candidate retrieval, fast |
| NeuronStore (3 implementations) | Solid persistence layer |
| Prompt augmentation | 59% accuracy vs 24% baseline — the brain's best feature |
| Self-verification | 100% of stored code verified. No broken code persists |
| Anti-recall / quarantine | Prevents replaying wrong answers |
| Confidence tracking | Per-neuron quality signals with temporal decay |
| Dream consolidation framework | The 5-phase cycle structure is right |
| Feature gating | Clean separation, no regressions |
Fix (broken but fixable)
| Component | Problem | Fix |
|---|---|---|
| Code substitution | Empty {} — can't identify parameters | Parameterized code + LLM-extracted parameter names |
| Structural matching thresholds | Same problem type doesn't match (10% recall) | Substrat-based matching as primary, structural as secondary |
| Code-first default | 1.5B model can't code 62% of the time | Make strategy model-aware. Code for 7B+, templates for 1.5B |
| Dual-path verification | Costs more LLM calls than it saves | Only verify when surprise is high (prediction error) |
| Dream merging | Merges by fingerprint, not semantics | Cluster by substrat proximity instead |
Reuse (existing infrastructure that maps to substrats)
| Component | Action | Why |
|---|---|---|
| SignalGraphExecutor | REUSE heavily | Core propagation engine (BFS, safety bounds, trace recording) maps directly to substrat graph traversal. Add substrat-aware routing and multi-step credit assignment in learn(). |
| SpaceOrganismRegistry | ADAPT | Collection/registry pattern is sound. Upgrade cross-space edges to graph-aware edges. Add coherence tracking across substrats. |
| Grammar (TextGrammar) | REUSE + REFACTOR | Working tokenization impl. Move into encoding pipeline: Grammar.tokenize() → EmbeddingModel.embed() → sentence vector. SpaceOrganism should call this, not inline tokenization. |
Pause (consumers, not providers)
| Component | Why pause |
|---|---|
| LlmOrganism / WasmOrganism | These are inference consumers, not memory providers. They belong in V2.1 Knowledge Marketplace as "specialist nodes," not in the core substrat graph. The core graph is memory + recall + learning. LLMs/WASM are inference + tooling. |
| gRPC proto definitions | Federation before local intelligence works |
| SubstratEncoder | Abstraction layer with one implementation — but the Grammar/Embedding pipeline above replaces its role |
These files stay feature-gated. When the intelligence works and needs deployment infrastructure, they're there.
Build (missing and essential)
| Component | Why essential | Effort |
|---|---|---|
| Substrat + SubstratIndex | Fix the 10% recall rate. 97% nearest-centroid accuracy proven. Zero LLM cost. Includes scope/confidence/temperature. | ~150 lines |
| SubstratOrigin | Explicit (user Space) vs Emergent (brain-discovered) vs Archived lifecycle | ~30 lines |
| WorkingMemory | Formalize prompt augmentation as primary mechanism. Draw from multiple substrats proportionally. | ~80 lines |
| Substrat formation (dream Phase 0) | Automatic clustering from embedding topology, Gaussian membership, no LLM | ~60 lines |
| Method neuron births (dream Phase 2) | Compression — 134 trees → 15 methods | ~150 lines |
| Parameterized code extraction | Make code substitution actually work | ~100 lines |
| Prediction tracking | Richer learning signal than binary reward | ~60 lines |
| NeuronStage enum | Maturation tracking (hippocampus/neocortex/cerebellum) | ~20 lines |
| SignalGraphExecutor adaptation | Add substrat-aware routing + multi-step credit assignment to existing engine | ~40 lines |
Implementation Priorities
Week 1: Fix the Foundation
Goal: Substrat-based recognition. Substrat recognition rate ~10% → 60%+.
- Build `Substrat` struct — centroid embedding + scope + confidence + temperature + neuron IDs
- Build `SubstratIndex` — substrat formation, matching, centroid updates
- Dream Phase 0: automatic substrat formation from embedding proximity (zero LLM calls)
- Recognition: substrat match first, structural match within substrat
- Keep prompt augmentation as the primary execution path
Measure: Run benchmark. Count substrat recognition rate. Target: 60%+ of re-seen problems fall within correct substrat's confidence zone. Experiment already showed 97% nearest-centroid accuracy on sentence embeddings — this target should be achievable. Key advantage: zero LLM dependency for recognition.
Week 2: Method Neuron Births
Goal: Dream compression. 134 trees → ~15 method neurons. LLM savings 7% → 30%+.
- Dream clustering by substrat membership (already done in Phase 0)
- Method neuron birth from substrats with 5+ members
- Parameterized code extraction (LLM + verification)
- WorkingMemory struct: 3 exemplars + 2 anti-patterns + 1 method
- Experience neuron archival (method replaces its children for recall)
Measure: Run benchmark. Count method neurons born. Count LLM calls saved by method recall. Target: 30% savings on known substrats.
Week 3: Prediction + Maturation
Goal: Richer learning, neuron lifecycle. Learning speed improvement.
- NeuronStage enum (Hippocampus / Neocortex / Cerebellum)
- Prediction tracking: before execution, predict confidence. After, compute surprise.
- Surprise-weighted learning (high surprise → strong update)
- Per-substrat plasticity (critical periods)
- Anti-pattern tracking per substrat
Measure: Run benchmark with 200 problems. Compare learning curve (accuracy at problem 50, 100, 150, 200). Target: steeper curve than current.
Week 4: Compilation + Polish
Goal: WASM compilation for mature methods. Full pipeline.
- Myelination: methods with confidence >0.95 → compile to WASM
- Promote from bench-cli to morphee-core (the intelligence should be in core, not consumers)
- Wire into morphee-server
- Salience-weighted search (context + recency + confidence)
Measure: End-to-end benchmark. 200 problems, 4 phases. Target: 50% LLM savings, 40%+ accuracy on known substrats with 0 LLM calls.
Future: Recursive Decomposition + Federation
Only after weeks 1-4 produce measurable improvement:
- Recursive decomposition for novel problems ("break this into sub-problems I know")
- REM sleep (cross-substrat method transfer)
- Federated brain (share method neurons across instances)
- Mirror neuron learning (observe LLM chain-of-thought, extract method structure)
Open Questions
1. What's the right initial scope for new substrats?
This controls how eagerly the brain assigns new problems to existing substrats vs. birthing new ones. Too large → everything lumps together ("math"). Too small → every problem is its own substrat.
Experiment data suggests: Intra-type distances average 0.53, inter-type distances average 0.66. Initial scope of ~0.45-0.50 would capture most same-type problems while excluding different types. But the Gaussian membership function gives soft edges, so the exact value matters less than with linear cutoffs.
Likely approach: start with scope = 0.50 (Gaussian sigma). Tunable via benchmark. The dream cycle corrects mistakes anyway (split/merge).
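Plugging the experiment's average distances into the Gaussian membership function gives a quick sanity check on sigma = 0.50. This is a back-of-envelope computation, not a tuned value:

```rust
/// Soft membership: exp(-d² / (2 · σ²)).
fn membership(d: f32, sigma: f32) -> f32 {
    (-d * d / (2.0 * sigma * sigma)).exp()
}

fn main() {
    // Intra-type mean distance 0.53 vs inter-type mean 0.66 from the experiment.
    println!("intra: {:.2}", membership(0.53, 0.50)); // ~0.57
    println!("inter: {:.2}", membership(0.66, 0.50)); // ~0.42
}
```

With a linear cutoff the 0.13 gap between the means would be fragile; the soft Gaussian edge keeps same-type problems above typical thresholds while different types fall below, which is why the exact sigma matters less.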
2. How many substrats emerge from real usage?
The experiment showed 10 distinct math types cluster well with sentence embeddings. But GCD and LCM naturally merge (centroid distance 0.33 < intra-type distance 0.45). So 10 labeled types → ~7-8 substrats in practice.
A family AI might have 50-100 substrats. SubstratIndex is O(N) in substrat count for matching (N cosine comparisons). With 100 substrats, that's 100 dot products — microseconds. With 10,000 substrats, an ANN index (ball tree or VP-tree) keeps it O(log N).
3. How does granularity self-regulate?
The dream cycle handles this naturally:
- New substrat starts with a large scope (uncertain, inclusive) + low confidence
- More examples → centroid stabilizes, scope adjusts to actual coverage
- If members diverge (bicoherent) → the substrat splits (mitosis)
- If two substrats overlap too much → they merge
Experiment insight: GCD and LCM are likely to START as one merged substrat and may NEVER split, because their embeddings are genuinely close (distance 0.33). This is actually correct — they use the same mathematical machinery (divisibility). The brain discovers that these are one domain even if humans label them separately.
4. Should method neurons store code or templates?
For domains where code works (math, data processing): parameterized code. For domains where code doesn't apply (writing, planning, conversation): solution templates (structured natural language methods).
Both should be supported. The Procedure type should be an enum.
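One possible shape for that enum, mirroring Options A/B/C from the method-neuron section. Variant and field names are illustrative assumptions, not the real codebase definition:

```rust
/// Sketch of the Procedure enum: one variant per representation.
enum Procedure {
    /// Option A: parameterized code for executable domains (math, data processing)
    ParameterizedCode { source: String, params: Vec<String> },
    /// Option B: structured natural-language method for non-executable domains
    SolutionTemplate { steps: Vec<String> },
    /// Option C: fallback — the best exemplar's code, unparameterized
    BestExemplar { source: String },
}

fn main() {
    let p = Procedure::SolutionTemplate {
        steps: vec!["identify quantities".into(), "set up the equation".into()],
    };
    if let Procedure::SolutionTemplate { steps } = p {
        println!("template with {} steps", steps.len()); // prints "template with 2 steps"
    }
}
```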
5. When does the brain stop needing the LLM?
Asymptotically, but never fully. Even a mature brain encounters novel problems. The goal is:
- Day 1: 100% of queries need LLM
- Day 30: 50% of queries need LLM (known substrats handled by methods)
- Day 100: 20% of queries need LLM (most substrats covered)
- Day 365: 5% of queries need LLM (rare novelty + verification)
The LLM transitions from "primary solver" to "teacher for new domains" to "occasional consultant."
Metrics That Matter
Primary metrics (measure every benchmark)
| Metric | Current | Week 1 Target | Week 4 Target |
|---|---|---|---|
| Substrat recognition rate | ~10% (structural) | 60%+ | 80%+ |
| LLM call savings | 7% | 15% | 50%+ |
| Recall accuracy (brain alone) | 16.7% | 40%+ | 70%+ |
| Overall accuracy | 54% (re-seen) | 60% | 70%+ |
| Tree count (lower = more compressed) | 134 (150 problems) | 100 | 50 |
| Substrats formed | 0 | 8+ | 15+ |
| Method neurons born | 0 | 5+ | 15+ |
Secondary metrics (track but don't optimize for)
| Metric | Purpose |
|---|---|
| Recall latency | Brain shouldn't add >100ms |
| Dream cycle duration | Should complete in <30s |
| Code rate | % of method neurons with executable code |
| Anti-pattern count per substrat | Richer = fewer repeated mistakes |
| Surprise calibration | How well does predicted confidence match actual success rate |
The North Star
Near-term (provable now):
A small local model + a mature brain should dramatically outperform the same small model with no brain.
1.5B + brain (100 trained problems) vs 1.5B alone
Current: 54% vs 22% on re-seen problems. Target: 80% vs 22%. This is the achievable, measurable proof that the brain adds value.
Long-term (aspirational):
A small local model + a mature brain should match or outperform a large cloud model with no brain, on domains the brain has experience in.
1.5B + brain (1000 trained problems) vs 70B + no training
A 70B model will still crush a 1.5B on novel reasoning. But on domains where the brain has compiled methods, parameterized code, and anti-patterns — the brain-augmented small model should match or exceed it. This is the product thesis: the brain is a knowledge amplifier. And unlike the 70B, it's private, local, and the user owns their intelligence.
The near-term benchmark proves the mechanism works. The long-term benchmark proves the product vision.