Brain Critical Analysis — From Cache to Cognition
Status: Analysis complete, implementation priorities defined
Date: March 2, 2026
Prerequisites: fractal-brain.md, digital-brain-vision.md
Context: Benchmark results from 150-problem experiment (3 phases × 50 problems)
Table of Contents
- Executive Summary
- Design Principles — What We're Building and Why
- Benchmark Reality Check
- The Fundamental Misdiagnosis
- What Nature Actually Does
- The Revised Architecture — Brain as Attention System
- Space = Substrat — Unifying the User Model and the Brain
- The Dream Cycle — Compression Is Intelligence
- Wild Ideas From Nature
- Critical Assessment — Can We Actually Do This?
- What Works, What Doesn't, What's Missing
- Implementation Priorities
- Open Questions
- Metrics That Matter
Executive Summary
The Fractal Brain (Phases 1-6) built real infrastructure: NeuronTrees, structural matching, 3-mode recall, confidence tracking, dream consolidation, and an organism architecture. The Biological Learning Loop (bench-cli) added code-first storage, self-verification, dual-path verification, and dream replay.
After benchmarking 150 math problems across 3 phases, the honest results:
| Metric | Result | Assessment |
|---|---|---|
| LLM call savings | ~7% | Not meaningful |
| Recall accuracy (brain alone) | 16.7% (1/6 correct) | Broken |
| Prompt augmentation accuracy | 59.1% (26/44 correct) | This works |
| Overall accuracy improvement | 22% → 54% on re-seen problems | Real, but LLM does the work |
| Code substitution | 4 hits, 3 wrong (empty {}) | Fundamentally broken |
The brain is working as a prompt augmentation engine — and that's genuinely valuable. But it's not working as a recall engine. The biological metaphor is ahead of the mechanism.
This document diagnoses why, proposes a revised architecture based on how biological brains actually work, and defines a critical path toward a brain that learns, compresses, and eventually runs autonomously.
Design Principles
What we're building
A brain that runs on-device, learns from experience, and gets smarter over time. Not a cloud service. Not a chatbot wrapper. A private, local intelligence that belongs to the user.
Priority order
- Fast — the brain must add <100ms to response time, not seconds
- Private — everything runs locally. No data leaves the device. Period.
- Cost-saving — LLM inference (even local) has a compute cost. The brain should reduce it over time.
- Learning — the brain gets better with experience. Day 100 is noticeably smarter than day 1.
The role of the LLM
The LLM is not the intelligence. It's a tool the brain uses — a "whispered advisor." A big cloud LLM knows everything but nothing perfectly. A small local model (1.5B-7B) is fast and private but limited. The brain's job is to compensate for the small model's weaknesses by:
- Providing perfect context (so the small model performs like a big one)
- Handling known problems without the model (0 inference cost)
- Compressing experience into procedures the model never needs to re-derive
The brain doesn't replace the LLM. It makes each LLM call maximally effective, and over time, needs fewer of them.
What "learning from experience" actually means
A brain that has solved 50 GCD problems should:
- Recognize a new GCD problem instantly (<10ms, no LLM)
- Know the Euclidean algorithm works (from 50 verified experiences in the same substrat)
- Execute it directly (parameterized code, or WASM for speed)
- Know what doesn't work (anti-patterns from the substrat's failures)
This is not cache lookup. It's genuine competence — the difference between a student who memorized answers and one who understands the subject.
Benchmark Reality Check
The experiment
| Phase | Problems | Condition | Accuracy |
|---|---|---|---|
| Phase 1 (TRAIN) | 50 shuffled | + dream consolidation | 22% |
| Phase 2 (TEST) | 50 new, shuffled | brain has 46 trees | 24% |
| Phase 3 (RE-TEST) | 50 same as P2 | brain has 90 trees | 54% |
Where the accuracy actually comes from
Phase 3 breakdown (the "good" result):
| Source | Problems | Correct | Accuracy | LLM calls |
|---|---|---|---|---|
| Brain recalled (Exact + Variation) | 6 | 1 | 16.7% | 0 |
| LLM with brain-augmented prompt | 44 | 26 | 59.1% | 44+ |
| Total | 50 | 27 | 54% | ~70 |
The brain's recall is essentially broken. 5 out of 6 recalls gave wrong answers. The 54% accuracy comes from the LLM getting better prompts (few-shot examples + anti-recall hints injected by the brain).
LLM call accounting
| Scenario | Total LLM calls (150 problems) | Savings |
|---|---|---|
| No brain, single_shot | 150 | baseline |
| No brain, code_execution | ~225 (retries + fallbacks) | -50% (worse!) |
| Brain + code_execution | ~209 | 7% vs code_execution |
| Brain + code_execution vs single_shot | ~209 vs 150 | 39% MORE calls |
The code_execution strategy itself adds 50% more LLM calls. The brain saves back 7% of those. Net result: we're spending more, not less.
What actually worked
- Prompt augmentation — injecting correct examples + anti-patterns into the LLM prompt. This is the brain's real contribution: 24% → 59% accuracy on problems where it provided context.
- Code verification — 52/52 stored code verified deterministic. No broken code persists. The self-verification loop works.
- Anti-recall — quarantining bad trees. 3 trees quarantined, preventing the brain from confidently replaying wrong answers.
What failed
- Structural matching — the trajectory segmenter creates trees based on embedding direction changes, not semantic structure. "Find the GCD of 48 and 18" and "Find the GCD of 360 and 240" don't match as Exact (even though they're the same problem type) because the numbers change the embedding trajectory.
- Code substitution — variation hits produce empty substitutions {}. The brain detects "these trees are similar" but can't identify which leaf neurons are parameters vs. operations. The code runs with original hardcoded values → wrong answer.
- Code generation — the 1.5B model generates valid Python 38% of the time. Code-first strategy forces the model into its worst mode.
- Recall rate — 5/50 recalls on problems the brain had already seen (10%). The other 45 were classified Novel again. The brain doesn't recognize problems it already solved.
The Fundamental Misdiagnosis
Both the fractal-brain.md and digital-brain-vision.md share an assumption:
"The brain's goal is to eliminate LLM calls by recalling stored answers."
This frames the brain as a cache. Cache hit → replay answer (0 LLM calls). Cache miss → full LLM call. The metric is hit rate.
This is wrong for three reasons:
1. Brains are not caches
A human who has solved 50 GCD problems doesn't recall "GCD(48,18) = 6." They recognize the category ("this is a GCD problem"), activate the relevant method ("use the Euclidean algorithm"), and apply it. The memory doesn't replace computation — it guides computation.
The benchmark data proves this: prompt augmentation (guiding the LLM) works at 59% accuracy. Direct recall works at 16.7%. The brain is already better at guiding than replaying.
2. Perfect recall of wrong answers is worse than no recall
5 out of 6 recalls were wrong. The brain confidently replayed incorrect code. This is worse than asking the LLM fresh — at least the LLM has a 24% chance of getting it right.
A cache that replays stale data is a liability. A brain that guides fresh computation is an asset.
3. The substitution problem is architectural, not a bug
Code substitution requires knowing what's a parameter and what's a method. gcd(48, 18) — the gcd is the method, 48 and 18 are parameters. But the trajectory segmenter doesn't know this. It segments by embedding direction changes, not semantic roles.
You can't fix this by lowering thresholds or improving fingerprints. The information isn't in the embeddings. It requires either:
- The LLM to label roles at storage time ("this is a GCD problem with arguments 48 and 18")
- Storing parameterized code (def solve(a, b): return gcd(a, b)) instead of scripts (print(gcd(48, 18)))
- Both
What Nature Actually Does
The hippocampus stores POINTERS, not memories
Neuroscience shows the hippocampus stores sparse activation patterns — indices into distributed cortical representations. Not full memories. When replayed during sleep, these indices re-activate the relevant neocortical areas.
Current implementation: NeuronTrees store full solutions. Recall tries to replay.
Biological reality: Store a location in the brain's spatial map (substrat centroid proximity, method used, confidence). When recalled, this location activates relevant context that guides new computation.
The cerebellum stores PROCEDURES, not scripts
The cerebellum stores how to throw a ball — a parameterized motor program that adapts to different distances and weights. Not a recording of one specific throw.
Current implementation: Stored code has hardcoded values.
# This is a SCRIPT (one throw at one distance)
from math import gcd
print(gcd(48, 18))
Biological reality: Store a parameterized procedure.
# This is a PROCEDURE (throwing at any distance)
def solve(a: int, b: int) -> int:
from math import gcd
return gcd(a, b)
The prefrontal cortex is an ATTENTION CONTROLLER
The PFC doesn't execute solutions. It holds a few relevant items in working memory (Miller's 7±2) and orchestrates retrieval. It's a curator.
When facing a novel problem, the PFC:
- Recognizes the broad category ("number theory")
- Retrieves relevant methods ("Euclidean algorithm, prime factoring")
- Selects the most promising one
- Monitors execution and switches if it fails
This is routing with learned preferences, not pattern matching.
The amygdala tags WHAT MATTERS
The amygdala doesn't just say "good/bad." It creates emotional tags: urgency, novelty, social relevance, frustration. These tags determine what gets consolidated during sleep (important → keep, irrelevant → prune).
Current implementation: Binary reward (correct/incorrect).
Biological reality: Rich tagging — was this surprising? Was the user satisfied? Was this a new substrat? Did the brain predict correctly?
The thalamus FILTERS what reaches consciousness
90% of sensory input is filtered out before it reaches cortex. The thalamus gates what's relevant based on current context and goals.
Current implementation: Every query searches all stored trees equally.
Biological reality: Context-aware filtering. In a math conversation, only math neurons are active. In a cooking conversation, math neurons are dormant.
The Revised Architecture
Core shift: Brain as Attention System, not Cache
OLD: Query → match tree → replay stored answer (0 LLM calls)
or fallback to LLM (1+ LLM calls)
NEW: Query → recognize substrat → curate context → execute
(perception) (hippocampus) (thalamus) (cerebellum OR LLM with context)
The brain's primary job is making every LLM call maximally effective by providing perfect context. Its secondary job is eliminating LLM calls for truly procedural knowledge. The old architecture had these backwards.
Layer 1: Perception (keep as-is)
Query text
→ embed_tokens() → per-token hidden states
→ TrajectorySegmenter → NeuronTree
→ SparseFingerprint for fast candidate lookup
The structural decomposition is useful. The fingerprint index provides O(1) candidate retrieval. Keep it.
Layer 2: Recognition (REDESIGN)
Current: Compare query fingerprint against stored trees. Classify as Exact/Variation/Novel.
Proposed: Two-stage recognition.
Stage A — Substrat match (fast, <10ms)
Instead of discrete category labels, the brain organizes knowledge into substrats — continuous regions of embedding space defined by a centroid and a scope. This mirrors how biological place cells work: each fires maximally at one location (centroid) and decreases with distance (Gaussian falloff over scope). Multiple substrats overlap, creating a continuous map of problem-space.
A substrat is NOT a label. It's a spectrum — a center point with fuzzy boundaries. A problem about "geometric sequences" naturally lives partly in the "sequences" substrat and partly in the "geometry" substrat. No forced choice.
Experimental Validation
We validated this with a 100-problem experiment using the same all-MiniLM-L6-v2 model (384-dim, fastembed) that Morphee uses in production. 10 problem types, 10 problems each:
| Metric | Result |
|---|---|
| Nearest-centroid classification | 97/100 = 97% |
| GCD pair distance ("GCD of 48,18" vs "GCD of 360,240") | 0.38 (close — same substrat) |
| Cross-type distance (Geometry vs Quadratic) | 0.66 (far — different substrats) |
| Cross-domain distance (GCD vs Cooking) | 1.00 (maximally distant) |
The document's original concern — that "Find the GCD of 48 and 18" vs "Find the GCD of 360 and 240" wouldn't cluster — was wrong for sentence embeddings. They have distance 0.38, well within a substrat's zone. The trajectory segmenter (per-token) diverges on different numbers, but the sentence-level embedding is dominated by the shared semantic content.
Important nuances from the experiment:
- Separation ratios are WEAK (1.0-1.4 for most math types). Types cluster correctly, but the clusters are large and close together. GCD and LCM centroids are only 0.33 apart while intra-type variance is 0.45-0.52.
- GCD and LCM naturally overlap — they SHOULD be one substrat (both are "number theory: divisibility"). The brain would merge them, which is correct.
- Cooking vs ANY math type = 0.77-0.91 distance. Cross-domain separation is excellent.
- Only 3 confusions out of 100: one combinatorics→geometry, two euler_totient→{lcm, prime}. All were edge cases with unusual phrasing.
Implication for substrat design: Sentence embeddings work for substrat clustering. BUT the membership function must handle the weak inter-type separation within the same domain. A Gaussian falloff (not linear) is essential — see Membership Function section below.
Critical: use sentence-level embeddings, NOT per-token trajectories. The TrajectorySegmenter produces per-token hidden states (384-dim per token), not sentence embeddings. Substrat centroids must use the sentence embedding (mean pool of all token states, or CLS token). The production model is all-MiniLM-L6-v2 via fastembed (ONNX, 384-dim). This is already available in morphee-core/src/providers/embeddings.rs.
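As a concrete illustration of the pooling step, here is a minimal Python sketch using numpy (the `sentence_embedding` function name is illustrative, not the morphee-core API; in production this happens inside the ONNX embedding pipeline):

```python
# Sketch: derive one sentence-level embedding from per-token hidden states by
# mean pooling, then L2-normalize so cosine similarity reduces to a dot product.
import numpy as np

def sentence_embedding(token_states: np.ndarray) -> np.ndarray:
    """Mean-pool a (num_tokens x dim) matrix of hidden states into one vector."""
    pooled = token_states.mean(axis=0)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled

# Toy example: 5 "tokens" of dimension 384 (same dim as all-MiniLM-L6-v2)
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 384)).astype(np.float32)
emb = sentence_embedding(states)
print(emb.shape)  # (384,)
```

The key point: substrat centroids compare against this pooled vector, never against individual token states.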
pub struct Substrat {
id: SubstratId,
/// Center of this knowledge region (384-dim sentence embedding)
centroid: Vec<f32>,
/// Scope — how broad this substrat covers in embedding space (Gaussian sigma)
/// "Mathematics" has large scope, "GCD" has small scope.
/// Starts with a bootstrap estimate, refined by dream cycles.
scope: f32,
/// Confidence — how reliable this substrat's knowledge is (0.0 to 1.0)
/// Separate from scope. A new "Cooking" substrat has large scope + LOW confidence.
/// A mature "Mathematics" substrat has large scope + HIGH confidence.
confidence: f32,
/// Temperature — recency-based activation warmth (0.0 cold to 1.0 hot)
/// Warm substrats match more easily (effective scope expands).
/// Enables conversational context routing.
temperature: f32,
last_activated: std::time::Instant,
/// Neurons that belong to this substrat
neuron_ids: Vec<NeuronId>,
/// How many experiences shaped this substrat
exemplar_count: u32,
/// The generalized method, once born
method: Option<MethodNeuron>,
}
pub struct SubstratIndex {
substrats: Vec<Substrat>,
}
impl SubstratIndex {
/// Find substrats for a query. Returns ALL substrats where the query
/// falls within their confidence zone, with membership strength.
/// A problem can belong to multiple substrats simultaneously.
fn find_substrats(&self, query_embedding: &[f32]) -> Vec<SubstratMatch> {
// For each substrat:
// distance = cosine_distance(query, centroid)
// effective_scope = scope * (1.0 + 0.3 * temperature)
// membership = exp(-distance² / (2 * effective_scope²)) // Gaussian
// Return all with membership > threshold, sorted by strength
}
/// A new substrat is born when a problem is far from ALL existing centroids
fn maybe_birth_substrat(&mut self, embedding: &[f32], neuron_id: NeuronId) -> Option<SubstratId>;
/// Existing substrats update their centroid as new examples arrive
/// (running average, weighted by success)
fn update_centroid(&mut self, id: SubstratId, new_embedding: &[f32], success: bool);
/// Decay temperature for all substrats (called periodically)
fn decay_temperatures(&mut self, elapsed: Duration);
}
On-device cost: one cosine comparison against ~N substrat centroids. With 100 substrats: ~100 dot products on a 384-dim vector — microseconds. No LLM needed for recognition.
Membership Function: Gaussian, Not Linear
The experiment showed separation ratios of 1.0-1.4 for related math types. A linear decay (1 - d/r) gives 50% membership at half the scope, which means edge-of-scope queries get strong signal — pulling in noise from adjacent substrats.
A Gaussian falloff is more selective and matches how biological place cells fire:
Linear: membership = max(0, 1.0 - distance / scope) // too permissive
Gaussian: membership = exp(-distance² / (2 * scope²)) // sharp center, soft edges
At distance = scope, Gaussian gives ~60% membership (one sigma). At distance = 2*scope, it gives ~13%. This means queries near the center get strong signal, edge cases get weak signal, and distant queries get near-zero — exactly what we need for closely-spaced math substrats.
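A minimal sketch of both membership functions, reproducing the numbers above (the 0.5 scope value is an arbitrary example):

```python
# Compare the linear and Gaussian membership functions from the text.
# At d = scope the Gaussian gives exp(-1/2) ~ 0.61 (one sigma);
# at d = 2*scope it gives exp(-2) ~ 0.14.
import math

def linear_membership(distance: float, scope: float) -> float:
    return max(0.0, 1.0 - distance / scope)

def gaussian_membership(distance: float, scope: float) -> float:
    return math.exp(-distance**2 / (2 * scope**2))

scope = 0.5
for d in (0.0, 0.25, 0.5, 1.0):
    print(d, round(linear_membership(d, scope), 2),
             round(gaussian_membership(d, scope), 2))
```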
Temperature: Conversational Context Routing
Substrats have a temperature that decays over time. Recently activated substrats are "warm" and easier to match:
fn effective_scope(&self) -> f32 {
self.scope * (1.0 + 0.3 * self.temperature)
}
If Sophie is doing math homework and asks "what about this one?", the math substrat is warm (recently activated) and captures the ambiguous query. Without temperature, the brain starts from scratch every time.
Temperature decays exponentially: temp *= 0.95 per minute. After an hour of inactivity, a substrat is effectively cold (0.95^60 ≈ 0.05). This is cheap (one float, one timestamp per substrat) and solves conversational context for free.
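A minimal sketch of the decay and warming mechanics, assuming the 0.95-per-minute factor and the 0.3 warming coefficient from the text (the `Substrat` stand-in here is a simplification, not the production struct):

```python
# Temperature: recently activated substrats get an expanded effective scope
# (up to +30%), and temperature decays exponentially with inactivity.
from dataclasses import dataclass

@dataclass
class Substrat:
    scope: float
    temperature: float = 0.0  # 0.0 cold .. 1.0 hot

    def effective_scope(self) -> float:
        # Warm substrats match more easily: scope expands up to 30%.
        return self.scope * (1.0 + 0.3 * self.temperature)

    def decay(self, minutes: float) -> None:
        self.temperature *= 0.95 ** minutes

math_sub = Substrat(scope=0.5, temperature=1.0)  # just activated
print(round(math_sub.effective_scope(), 3))      # 0.65
math_sub.decay(minutes=60)
print(round(math_sub.temperature, 3))            # 0.046 -- effectively cold
```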
Scope vs Confidence: Separate Concepts
A naive design would use a single "radius" for everything. But that conflates two things:
- Scope — how broad the substrat covers in embedding space. "Mathematics" has large scope, "GCD" has small scope.
- Confidence — how reliable the substrat's knowledge is. "Mathematics" can be high-confidence AND large-scope. A brand new "Cooking" substrat is large-scope (uncertain about boundaries) AND low-confidence (few experiences).
These MUST be separate fields. Behavior differs:
- Large scope + high confidence: attracts queries confidently, provides strong exemplars
- Large scope + low confidence: accepts queries tentatively, marks them as exploratory
- Small scope + high confidence: narrow expert, very reliable within its domain
- Small scope + low confidence: shouldn't happen (would have been pruned or merged)
How substrats form and evolve:
- Birth — a query lands far from all centroids → new substrat born with scope=large (uncertain boundaries), confidence=low
- Growth — more queries land nearby → centroid shifts toward the mean, scope adjusts to cover members
- Maturation — after 10+ examples, the substrat has a stable center. Scope reflects actual coverage. Confidence grows with success rate.
- Splitting — if a substrat's members become bicoherent (two clusters), it splits (mitosis)
- Merging — if two substrats overlap >80%, they merge into one
This is neurogenesis — the brain literally grows new regions as it encounters new domains.
Stage B — Structural match within substrat (when needed)
Only compare fingerprints against trees in the matched substrat(s). Problems that live in multiple substrats get neurons from all of them (proportional to membership strength). This dramatically reduces false matches while preserving cross-domain knowledge.
pub struct SubstratMatch {
substrat_id: SubstratId,
membership: f32, // 0.0 to 1.0 — how strongly this query belongs here
}
pub enum RecognitionResult {
/// Known substrat(s), confident method → curate context + execute
Recognized {
primary: SubstratMatch,
secondary: Vec<SubstratMatch>, // other substrats that contribute
method_neuron: Option<NeuronId>,
confidence: f32,
},
/// Partially known — on the edge of a substrat's confidence zone
Familiar {
substrats: Vec<SubstratMatch>,
related_neurons: Vec<NeuronId>,
confidence: f32,
},
/// Completely novel — far from all substrats → new substrat born
Novel,
}
Layer 3: Working Memory / Attention (NEW — the real brain)
Once the substrat(s) are recognized, the thalamus curates what the executor needs:
pub struct WorkingMemory {
/// Primary substrat and membership strength
primary_substrat: SubstratMatch,
confidence: f32, // "I've solved 47/50 of these correctly"
/// Best 3 solved examples from this substrat (few-shot exemplars)
exemplars: Vec<StoredExample>,
/// Known wrong approaches (anti-patterns)
anti_patterns: Vec<AntiPattern>,
/// The generalized method, if one exists
method: Option<MethodNeuron>,
/// Secondary substrats that contribute (cross-domain knowledge)
/// e.g., "geometric sequences" pulls from both sequence + geometry substrats
secondary: Vec<SubstratMatch>,
}
The current prompt augmentation does a crude version of this (inject correct examples + anti-recall hints). This formalizes it as the primary mechanism.
Key design: WorkingMemory is small — 3 exemplars, 2 anti-patterns, 1 method. Like Miller's 7±2, the brain doesn't dump everything it knows into the prompt. It selects what's most relevant. This keeps prompts compact for small models. When multiple substrats match, exemplars are drawn proportionally from each.
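One way the proportional draw could work is largest-remainder allocation over the 3 exemplar slots. This is a sketch under that assumption (`allocate_slots` and its data shapes are hypothetical, not the production API):

```python
# Split a fixed number of exemplar slots across matched substrats,
# proportionally to membership strength, so slots always sum to the budget.
def allocate_slots(matches: list[tuple[str, float]], total_slots: int = 3) -> dict[str, int]:
    """matches: (substrat_id, membership). Returns exemplar slots per substrat."""
    total = sum(m for _, m in matches)
    raw = [(sid, total_slots * m / total) for sid, m in matches]
    slots = {sid: int(r) for sid, r in raw}
    # Largest-remainder: hand leftover slots to the biggest fractional parts.
    leftover = total_slots - sum(slots.values())
    for sid, r in sorted(raw, key=lambda x: x[1] - int(x[1]), reverse=True)[:leftover]:
        slots[sid] += 1
    return slots

# "geometric sequences": 2 exemplars from sequences, 1 from geometry
print(allocate_slots([("sequences", 0.8), ("geometry", 0.4)]))
# {'sequences': 2, 'geometry': 1}
```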
Layer 4: Execution (three paths)
Path A — Cerebellum (compiled WASM, 0 LLM calls, <1ms)
Condition: method_neuron.stage == Cerebellum
When: procedure verified 100+ times, compiled to WASM
Cost: near-zero
Path B — Neocortex (parameterized code, 0 LLM calls, ~200ms)
Condition: method_neuron has parameterized code + parameters extracted
When: method verified 10+ times, code is deterministic
Cost: one Python subprocess
Path C — Guided LLM (1 LLM call with curated context)
Condition: everything else
When: novel problems, uncertain methods, no code
Cost: 1 local LLM call, but with perfect context
This IS the default path. No shame in it.
Path C is where most queries go early in the brain's life. Over time, paths B and A take over as methods mature. This is biological: a student starts by solving problems consciously (PFC + LLM), then builds intuition (neocortex), then automates (cerebellum).
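The three-path dispatch can be sketched as a simple maturity check (the 100+ and 10+ verification thresholds come from the path descriptions above; the `Method` stand-in and return strings are illustrative):

```python
# Route execution to cerebellum (WASM), neocortex (parameterized code),
# or guided LLM, based on method maturity. Path C is the default.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Method:
    verified_count: int
    has_wasm: bool
    has_parameterized_code: bool

def choose_path(method: Optional[Method]) -> str:
    if method and method.has_wasm and method.verified_count >= 100:
        return "A: cerebellum (WASM, 0 LLM calls, <1ms)"
    if method and method.has_parameterized_code and method.verified_count >= 10:
        return "B: neocortex (parameterized code, 0 LLM calls)"
    return "C: guided LLM (1 call, curated context)"

print(choose_path(None))                    # path C -- the default
print(choose_path(Method(12, False, True))) # path B
print(choose_path(Method(150, True, True))) # path A
```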
Layer 5: Learning (richer signals)
After execution, three learning signals:
pub struct LearningSignal {
/// Did it work? (amygdala — reward)
reward: f32, // -1.0 to 1.0
/// Was I surprised? (prediction error)
surprise: f32, // |predicted_confidence - actual_outcome|
/// Which substrat(s) were activated? (hippocampal tag)
substrat_matches: Vec<SubstratMatch>,
/// What method was used? (procedural tag)
method_used: String,
/// Was code generated? Was it verified?
code_quality: CodeQuality,
}
Reward (already implemented) — correct/incorrect, updates confidence.
Surprise (new) — "I predicted 80% confidence but failed." This tells the brain its model of this substrat is wrong. High surprise → strong learning signal. Low surprise → small confirmation.
Tagging (new) — every experience gets labeled. This is the raw material for dream compression.
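A sketch of the surprise computation and how it might scale a confidence update (only the surprise formula comes from the text; the `confidence_update` helper and its `base_rate` constant are illustrative assumptions):

```python
# Surprise = |predicted confidence - actual outcome|. High surprise means the
# brain's model of this substrat is wrong and should update strongly.
def surprise(predicted_confidence: float, correct: bool) -> float:
    outcome = 1.0 if correct else 0.0
    return abs(predicted_confidence - outcome)

def confidence_update(predicted: float, correct: bool, base_rate: float = 0.1) -> float:
    """Move confidence toward the outcome, scaled by how surprising it was."""
    outcome = 1.0 if correct else 0.0
    return predicted + base_rate * surprise(predicted, correct) * (outcome - predicted)

print(surprise(0.8, False))                  # 0.8 -- confident and wrong: big signal
print(round(surprise(0.8, True), 2))         # 0.2 -- expected success: small confirmation
print(round(confidence_update(0.8, False), 3))  # 0.736 -- confidence drops
```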
Space = Substrat
The Unification
The user-facing concept of Space and the brain's internal concept of Substrat are the same thing viewed from different angles:
| User sees | Brain sees |
|---|---|
| "Sophie's Math Homework" (a Space) | A substrat centered on math-homework embeddings |
| "Cooking" Space | A substrat centered on cooking-related embeddings |
| Spaces can overlap ("Chemistry" relates to both "Cooking" and "Health") | Substrats overlap when their confidence zones intersect |
| A Space grows as you use it | A substrat's centroid drifts, scope refines, confidence grows with experience |
The user says "create a Space for science fair." The brain creates a substrat. As Sophie works on the science fair, the substrat's centroid drifts toward whatever she's actually doing (maybe it's 60% biology, 30% presentation skills, 10% scheduling). The brain knows this from the embedding topology — no labels needed.
Why the Current Tree Model Is Wrong
The current Space model uses parent_id — a tree:
School
├── Math Homework
├── Science Fair
└── French Class
This forces artificial choices:
- Where does "Science Fair Presentation" go? Under "Science Fair" or under a hypothetical "Presentations" space?
- A child's "Reading" relates to both "School" and "Bedtime Routine" — but it can only have one parent.
- Reparenting a Space is a manual operation that breaks the hierarchy.
Brains don't organize knowledge in trees. They organize in overlapping, nested, interconnected regions.
The Substrat Graph Model
Replace the tree with an oriented graph where relationships emerge from geometry:
pub struct Substrat {
id: SubstratId,
/// How this substrat was created
origin: SubstratOrigin,
/// Center of this knowledge region (384-dim sentence embedding)
centroid: Vec<f32>,
/// Scope — how broad this substrat covers in embedding space
scope: f32,
/// Confidence — how reliable this substrat's knowledge is (0.0 to 1.0)
confidence: f32,
/// Temperature — recency-based activation warmth (0.0 to 1.0)
temperature: f32,
last_activated: Instant,
/// Neurons that belong here
neuron_ids: Vec<NeuronId>,
/// Explicit edges to other substrats (for non-geometric relationships)
edges: Vec<SubstratEdge>,
}
pub struct SubstratEdge {
target: SubstratId,
edge_type: EdgeType,
weight: f32, // strength, evolves with usage
}
pub enum EdgeType {
/// This substrat is geometrically inside another (auto-detected)
ContainedBy,
/// Two substrats' zones overlap (auto-detected from embedding proximity)
Overlaps,
/// User or brain explicitly linked them ("Cooking feeds into Nutrition")
Feeds,
/// Inhibitory: activating one suppresses the other
Inhibits,
}
How Relationships Emerge
Most edges are NOT declared — they emerge from geometry:
Containment: substrat A's centroid is within B's zone
AND A's scope < B's scope
→ A is inside B (like "GCD" inside "Number Theory")
Overlap: distance(A.centroid, B.centroid) < A.scope + B.scope
→ A and B share territory (like "Biochemistry" between "Bio" and "Chem")
Adjacency: distance ≈ A.scope + B.scope (barely touching)
→ related but distinct (like "Cooking" and "Nutrition")
Independence: distance >> A.scope + B.scope
→ no relationship (like "Cooking" and "Car Maintenance")
The brain computes these relationships during dream cycles — no user intervention needed. But the user CAN create explicit edges ("link Science Fair to Presentations") which add Feeds edges.
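The geometric rules above can be sketched as a small classifier (the 2x cutoff separating "adjacent" from "independent" is an assumed threshold; the text only says distance >> combined scopes):

```python
# Classify the relationship between substrats A and B from centroid distance
# and scopes alone, following the containment/overlap/adjacency rules.
def relationship(dist: float, scope_a: float, scope_b: float) -> str:
    """dist: cosine distance between centroids; scopes as Gaussian sigmas."""
    touch = scope_a + scope_b
    # A inside B: A's centroid within B's zone AND A is the narrower region.
    if dist < scope_b and scope_a < scope_b:
        return "A contained by B"
    if dist < touch:
        return "overlaps"
    if dist < 2.0 * touch:  # assumed cutoff for "barely related"
        return "adjacent"
    return "independent"

print(relationship(0.10, 0.15, 0.45))  # "GCD" inside "Number Theory"
print(relationship(0.33, 0.25, 0.25))  # GCD vs LCM: shared territory
print(relationship(2.00, 0.30, 0.30))  # Cooking vs Car Maintenance
```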
Organisms and Sub-Organisms
This maps exactly to the biological organism model:
Organism (Group-level brain)
├── Substrat "Mathematics" (scope: large, conf: 0.85, many neurons)
│ ├── Sub-substrat "GCD" (scope: tight, conf: 0.94, method neuron, mature)
│ ├── Sub-substrat "Quadratic" (scope: medium, conf: 0.63, growing)
│ └── Sub-substrat "Geometry" (scope: large, conf: 0.20, few neurons, young)
│ ↕ overlap with "Art" substrat
├── Substrat "Cooking" (scope: medium, conf: 0.40)
│ ↕ overlap with "Chemistry" and "Nutrition"
└── Substrat "Daily Routine" (scope: large, conf: 0.70, loose)
├── Sub-substrat "Morning" (tight)
└── Sub-substrat "Bedtime" (tight)
↕ overlap with "Reading"
Each substrat IS a sub-organism:
- It has its own neurons (memory)
- Its own method neurons (compiled knowledge)
- Its own anti-patterns (what doesn't work)
- Its own maturity level (scope = breadth, confidence = reliability)
- Its own dream cycle contribution (consolidation within the substrat)
The Group-level organism IS the collection of all substrats + their edges. Intelligence isn't in any single substrat — it's in the topology of the graph.
What Changes for the User
Nothing visible changes. Users still see "Spaces." They create, name, and navigate them. The difference:
| Before (tree) | After (substrat graph) |
|---|---|
| "Create Space" → choose a parent | "Create Space" → brain finds where it fits in the topology |
| Moving a Space = reparenting | Spaces drift naturally as usage evolves |
| One parent only | Multiple relationships (containment, overlap, feeds) |
| User organizes | Brain organizes, user overrides when needed |
| "Where does this go?" | "It goes where it naturally belongs" |
| Every query starts from scratch | Recently used Spaces are "warm" — better at catching ambiguous queries |
| Empty Spaces are dead weight | Empty Spaces have a name-bootstrapped centroid, ready to learn |
Example flow:
- User creates "Science Fair" Space
- Brain creates substrat with centroid from "science fair" embedding
- Sophie asks about "photosynthesis for the poster" → embedding lands in Science Fair zone
- She also asks about "how to make the poster look good" → embedding is between Science Fair and a hypothetical "Art" substrat
- Dream cycle detects overlap between Science Fair and Art → auto-creates edge
- Next time Sophie's in Art space and asks about visual layout, the brain pulls knowledge from both Art AND Science Fair substrats
The brain becomes a navigator of the user's knowledge topology, not a filing cabinet.
Lifecycle Duality: Explicit vs Emergent Substrats
User-created Spaces and brain-discovered substrats have different lifecycles. This must be handled explicitly:
Explicit substrats — user creates a Space ("Photography"). The brain creates a substrat with:
- Name from the user
- Centroid bootstrapped from the name's embedding ("photography" → 384-dim)
- Large scope (uncertain, user hasn't defined boundaries yet)
- Low confidence (no experiences yet)
- Persistent even if empty — the user wants this Space to exist
Emergent substrats — the brain discovers a cluster from usage. No name. No user intent. Examples:
- Sophie keeps asking about meal prep → a "meal prep" substrat forms naturally
- The brain can suggest: "You've been asking a lot about meal prep — should I create a Space for this?"
- If promoted, it becomes explicit (gets a name, becomes navigable in the UI)
- If not promoted, it stays as brain-internal knowledge organization
What happens on deletion:
- User deletes a Space → the substrat is archived, not destroyed
- Neurons, method neurons, and anti-patterns are preserved (the brain doesn't forget learned knowledge)
- The substrat becomes "dormant" — no longer matches queries, no longer navigable in UI
- Can be reactivated if the user creates a similar Space later
What happens with empty Spaces:
- A Space with no interactions has only a name-bootstrapped centroid — it's a placeholder
- As the user interacts, the centroid drifts toward actual usage patterns
- The name "Photography" might drift toward "Photo editing on iPhone" as that's what the user actually does
- The name doesn't change (it's the user's label), but the substrat's scope tightens around real usage
pub enum SubstratOrigin {
/// User-created Space — persists even if empty, name is fixed
Explicit { name: String, created_by: UserId },
/// Brain-discovered cluster — can be promoted or pruned
Emergent { suggested_name: Option<String>, discovered_at: DateTime },
/// Was explicit, user deleted — archived, knowledge preserved
Archived { original_name: String, archived_at: DateTime },
}
Migration Path
The parent_id field doesn't need to disappear immediately. It becomes one source of explicit edges:
// During migration: parent_id → ContainedBy edge
if let Some(parent) = space.parent_id {
substrat.edges.push(SubstratEdge {
target: parent.into(),
edge_type: EdgeType::ContainedBy,
weight: 1.0, // explicit user choice = strong
});
}
New spaces created after the migration use the substrat model natively. Old spaces with parent_id keep working through the edge translation.
The Dream Cycle
Compression IS intelligence
Storing 134 individual trees is memory. Compressing them into 10 methods is intelligence. The dream cycle's primary job is compression, not cleanup.
Phase 0: Substrat Formation (no LLM needed)
For every new neuron, compute its distance to existing substrat centroids. If it falls within a substrat's confidence zone → assign it there. If it's far from all substrats → birth a new substrat centered on this embedding.
```
New neuron with embedding E:
  For each substrat S:
    distance   = cosine_distance(E, S.centroid)
    membership = exp(-distance² / (2 * S.scope²))
    if membership > threshold → assign to S, update S.centroid (running mean)
  If no substrat matched → birth new substrat(centroid=E, scope=0.50, confidence=0.0)
```
Cost: zero LLM calls. Pure vector math. The brain discovers its own topology from the embedding space — no labels, no classification, no LLM dependency.
Optionally, the local LLM can name substrats for human readability ("this cluster seems to be about GCD problems"), but the brain doesn't need names to function. Names are UI, not intelligence.
This is the hippocampus placing new memories into its spatial map — the memory's location IS its meaning.
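The Phase 0 loop is pure vector math, so it can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions: the `Substrat` struct and `assign` function names here are hypothetical, not the real codebase API.

```rust
/// Cosine distance between two embeddings (1 - cosine similarity).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

struct Substrat {
    centroid: Vec<f32>,
    scope: f32, // Gaussian sigma
    members: usize,
}

/// Soft membership: exp(-d² / (2 · σ²)).
fn membership(distance: f32, scope: f32) -> f32 {
    (-distance * distance / (2.0 * scope * scope)).exp()
}

/// Assign the embedding to the best-matching substrat, or birth a new one.
fn assign(substrats: &mut Vec<Substrat>, embedding: &[f32], threshold: f32) -> usize {
    let best = substrats
        .iter()
        .enumerate()
        .map(|(i, s)| (i, membership(cosine_distance(embedding, &s.centroid), s.scope)))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    match best {
        Some((i, m)) if m > threshold => {
            // Running-mean centroid update.
            let n = substrats[i].members as f32;
            for (c, e) in substrats[i].centroid.iter_mut().zip(embedding) {
                *c = (*c * n + e) / (n + 1.0);
            }
            substrats[i].members += 1;
            i
        }
        _ => {
            // Far from everything → birth a new substrat centered here.
            substrats.push(Substrat { centroid: embedding.to_vec(), scope: 0.50, members: 1 });
            substrats.len() - 1
        }
    }
}

fn main() {
    let mut index: Vec<Substrat> = Vec::new();
    assign(&mut index, &[1.0, 0.0], 0.6); // first neuron → new substrat
    assign(&mut index, &[0.0, 1.0], 0.6); // orthogonal → far → second substrat
    println!("substrats: {}", index.len()); // prints "substrats: 2"
}
```

Note that the threshold and the initial scope of 0.50 are the tunables discussed in the Open Questions section.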
Phase 1: Cluster Refinement
Substrats naturally cluster neurons by embedding proximity. During dream, refine:
- Coherence check — compute internal agreement within each substrat. If most members got the same answer for similar inputs, the substrat is coherent and ready for method birth.
- Bicoherence detection — if a substrat has two internal clusters (bimodal), split it (mitosis). This is how "algebra" naturally splits into "linear algebra" and "quadratic" as the brain sees more examples.
- Overlap detection — if two substrats' confidence zones overlap >80%, merge them.
- Centroid recalculation — weighted by success (correct solutions pull the centroid more).
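As a rough illustration of the coherence check, internal agreement can be computed as the fraction of members sharing the modal answer. Representing each member by a single answer string is a simplification for this sketch, not the real data model.

```rust
use std::collections::HashMap;

/// Fraction of members that share the most common answer.
/// 1.0 = perfectly coherent; 1/n = every member disagrees.
fn coherence(answers: &[&str]) -> f32 {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &a in answers {
        *counts.entry(a).or_insert(0) += 1;
    }
    let modal = counts.values().copied().max().unwrap_or(0);
    modal as f32 / answers.len().max(1) as f32
}

fn main() {
    // 4 of 5 members agree → 0.8, above the 60% birth threshold in Phase 2.
    println!("{}", coherence(&["6", "6", "6", "6", "4"]));
}
```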
Phase 2: Birth (method neurons)
For each substrat with 5+ members and >60% internal agreement:
```rust
struct MethodNeuron {
    substrat_id: SubstratId,
    /// The generalized procedure
    /// Option A: parameterized code (if 3+ members had working code)
    /// Option B: solution template (natural language method description)
    /// Option C: best exemplar's code (fallback)
    procedure: Procedure,
    /// 3 best solved examples (for few-shot when LLM is needed)
    exemplars: Vec<StoredExample>,
    /// Known failure modes (from incorrect members)
    anti_patterns: Vec<String>,
    /// Confidence = weighted accuracy across cluster members
    confidence: f32,
    /// Children: the experience neurons that birthed this method
    children: Vec<NeuronId>,
}
```
The method neuron replaces the experience neurons for recall purposes. It matches any query that falls within the substrat's confidence zone (broad), not just queries similar to one specific past query (narrow).
For code generalization, try to extract parameterized code:
- Find the experience with the best code
- Ask the local LLM: "Rewrite this code as a function with parameters: [code]"
- Verify the parameterized version against 3 stored examples
- If it passes, store as the method's procedure
Cost: 1 LLM call per method birth + 3 Python verifications. Amortized across all future recalls within this substrat.
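The verification step can be sketched as replaying stored examples against the candidate. Here `gcd` stands in for an extracted parameterized procedure, and the example triples are hypothetical; the real pipeline would execute stored Python, not a Rust closure.

```rust
/// Accept the candidate only if it reproduces every stored (input, expected) pair.
fn verify<F: Fn(u64, u64) -> u64>(candidate: F, examples: &[((u64, u64), u64)]) -> bool {
    examples.iter().all(|((a, b), want)| candidate(*a, *b) == *want)
}

/// Stand-in for a parameterized procedure extracted by the LLM.
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

fn main() {
    // Three stored examples with different parameters, as the text suggests.
    let examples = [((48, 18), 6), ((12, 8), 4), ((7, 5), 1)];
    println!("accepted: {}", verify(gcd, &examples)); // prints "accepted: true"
}
```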
Phase 3: Prune
- Experience neurons covered by a method neuron → archive (keep 3 best as exemplars, delete rest)
- Method neurons with <30% accuracy after 10+ uses → demote, re-collect experiences
- Neurons not recalled in 30+ days with low confidence → delete
The brain gets leaner and faster over time, not just bigger.
After the 150-problem benchmark: maybe 134 trees compress to 15 method neurons + 45 exemplars. That's 60 trees instead of 134. Each method matches everything within its substrat's confidence zone.
Phase 4: Code Robustness (existing)
Re-execute all stored code. Boost working code, remove broken code.
Phase 5: Compile (myelination)
Method neurons with confidence > 0.95, verified 100+ times → compile to WASM.
This connects to the existing WASM extension SDK. The brain literally grows new extensions as it learns. The cerebellum IS the extension ecosystem.
Phase 6: Creative Recombination (REM sleep)
The most speculative but potentially most powerful phase.
Biological REM sleep creates novel combinations — "what if I combined the recipe method with the scheduling method?" Applied:
- Take method A from substrat X and method B from substrat Y
- Generate a hypothesis: "could method A's approach apply to substrat Y?"
- Test against stored examples from substrat Y
- If it works better than method B → new method born
This is how biological brains discover that the same algorithm applies to seemingly different domains (e.g., "shortest path" applies to both map navigation and network routing).
Cost: a few LLM calls per dream cycle. Only attempt when the brain is mature (50+ method neurons). This is optional and future-looking.
Wild Ideas From Nature
1. Immune System — Negative Selection
The immune system knows what NOT to react to (self-tolerance) as much as what to react to. Central tolerance deletes T-cells that attack self; peripheral tolerance suppresses remaining self-reactive cells.
Applied: Per-substrat anti-pattern libraries.
```rust
struct SubstratKnowledge {
    /// What works
    method: Procedure,
    exemplars: Vec<Example>,
    /// What DOESN'T work (equally valuable)
    anti_patterns: Vec<AntiPattern>,
    // "Don't try to expand the modulus first"
    // "The naive area formula fails for inscribed polygons"
    // "Brute force times out for n > 10000"
}
```
Anti-patterns are currently per-tree. Make them per-substrat. A substrat's failures are as valuable as its successes — they prevent the brain from repeating mistakes.
When providing context to the LLM (Path C), include 1-2 anti-patterns alongside exemplars. "Here's how to solve this, and here's what NOT to do."
2. Sleep Stages — Different Dream Phases
Real sleep has distinct stages with different cognitive functions:
| Sleep Stage | Biological Function | Brain Equivalent |
|---|---|---|
| N1/N2 (light) | Sort and tag recent memories | Phase 0: Label new neurons |
| N3 (deep) | Consolidate to long-term, prune weak | Phases 1-3: Cluster, birth, prune |
| REM | Creative recombination, test hypotheses | Phase 6: Cross-substrat method transfer |
Run light sleep frequently (every 5 minutes — tag new experiences). Run deep sleep less often (every hour — compress and prune). Run REM rarely (daily — creative exploration).
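A minimal scheduler sketch for these cadences, with hypothetical names; only the intervals (5 minutes, 1 hour, daily) come from the text.

```rust
#[derive(Debug, PartialEq)]
enum DreamStage {
    Light, // Phase 0: tag new neurons
    Deep,  // Phases 1-3: cluster, birth, prune
    Rem,   // Phase 6: creative recombination
}

/// Return which stages are due, given seconds since each stage last ran.
fn due_stages(since_light: u64, since_deep: u64, since_rem: u64) -> Vec<DreamStage> {
    let mut due = Vec::new();
    if since_light >= 5 * 60 { due.push(DreamStage::Light); }
    if since_deep >= 60 * 60 { due.push(DreamStage::Deep); }
    if since_rem >= 24 * 60 * 60 { due.push(DreamStage::Rem); }
    due
}

fn main() {
    // After 10 idle minutes, only light sleep is due.
    println!("{:?}", due_stages(600, 600, 600));
}
```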
3. Critical Periods — Neuroplasticity Windows
Young brains have high plasticity (learn fast, unstable). Mature brains have low plasticity (stable, learn slowly). This is controlled by inhibitory neurons that gate plasticity.
Applied: Per-substrat learning rate.
```
First 10 problems in a substrat:
  → HIGH plasticity: store everything, accept contradictions, don't prune
  → Large scope (exploring boundaries), low confidence (few data points)
  → Centroid shifts easily with each new example

10-50 problems:
  → MEDIUM plasticity: start consolidating, birth methods, moderate pruning
  → Scope stabilizing, confidence growing with success rate
  → Centroid shift weighted by success (good results pull harder)

50+ problems:
  → LOW plasticity: only update on surprises, aggressive pruning, compile to WASM
  → Tight scope, high confidence, stable centroid, resists perturbation
  → Only high-surprise events (prediction errors) cause significant updates
```
This prevents a mature substrat from being destabilized by one outlier, while keeping new substrats open to rapid learning. Scope and confidence together encode maturity — new substrats have large scope + low confidence (uncertain), mature ones have refined scope + high confidence (expert).
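One way to encode the tiers is a maturity-to-learning-rate mapping. The rate values below are illustrative assumptions; only the tier boundaries (10, 50) come from the text.

```rust
/// Centroid learning rate as a function of substrat maturity (member count).
fn centroid_learning_rate(members: usize) -> f32 {
    match members {
        0..=9 => 0.5,   // HIGH plasticity: centroid shifts easily
        10..=49 => 0.1, // MEDIUM: consolidating, success-weighted updates
        _ => 0.02,      // LOW: only high-surprise events move the centroid
    }
}

fn main() {
    for n in [3, 25, 120] {
        println!("{n} members → rate {}", centroid_learning_rate(n));
    }
}
```

A surprise-weighted variant could multiply this base rate by the prediction error, which would implement the "only update on surprises" behavior for mature substrats.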
4. Mirror Neurons — Learn by Watching the LLM
Mirror neurons fire both when performing an action AND when observing someone else perform it. They're the basis of imitation learning.
Applied: When the LLM solves a problem (Path C), the brain doesn't just store the answer. It observes the LLM's reasoning and extracts structure:
```
LLM output: "To find GCD(48,18), I'll use the Euclidean algorithm:
             48 = 2×18 + 12, 18 = 1×12 + 6, 12 = 2×6 + 0. GCD = 6."

Brain observes:
  substrat:   closest to "gcd" centroid (distance: 0.08)
  method:     "euclidean_algorithm"
  pattern:    "repeated division with remainder"
  parameters: {a: 48, b: 18}
  answer:     6
```
The LLM already explains its reasoning (chain-of-thought). The brain should parse this to extract method labels and patterns. One extra LLM call: "What method did you just use? What are the parameters?"
Over time, the brain learns the LLM's vocabulary for methods. It can then provide context in the LLM's own language.
5. Embodied Cognition — The Knowledge Graph IS the Brain
Biological organisms use the environment as external memory. Ants leave pheromone trails. Humans write notes. The environment structures cognition.
Applied: The SubstratIndex should be navigable, not flat.
```
Substrat space (384-dim, visualized):

  ● "GCD cluster"       (scope: 0.12, conf: 0.94, 47 neurons, method: euclidean)
  ● "LCM cluster"       (scope: 0.15, conf: 0.80, 12 neurons, method: via_gcd)
    ↕ overlap zone — LCM/GCD share neurons (experiment: centroid distance 0.33)
  ● "Primality cluster" (scope: 0.18, conf: 0.80,  8 neurons, method: trial_division)
  ● "Quadratic cluster" (scope: 0.25, conf: 0.63,  5 neurons, method: quadratic_formula)
  ○ "Geometry cluster"  (scope: 0.40, conf: 0.20,  3 neurons, no method yet — too young)
```
Navigation through this embedding space IS cognition. A query's embedding lands near a substrat centroid → the brain activates that region (and warms its temperature). Scope encodes breadth, confidence encodes expertise. Overlap zones naturally capture cross-domain problems.
Note from experiment: GCD and LCM centroids are only 0.33 apart, while their intra-type means are 0.45 and 0.52. In practice, they'd naturally merge into a single "divisibility" substrat — which is correct. The brain discovers that GCD and LCM are the same domain, even though humans label them separately.
This graph structure emerges naturally from the dream cycle's substrat formation + method births. No manual design needed.
6. Neurogenesis — The Brain Grows New Capabilities
Adult neurogenesis (new neuron birth) happens in the hippocampus throughout life. New neurons help distinguish between similar memories (pattern separation).
Applied: When the brain encounters a genuinely new domain (far from all substrat centroids), it should:
- Birth a new substrat (new region of the brain) centered on this query's embedding
- Set plasticity to HIGH (large scope, low confidence, critical period)
- Allocate attention to this domain (salience boost)
- After 10+ experiences, attempt first method birth
The brain literally grows new regions as it encounters new domains. A family AI that starts with math knowledge and encounters cooking for the first time grows a "cooking" substrat from scratch. No labels needed — the brain discovers the new domain purely from the embedding topology.
Critical Assessment
Can we actually build this?
Honest answer: yes, with constraints.
What's achievable (high confidence)
Substrat-based recognition — Pure embedding math. No LLM calls needed for recognition at all. A problem's embedding is compared to substrat centroids — if it falls within a confidence zone, the brain knows what kind of problem this is. Expected accuracy: 80%+ (embeddings from models like E5 or fastembed already cluster semantically similar problems naturally).
Prompt augmentation as primary mode — Already working at 59% accuracy. Formalizing it as WorkingMemory with structured exemplar selection from the best-matching substrat(s) will improve it further.
Dream compression — Substrats form automatically from embedding proximity. Method neuron births are algorithmic. Zero LLM calls needed for the entire recognition + compression pipeline. LLM is only used optionally for: naming substrats (cosmetic), parameterizing code (one-time), and guiding execution (the whispered advisor role).
Parameterized code extraction — Asking "rewrite this code as a function with parameters" is a simple LLM task. Even a 1.5B model can do this for math code. Verification is pure Python execution.
Anti-pattern tracking — Engineering work, no AI needed. Track what fails per substrat.
What's hard (medium confidence)
Code generation quality — 1.5B models generate valid Python 38% of the time. This limits how many method neurons can have executable code. Mitigation: store solution templates (natural language method descriptions) as the primary representation, code as an optimization.
Parameterized code reliability — Even if we extract parameterized code, verifying it across different parameter ranges is non-trivial. A function that works for gcd(48, 18) might fail for edge cases. Mitigation: verify against 5+ stored examples with different parameters.
Cross-substrat transfer (REM sleep) — Discovering that the same method applies to different domains requires genuine reasoning. Small models may not produce useful hypotheses. Mitigation: defer this to phase 4, only attempt when brain is mature.
What's fundamentally limited
Small model ceiling — A 1.5B model has a quality ceiling. No amount of context curation will make it solve problems that require 70B-level reasoning. The brain can't create intelligence that doesn't exist in the model.
But — the brain's goal isn't to exceed the model. It's to:
- Consistently reach the model's ceiling (via perfect context)
- Handle known problems without the model (via compiled procedures)
- Accumulate knowledge that transfers to better models later
When the user upgrades from 1.5B to 7B, all the brain's compressed knowledge (substrats, methods, anti-patterns, WASM modules) immediately makes the 7B model perform like a domain-expert 70B.
This is the product insight: the brain is an amplifier. A small model + rich brain outperforms a big model + no brain. And the brain is portable, private, and grows with the user.
The honest risk
The risk is: we build infrastructure that doesn't produce measurably better results. The current brain has 21 files, 7,500 lines, 167 tests — and saves 7% of LLM calls with 16.7% recall accuracy.
The mitigation is: measure relentlessly, ship incrementally, cut what doesn't work.
- Substrat formation should show improvement in 1 week (measurable: substrat recognition rate goes from ~10% to 60%+)
- Method neurons should show improvement in 2 weeks (measurable: LLM savings go from 7% to 30%+)
- If either doesn't move the needle, stop and reconsider
The vision document (digital-brain-vision.md) is architecturally sound. But the implementation must be metrics-driven. Every feature earns its place by moving the numbers. If NeuronStages don't improve accuracy, cut them. If prediction tracking doesn't improve learning speed, cut it.
What I think about the vision
The vision is right about the destination: a brain that learns, compresses, and eventually runs autonomously. The architecture — neuron maturation, dream-driven compression, recursive decomposition, myelination — maps biological mechanisms to concrete implementations.
But it underestimates the foundation problem. The current structural matching doesn't reliably recognize the same problem type. Code substitution doesn't work. These aren't features to add later — they're prerequisites for everything else. Method neurons can't be born from clusters that don't form because matching doesn't work.
The revised architecture (brain as attention system, substrat-based recognition) fixes the foundation. The vision doc's higher-level features (stages, decomposition, prediction) build on top once the foundation works.
The sequence matters: foundation first, then intelligence, then optimization.
What Works, What Doesn't, What's Missing
Keep (working and valuable)
| Component | Why it works |
|---|---|
| TrajectorySegmenter | Structural decomposition is genuinely better than flat cosine |
| SparseFingerprint + FingerprintIndex | O(1) candidate retrieval, fast |
| NeuronStore (3 implementations) | Solid persistence layer |
| Prompt augmentation | 59% accuracy vs 24% baseline — the brain's best feature |
| Self-verification | 100% of stored code verified. No broken code persists |
| Anti-recall / quarantine | Prevents replaying wrong answers |
| Confidence tracking | Per-neuron quality signals with temporal decay |
| Dream consolidation framework | The 5-phase cycle structure is right |
| Feature gating | Clean separation, no regressions |
Fix (broken but fixable)
| Component | Problem | Fix |
|---|---|---|
| Code substitution | Empty {} — can't identify parameters | Parameterized code + LLM-extracted parameter names |
| Structural matching thresholds | Same problem type doesn't match (10% recall) | Substrat-based matching as primary, structural as secondary |
| Code-first default | 1.5B model can't code 62% of the time | Make strategy model-aware. Code for 7B+, templates for 1.5B |
| Dual-path verification | Costs more LLM calls than it saves | Only verify when surprise is high (prediction error) |
| Dream merging | Merges by fingerprint, not semantics | Cluster by substrat proximity instead |
Reuse (existing infrastructure that maps to substrats)
| Component | Action | Why |
|---|---|---|
| SignalGraphExecutor | REUSE heavily | Core propagation engine (BFS, safety bounds, trace recording) maps directly to substrat graph traversal. Add substrat-aware routing and multi-step credit assignment in learn(). |
| SpaceOrganismRegistry | ADAPT | Collection/registry pattern is sound. Upgrade cross-space edges to graph-aware edges. Add coherence tracking across substrats. |
| Grammar (TextGrammar) | REUSE + REFACTOR | Working tokenization impl. Move into encoding pipeline: Grammar.tokenize() → EmbeddingModel.embed() → sentence vector. SpaceOrganism should call this, not inline tokenization. |
Pause (consumers, not providers)
| Component | Why pause |
|---|---|
| LlmOrganism / WasmOrganism | These are inference consumers, not memory providers. They belong in V2.1 Knowledge Marketplace as "specialist nodes," not in the core substrat graph. The core graph is memory + recall + learning. LLMs/WASM are inference + tooling. |
| gRPC proto definitions | Federation before local intelligence works |
| SubstratEncoder | Abstraction layer with one implementation — but the Grammar/Embedding pipeline above replaces its role |
These files stay feature-gated. When the intelligence works and needs deployment infrastructure, they're there.
Build (missing and essential)
| Component | Why essential | Effort |
|---|---|---|
| Substrat + SubstratIndex | Fix the 10% recall rate. 97% nearest-centroid accuracy proven. Zero LLM cost. Includes scope/confidence/temperature. | ~150 lines |
| SubstratOrigin | Explicit (user Space) vs Emergent (brain-discovered) vs Archived lifecycle | ~30 lines |
| WorkingMemory | Formalize prompt augmentation as primary mechanism. Draw from multiple substrats proportionally. | ~80 lines |
| Substrat formation (dream Phase 0) | Automatic clustering from embedding topology, Gaussian membership, no LLM | ~60 lines |
| Method neuron births (dream Phase 2) | Compression — 134 trees → 15 methods | ~150 lines |
| Parameterized code extraction | Make code substitution actually work | ~100 lines |
| Prediction tracking | Richer learning signal than binary reward | ~60 lines |
| NeuronStage enum | Maturation tracking (hippocampus/neocortex/cerebellum) | ~20 lines |
| SignalGraphExecutor adaptation | Add substrat-aware routing + multi-step credit assignment to existing engine | ~40 lines |
Implementation Priorities
Week 1: Fix the Foundation
Goal: Substrat-based recognition. Substrat recognition rate ~10% → 60%+.
- Build `Substrat` struct — centroid embedding + scope + confidence + temperature + neuron IDs
- Build `SubstratIndex` — substrat formation, matching, centroid updates
- Dream Phase 0: automatic substrat formation from embedding proximity (zero LLM calls)
- Recognition: substrat match first, structural match within substrat
- Keep prompt augmentation as the primary execution path
Measure: Run benchmark. Count substrat recognition rate. Target: 60%+ of re-seen problems fall within correct substrat's confidence zone. Experiment already showed 97% nearest-centroid accuracy on sentence embeddings — this target should be achievable. Key advantage: zero LLM dependency for recognition.
Week 2: Method Neuron Births
Goal: Dream compression. 134 trees → ~15 method neurons. LLM savings 7% → 30%+.
- Dream clustering by substrat membership (already done in Phase 0)
- Method neuron birth from substrats with 5+ members
- Parameterized code extraction (LLM + verification)
- WorkingMemory struct: 3 exemplars + 2 anti-patterns + 1 method
- Experience neuron archival (method replaces its children for recall)
Measure: Run benchmark. Count method neurons born. Count LLM calls saved by method recall. Target: 30% savings on known substrats.
Week 3: Prediction + Maturation
Goal: Richer learning, neuron lifecycle. Learning speed improvement.
- NeuronStage enum (Hippocampus / Neocortex / Cerebellum)
- Prediction tracking: before execution, predict confidence. After, compute surprise.
- Surprise-weighted learning (high surprise → strong update)
- Per-substrat plasticity (critical periods)
- Anti-pattern tracking per substrat
Measure: Run benchmark with 200 problems. Compare learning curve (accuracy at problem 50, 100, 150, 200). Target: steeper curve than current.
Week 4: Compilation + Polish
Goal: WASM compilation for mature methods. Full pipeline.
- Myelination: methods with confidence >0.95 → compile to WASM
- Promote from bench-cli to morphee-core (the intelligence should be in core, not consumers)
- Wire into morphee-server
- Salience-weighted search (context + recency + confidence)
Measure: End-to-end benchmark. 200 problems, 4 phases. Target: 50% LLM savings, 40%+ accuracy on known substrats with 0 LLM calls.
Future: Recursive Decomposition + Federation
Only after weeks 1-4 produce measurable improvement:
- Recursive decomposition for novel problems ("break this into sub-problems I know")
- REM sleep (cross-substrat method transfer)
- Federated brain (share method neurons across instances)
- Mirror neuron learning (observe LLM chain-of-thought, extract method structure)
Open Questions
1. What's the right initial scope for new substrats?
This controls how eagerly the brain assigns new problems to existing substrats vs. birthing new ones. Too large → everything lumps together ("math"). Too small → every problem is its own substrat.
Experiment data suggests: Intra-type distances average 0.53, inter-type distances average 0.66. Initial scope of ~0.45-0.50 would capture most same-type problems while excluding different types. But the Gaussian membership function gives soft edges, so the exact value matters less than with linear cutoffs.
Likely approach: start with scope = 0.50 (Gaussian sigma). Tunable via benchmark. The dream cycle corrects mistakes anyway (split/merge).
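Plugging the experiment's average distances into the Gaussian membership function gives a quick sanity check on sigma = 0.50. This is a back-of-envelope computation, not a tuned value:

```rust
/// Soft membership: exp(-d² / (2 · σ²)).
fn membership(d: f32, sigma: f32) -> f32 {
    (-d * d / (2.0 * sigma * sigma)).exp()
}

fn main() {
    // Intra-type mean distance 0.53 vs inter-type mean 0.66 from the experiment.
    println!("intra: {:.2}", membership(0.53, 0.50)); // ~0.57
    println!("inter: {:.2}", membership(0.66, 0.50)); // ~0.42
}
```

With a linear cutoff the 0.13 gap between the means would be fragile; the soft Gaussian edge keeps same-type problems above typical thresholds while different types fall below, which is why the exact sigma matters less.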
2. How many substrats emerge from real usage?
The experiment showed 10 distinct math types cluster well with sentence embeddings. But GCD and LCM naturally merge (centroid distance 0.33 < intra-type distance 0.45). So 10 labeled types → ~7-8 substrats in practice.
A family AI might have 50-100 substrats. SubstratIndex is O(N) in substrat count for matching (N cosine comparisons). With 100 substrats, that's 100 dot products — microseconds. With 10,000 substrats, an ANN index (ball tree or VP-tree) keeps it O(log N).
3. How does granularity self-regulate?
The dream cycle handles this naturally:
- New substrat starts with a large scope (uncertain, inclusive) + low confidence
- More examples → centroid stabilizes, scope adjusts to actual coverage
- If members diverge (bicoherent) → the substrat splits (mitosis)
- If two substrats overlap too much → they merge
Experiment insight: GCD and LCM are likely to START as one merged substrat and may NEVER split, because their embeddings are genuinely close (distance 0.33). This is actually correct — they use the same mathematical machinery (divisibility). The brain discovers that these are one domain even if humans label them separately.
4. Should method neurons store code or templates?
For domains where code works (math, data processing): parameterized code. For domains where code doesn't apply (writing, planning, conversation): solution templates (structured natural language methods).
Both should be supported. The Procedure type should be an enum.
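One possible shape for that enum, mirroring Options A/B/C from the method-neuron section. Variant and field names are illustrative assumptions, not the real codebase definition:

```rust
/// Sketch of the Procedure enum: one variant per representation.
enum Procedure {
    /// Option A: parameterized code for executable domains (math, data processing)
    ParameterizedCode { source: String, params: Vec<String> },
    /// Option B: structured natural-language method for non-executable domains
    SolutionTemplate { steps: Vec<String> },
    /// Option C: fallback — the best exemplar's code, unparameterized
    BestExemplar { source: String },
}

fn main() {
    let p = Procedure::SolutionTemplate {
        steps: vec!["identify quantities".into(), "set up the equation".into()],
    };
    if let Procedure::SolutionTemplate { steps } = p {
        println!("template with {} steps", steps.len()); // prints "template with 2 steps"
    }
}
```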
5. When does the brain stop needing the LLM?
Asymptotically, but never fully. Even a mature brain encounters novel problems. The goal is:
- Day 1: 100% of queries need LLM
- Day 30: 50% of queries need LLM (known substrats handled by methods)
- Day 100: 20% of queries need LLM (most substrats covered)
- Day 365: 5% of queries need LLM (rare novelty + verification)
The LLM transitions from "primary solver" to "teacher for new domains" to "occasional consultant."
Metrics That Matter
Primary metrics (measure every benchmark)
| Metric | Current | Week 1 Target | Week 4 Target |
|---|---|---|---|
| Substrat recognition rate | ~10% (structural) | 60%+ | 80%+ |
| LLM call savings | 7% | 15% | 50%+ |
| Recall accuracy (brain alone) | 16.7% | 40%+ | 70%+ |
| Overall accuracy | 54% (re-seen) | 60% | 70%+ |
| Tree count (lower = more compressed) | 134 (150 problems) | 100 | 50 |
| Substrats formed | 0 | 8+ | 15+ |
| Method neurons born | 0 | 5+ | 15+ |
Secondary metrics (track but don't optimize for)
| Metric | Purpose |
|---|---|
| Recall latency | Brain shouldn't add >100ms |
| Dream cycle duration | Should complete in <30s |
| Code rate | % of method neurons with executable code |
| Anti-pattern count per substrat | Richer = fewer repeated mistakes |
| Surprise calibration | How well does predicted confidence match actual success rate |
The North Star
Near-term (provable now):
A small local model + a mature brain should dramatically outperform the same small model with no brain.
1.5B + brain (100 trained problems) vs 1.5B alone
Current: 54% vs 22% on re-seen problems. Target: 80% vs 22%. This is the achievable, measurable proof that the brain adds value.
Long-term (aspirational):
A small local model + a mature brain should match or outperform a large cloud model with no brain, on domains the brain has experience in.
1.5B + brain (1000 trained problems) vs 70B + no training
A 70B model will still crush a 1.5B on novel reasoning. But on domains where the brain has compiled methods, parameterized code, and anti-patterns — the brain-augmented small model should match or exceed it. This is the product thesis: the brain is a knowledge amplifier. And unlike the 70B, it's private, local, and the user owns their intelligence.
The near-term benchmark proves the mechanism works. The long-term benchmark proves the product vision.