Fractal Brain — Digital Organism Architecture

Status: Implemented (Phases 1-6 + Prediction + Telemetry); actively benchmarked via the Kaggle AIMO competition.

  • Feature gates: `fractal-brain` (brain modules), `grpc` (proto/tonic)
  • ADR: Follows from ADR-012 (RL Policy Network)
  • Location: `crates/morphee-core/src/brain/` (26 files, ~10,000 lines, 233 tests)
  • Benchmarking: `bench/` (CLI + Dashboard + Docker + Remote Runners) — 284 tests
  • Telemetry: brain-telemetry.md — SQLite + Prometheus + CLI + Dashboard

Competition context: The Fractal Brain is being demonstrated through the Kaggle AIMO math competition. Math is the perfect domain — it requires abstraction, pattern recognition, and compositional reasoning. Success here proves the brain can handle day-to-day tasks for families, teams, and professionals. Learned knowledge will be shareable through the Knowledge Marketplace (V2.1).

See also: Digital Brain Vision — the next evolution. Brain Critical Analysis — honest assessment of strengths and gaps. Brain Telemetry — measurement infrastructure.

Overview

The Fractal Brain implements Universal Recursive Intelligence — everything is an Organism (neurons, spaces, groups, LLMs, WASM modules). Same trait, same lifecycle, every scale. The core pattern: receive signal → recall → respond → learn.

Six scales of organization, each implementing the same Organism trait:

| Scale | Example | Recall | Learn |
|---|---|---|---|
| Neuron | Single concept/fact | Fingerprint match | Hebbian weight update |
| Experience | LLM call, WASM module | Direct delegation | Reward signal |
| Space | Family, Classroom, Project | NeuronMemory (3-mode) | Edge weight update + neuron storage |
| Group | The Dupont Family | Route to best Space | Cross-space edge learning |
| Instance | Desktop/mobile app | Local organism graph | Sync with server |
| Network | Specialist neurons | gRPC signal propagation | Federated learning |

Phase 1-3: Neuron-based Knowledge Representation

Problem

morphee-core's knowledge pipeline used flat 384-dim embedding vectors for recall (recall_similar via cosine similarity). This loses structural information:

  • "Find the GCD of 48 and 18" and "Find the GCD of 360 and 240" appear "similar" but there's no way to know the operation matches and only the arguments differ
  • Flat cosine can't distinguish structural similarity from surface similarity
  • Every near-miss requires a full LLM call even when only parameter substitution is needed

Solution: Neurons + Synapses + Trees

The Fractal Brain treats each embedding vector as a neuron (a point in 384-dim space) and connections between them as synapses (weighted by influence). A single BERT forward pass produces per-token hidden states; trajectory segmentation recursively decomposes them into a tree.

Three Recall Modes

| Mode | Condition | Action | LLM Calls |
|---|---|---|---|
| Exact | All neurons match (root >0.95, children >0.90) | Replay stored solution code | 0 |
| Variation | Operation neurons match, leaf neurons differ | Substitute parameters in stored code | 0 |
| Novel | No structural match | Full LLM call, store new tree | 1+ |
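The mode decision can be sketched as a pure function of similarity scores. This is an illustrative simplification of `compare_trees()` using the thresholds from the table (root > 0.95, children > 0.90); the real comparison walks the whole `NeuronTree`, so the function below is an assumption, not the actual API.

```rust
// Simplified sketch of the three-way match decision (not the real compare_trees).
#[derive(Debug, PartialEq)]
enum TreeMatch {
    Exact,     // replay stored solution code
    Variation, // substitute parameters in stored code
    Novel,     // full LLM call, store new tree
}

fn classify(root_sim: f32, child_sims: &[f32]) -> TreeMatch {
    if root_sim > 0.95 && child_sims.iter().all(|&s| s > 0.90) {
        TreeMatch::Exact
    } else if root_sim > 0.95 {
        // Operation neuron matches but some leaf differs -> parameter substitution.
        TreeMatch::Variation
    } else {
        TreeMatch::Novel
    }
}

fn main() {
    assert_eq!(classify(0.99, &[0.95, 0.97]), TreeMatch::Exact);
    assert_eq!(classify(0.99, &[0.95, 0.40]), TreeMatch::Variation);
    assert_eq!(classify(0.50, &[0.95, 0.97]), TreeMatch::Novel);
}
```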

Recall Architecture

```
Query text
  → embed_tokens()                  → per-token hidden states (384-dim each)
  → TrajectorySegmenter             → NeuronTree (recursive fractal structure)
  → FingerprintIndex.find_similar() → candidate trees (O(1) lookup)
  → compare_trees()                 → TreeMatch { Exact | Variation | Novel }
  → NeuronRecallStrategy:
        Exact     → replay stored code
        Variation → string-replace stored labels → execute
        Novel     → fallback strategy → store new tree
```

Trajectory Segmentation Algorithm

  1. Compute running mean of hidden states at each token position
  2. Compute deltas (how the running mean changes per token)
  3. Compute cosine between consecutive deltas
  4. Adaptive threshold: mean(cosines) - 1.0 * std(cosines)
  5. Split at points where cosine drops below threshold (direction change)
  6. Recurse on each segment until min_segment_len or max_depth
  7. Synapse weight = ||child_mean - parent_mean|| * token_count / total_influence
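Steps 3-5 can be sketched as a small function over the delta cosines. This is a hedged illustration of the adaptive-threshold split detection; the function name and return convention (indices marking segment starts) are assumptions.

```rust
// Sketch of steps 3-5: split where the cosine between consecutive deltas
// drops below mean(cosines) - 1.0 * std(cosines) (a direction change).
fn split_points(cosines: &[f32]) -> Vec<usize> {
    if cosines.is_empty() {
        return Vec::new();
    }
    let n = cosines.len() as f32;
    let mean = cosines.iter().sum::<f32>() / n;
    let var = cosines.iter().map(|c| (c - mean).powi(2)).sum::<f32>() / n;
    let threshold = mean - 1.0 * var.sqrt();
    cosines
        .iter()
        .enumerate()
        .filter(|&(_, &c)| c < threshold) // direction change
        .map(|(i, _)| i + 1)              // split *after* the drop
        .collect()
}

fn main() {
    // One sharp direction change at index 3 -> a single split point after it.
    assert_eq!(split_points(&[0.98, 0.97, 0.99, 0.10, 0.98, 0.97]), vec![4]);
    // A perfectly smooth trajectory never splits (std = 0).
    assert!(split_points(&[0.9, 0.9, 0.9]).is_empty());
}
```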

Sparse Fingerprints

For O(1) approximate matching, each neuron stores a SparseFingerprint:

  • Top-32 most active dimensions (by absolute value) + their signs
  • Jaccard-like similarity (dimension overlap × sign agreement)
  • Hash-based FingerprintIndex for bucket-based candidate retrieval
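The bullets above can be made concrete with a minimal sketch. The field and method names below (`dims`, `from_activation`, `similarity`) and the exact scoring formula are illustrative assumptions, not the crate's real `SparseFingerprint` API.

```rust
use std::collections::HashMap;

// Sketch of a sparse fingerprint: top-K dimensions by absolute value + signs,
// compared via Jaccard-like overlap scaled by sign agreement.
struct SparseFingerprint {
    dims: HashMap<usize, i8>, // dimension index -> sign (+1 / -1)
}

impl SparseFingerprint {
    /// Keep the top-K dimensions by absolute activation, with their signs.
    fn from_activation(v: &[f32], k: usize) -> Self {
        let mut idx: Vec<usize> = (0..v.len()).collect();
        idx.sort_by(|&a, &b| v[b].abs().partial_cmp(&v[a].abs()).unwrap());
        let dims = idx
            .into_iter()
            .take(k)
            .map(|i| (i, if v[i] >= 0.0 { 1 } else { -1 }))
            .collect();
        SparseFingerprint { dims }
    }

    /// Jaccard overlap of dimension sets, scaled by sign agreement on the overlap.
    fn similarity(&self, other: &Self) -> f32 {
        let shared: Vec<usize> = self
            .dims
            .keys()
            .filter(|d| other.dims.contains_key(*d))
            .copied()
            .collect();
        if shared.is_empty() {
            return 0.0;
        }
        let union = self.dims.len() + other.dims.len() - shared.len();
        let agree = shared.iter().filter(|d| self.dims[*d] == other.dims[*d]).count();
        (shared.len() as f32 / union as f32) * (agree as f32 / shared.len() as f32)
    }
}

fn main() {
    let a = SparseFingerprint::from_activation(&[1.0, -2.0, 0.1, 3.0], 2); // {3:+, 1:-}
    let b = SparseFingerprint::from_activation(&[0.0, -1.0, 5.0, 0.2], 2); // {2:+, 1:-}
    // Overlap = {1} (signs agree), union = 3 -> similarity 1/3.
    assert!((a.similarity(&b) - 1.0 / 3.0).abs() < 1e-6);
}
```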

Hebbian Plasticity

Synapse weights are updated by RL reward signals:

  • Positive reward → strengthen connections (weight increases)
  • Negative reward → weaken connections (weight decreases)
  • Weights clamped to [0.01, 1.0]
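As a minimal sketch of the update rule above (the learning rate and function name are illustrative assumptions, not the crate's API):

```rust
// Hebbian-style synapse update: positive reward strengthens, negative weakens,
// result clamped to [0.01, 1.0]. `lr` is an assumed learning-rate parameter.
fn hebbian_update(weight: f32, reward: f64, lr: f32) -> f32 {
    let updated = weight + lr * reward as f32;
    updated.clamp(0.01, 1.0)
}

fn main() {
    // Positive reward strengthens the connection.
    assert!((hebbian_update(0.5, 1.0, 0.1) - 0.6).abs() < 1e-6);
    // Negative reward weakens it, but never below the 0.01 floor.
    assert_eq!(hebbian_update(0.05, -1.0, 0.1), 0.01);
}
```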

Neuron Merging

Two neurons with fingerprint similarity >0.92 can merge:

  • Weighted average of activation vectors (by strength)
  • Combined strength count
  • Creates "concept neurons" that generalize across experiences
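A strength-weighted merge can be sketched as follows; the struct fields are simplified stand-ins for the real `Neuron` type, and the caller is assumed to have already checked the 0.92 fingerprint-similarity gate.

```rust
// Simplified neuron merge: weighted average of activations (by strength),
// combined strength count.
struct Neuron {
    activation: Vec<f32>,
    strength: u32,
}

fn merge(a: &Neuron, b: &Neuron) -> Neuron {
    let (wa, wb) = (a.strength as f32, b.strength as f32);
    let activation = a
        .activation
        .iter()
        .zip(&b.activation)
        .map(|(x, y)| (x * wa + y * wb) / (wa + wb))
        .collect();
    Neuron { activation, strength: a.strength + b.strength }
}

fn main() {
    let a = Neuron { activation: vec![1.0, 0.0], strength: 3 };
    let b = Neuron { activation: vec![0.0, 1.0], strength: 1 };
    let m = merge(&a, &b);
    // The stronger neuron dominates the merged "concept neuron".
    assert_eq!(m.activation, vec![0.75, 0.25]);
    assert_eq!(m.strength, 4);
}
```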

Phase 4: Organism Architecture

Organism Trait

The universal contract that everything implements:

```rust
pub trait Organism: Send + Sync {
    fn id(&self) -> &OrganismId;
    fn scale(&self) -> Scale;
    fn health(&self) -> Health;
    async fn receive(&self, signal: &BrainSignal, ctx: &SignalContext) -> Result<Vec<BrainSignal>>;
    async fn learn(&self, signal_id: &str, reward: f64) -> Result<()>;
}
```

BrainSignal

Signals carry data between organisms with modality awareness:

  • Modality: Text, Audio, Scalar, Image, Structured, Event
  • Activation: substrat_id + embedding vector
  • SignalContext: source organism, depth, budget, timestamp

Edge System

Directed weighted edges connect organisms:

  • EdgeKind: Temporal, Hierarchical, Associative, CrossSubstrat, CrossOrganism
  • AdaptiveFilter: learned gate on each edge (pass_rate updated by reward)
  • Weight range: [0.0, 1.0], Hebbian update on signal flow

SubstratEncoder

Abstraction for different embedding modalities:

  • BertSubstrat wraps the existing Embedder trait
  • TokenState tracks per-token activation history
  • Designed for multi-modal: text, audio, image substrats share the same trait

Grammar System

Tokenization/detokenization per modality:

  • TextGrammar wraps BertTokenizer
  • AudioGrammar stub for future audio signal processing
  • Token with TokenKind (Word, Subword, Punctuation, Special, AudioFrame, ScalarValue)

Phase 5: Signal Propagation & Execution

SignalGraphExecutor

The engine that propagates signals through the organism graph:

```rust
pub struct SignalGraphExecutor {
    organisms: Arc<RwLock<HashMap<OrganismId, Arc<RwLock<dyn Organism>>>>>,
    config: ExecutorConfig, // max_depth=5, budget_ms=1000, max_fanout=8
}
```

Safety bounds:

  • max_depth=5 — prevents infinite recursion
  • budget_ms=1000 — checked at each recursion via Instant::elapsed()
  • max_fanout=8 — limits edges explored per organism

Propagation loop:

  1. Deliver signal to target organism via receive()
  2. For each response signal, check outgoing edges from target
  3. For each edge where AdaptiveFilter.pass_rate > threshold, recurse
  4. Guard: depth < max_depth AND elapsed < budget_ms AND fanout < max_fanout
  5. Record SignalTrace (path of organism hops with edge weights)
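The step-4 guard can be sketched directly; the field names mirror the `ExecutorConfig` defaults above (max_depth=5, budget_ms=1000, max_fanout=8), but the guard function itself is an illustrative assumption, not the crate's API.

```rust
use std::time::Instant;

// Safety-bound check performed before each recursion step.
struct ExecutorConfig {
    max_depth: u32,
    budget_ms: u64,
    max_fanout: usize,
}

fn may_recurse(cfg: &ExecutorConfig, depth: u32, started: Instant, fanout: usize) -> bool {
    depth < cfg.max_depth
        && started.elapsed().as_millis() < cfg.budget_ms as u128 // Instant::elapsed()
        && fanout < cfg.max_fanout
}

fn main() {
    let cfg = ExecutorConfig { max_depth: 5, budget_ms: 1000, max_fanout: 8 };
    let t0 = Instant::now();
    assert!(may_recurse(&cfg, 4, t0, 7));  // all bounds satisfied
    assert!(!may_recurse(&cfg, 5, t0, 0)); // depth exhausted
    assert!(!may_recurse(&cfg, 0, t0, 8)); // fanout exhausted
}
```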

SignalTrace

Records the path a signal took through the graph. When learn() is called with a reward, the trace tells which edges to update:

```rust
pub struct SignalTrace {
    pub signal_id: String,
    pub path: Vec<(OrganismId, OrganismId, f32)>, // (from, to, edge_weight)
    pub timestamp: u64,
}
```

SpaceOrganism

The first production `impl Organism`. Each Space becomes an independent learning organism:

```rust
pub struct SpaceOrganism {
    id: OrganismId,
    space_id: String,
    group_id: String,
    neuron_memory: Arc<NeuronMemory>,
    edges: Vec<Edge>,
    substrats: Vec<Arc<Substrat>>,
    centroid: Option<Activation>,
    child_organisms: Vec<OrganismId>,
    signal_traces: Arc<Mutex<VecDeque<SignalTrace>>>,
    pipeline_fallback: Option<Arc<Pipeline>>,
}
```

receive() flow:

  1. Grammar: decode signal modality → tokens
  2. SubstratEncoder: encode tokens → activations
  3. NeuronMemory::recall() → Exact/Variation/Novel
  4. Exact → construct response from stored source_text (0 LLM)
  5. Variation → apply parameter substitutions (0 LLM)
  6. Novel → delegate to LLM child organism or Pipeline fallback
  7. Store new NeuronTree if Novel
  8. Return response signals

learn() flow:

  1. Find SignalTrace for signal_id
  2. Update edge weights along path (Hebbian: reward strengthens, punishment weakens)
  3. Update NeuronTree strength in NeuronStore
  4. Does NOT propagate to other spaces (independence)

LlmOrganism

Wraps Arc<dyn Inferencer> as Organism at Scale::Experience. Receives text signals, calls inferencer.generate(), returns response signal. learn() is a no-op but logs for RL policy.

WasmOrganism

Wraps Arc<dyn Executor> as Organism at Scale::Experience. Maps signal → executor input, executor output → response signal.


Phase 6: Reward System & Confidence Tracking

Reward Architecture (reward.rs, 564 lines, 14 tests)

The reward system tracks confidence at the neuron/tree level:

  • Confidence tracking: Per-tree confidence based on reward history (uses/correct/incorrect counts)
  • Quarantine: Trees with reject rate >60% and 2+ uses are quarantined (excluded from recall)
  • Branch-level blame: BranchBlame distributes reward across tree children proportionally
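The quarantine rule can be sketched as a predicate over the per-tree counters; the field names mirror `QuarantineConfig` (min_uses=2, reject_rate=0.6), but the function itself is illustrative.

```rust
// A tree is quarantined when it has enough uses AND its reject rate
// exceeds the configured threshold.
struct QuarantineConfig {
    min_uses: u32,
    reject_rate: f32,
}

fn should_quarantine(cfg: &QuarantineConfig, uses: u32, incorrect: u32) -> bool {
    uses >= cfg.min_uses && (incorrect as f32 / uses as f32) > cfg.reject_rate
}

fn main() {
    let cfg = QuarantineConfig { min_uses: 2, reject_rate: 0.6 };
    assert!(should_quarantine(&cfg, 3, 2));  // 66% reject rate, enough uses
    assert!(!should_quarantine(&cfg, 1, 1)); // too few uses to judge
    assert!(!should_quarantine(&cfg, 5, 3)); // exactly 60% is not > 60%
}
```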

Key types:

  • RewardEvent — timestamped reward with match_type and context_hash
  • TreeRewardLedger — per-tree tracking (events, total_uses, total_correct, confidence, quarantined)
  • BranchBlame — branch-level blame attribution (child_id → correct/total)
  • QuarantineConfig — thresholds (min_uses=2, reject_rate=0.6)

Substrat Index — Problem Type Recognition

SubstratIndex (substrat_index.rs, 952 lines, 36 tests)

The substrat index provides problem type recognition before recall. Each substrat is a Gaussian cluster in 384-dim sentence embedding space. When a new query arrives, it's classified as Recognized/Familiar/Novel at the substrat level before tree-level recall begins.

SubstratCluster

```rust
pub struct SubstratCluster {
    id: SubstratId,
    centroid: Vec<f32>,          // mean embedding
    scope: f32,                  // sigma (spread)
    temperature: f32,            // plasticity control
    confidence: f32,             // learned reliability
    exemplar_tree_ids: Vec<String>,
    origin: SubstratOrigin,      // Explicit / Emergent / Archived
}
```

Membership function: exp(-d² / (2σ²)) where σ = scope × (1.0 + 0.3 × temperature)
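A direct transcription of the membership formula, for reference (the function name is illustrative):

```rust
// Gaussian membership: exp(-d^2 / (2 * sigma^2)),
// with sigma = scope * (1.0 + 0.3 * temperature).
fn membership(d: f32, scope: f32, temperature: f32) -> f32 {
    let sigma = scope * (1.0 + 0.3 * temperature);
    (-(d * d) / (2.0 * sigma * sigma)).exp()
}

fn main() {
    // A query at the centroid has membership 1.0.
    assert!((membership(0.0, 0.5, 0.0) - 1.0).abs() < 1e-6);
    // Higher temperature widens sigma, so the same distance scores higher.
    assert!(membership(1.0, 0.5, 1.0) > membership(1.0, 0.5, 0.0));
}
```

Note how temperature acts as a plasticity dial: a hot (new or surprised) cluster accepts more distant queries than a cold, settled one.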

Constants:

  • MEMBERSHIP_THRESHOLD = 0.3
  • NOVEL_DISTANCE_THRESHOLD = 0.80
  • TEMPERATURE_DECAY = 0.95
  • METHOD_NEURON_THRESHOLD = 5 (exemplars to birth method neuron)

RecognitionResult

| Result | Meaning | Action |
|---|---|---|
| Recognized | High membership in existing substrat | Recall within that substrat's trees |
| Familiar | Moderate membership | Broader recall + centroid update |
| Novel | No substrat match | Full LLM call, potentially create new substrat |

WorkingMemory

Transient context during recall: candidate substrats + candidate trees. Scoped to a single query, discarded after.


Method Neurons — Learned Procedures

MethodNeuron (method_neuron.rs, 254 lines, 10 tests)

Method neurons encapsulate learned procedures — code templates, solution patterns, or best exemplars that have proven reliable. They mature through three stages inspired by neuroscience:

| Stage | Plasticity | Confidence | Behavior |
|---|---|---|---|
| Hippocampus | High | Low | New, actively learning, every execution verified |
| Neocortex | Moderate | Medium | Verified, dual-path check (execute + verify) |
| Cerebellum | Low | High | Automated, no LLM verification needed |

Promotion thresholds:

  • Hippocampus → Neocortex: 10+ uses, 70%+ confidence
  • Neocortex → Cerebellum: 50+ uses, 90%+ confidence
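The maturation rule can be sketched as a single transition function; `NeuronStage` is named in the source, but the `promote` function and its by-value signature are assumptions.

```rust
// Stage promotion using the thresholds above:
//   Hippocampus -> Neocortex: 10+ uses, 70%+ confidence
//   Neocortex -> Cerebellum:  50+ uses, 90%+ confidence
#[derive(Debug, PartialEq, Clone, Copy)]
enum NeuronStage {
    Hippocampus,
    Neocortex,
    Cerebellum,
}

fn promote(stage: NeuronStage, uses: u32, confidence: f32) -> NeuronStage {
    match stage {
        NeuronStage::Hippocampus if uses >= 10 && confidence >= 0.70 => NeuronStage::Neocortex,
        NeuronStage::Neocortex if uses >= 50 && confidence >= 0.90 => NeuronStage::Cerebellum,
        s => s, // thresholds not met: stay at the current stage
    }
}

fn main() {
    assert_eq!(promote(NeuronStage::Hippocampus, 10, 0.75), NeuronStage::Neocortex);
    assert_eq!(promote(NeuronStage::Hippocampus, 9, 0.99), NeuronStage::Hippocampus);
    assert_eq!(promote(NeuronStage::Neocortex, 50, 0.95), NeuronStage::Cerebellum);
}
```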

Procedure types:

  • ParameterizedCode — code template with parameter slots
  • SolutionTemplate — structured solution pattern
  • BestExemplar — reference to highest-confidence exemplar tree

Prediction System — Surprise-Driven Learning

PredictionTracker (prediction.rs, 257 lines, 10 tests)

The prediction system enables surprise-driven learning. Before execution, the brain predicts its confidence. After execution, the actual outcome is compared. High surprise (|predicted - actual| > 0.3) triggers stronger centroid updates in the substrat.

```rust
pub struct Prediction {
    substrat_id: SubstratId,
    predicted_confidence: f32,   // before execution
    actual_outcome: Option<f32>, // after execution
    surprise: f32,               // |predicted - actual|
}
```

This makes the brain learn faster from unexpected results — both surprising failures and surprising successes drive more aggressive substrat reorganization.
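The surprise computation and its trigger are small enough to sketch inline; the function names are illustrative, and only the |predicted - actual| > 0.3 threshold comes from the source.

```rust
// Surprise = absolute prediction error; high surprise (> 0.3) triggers
// stronger centroid updates in the substrat.
fn surprise(predicted: f32, actual: f32) -> f32 {
    (predicted - actual).abs()
}

fn is_high_surprise(predicted: f32, actual: f32) -> bool {
    surprise(predicted, actual) > 0.3
}

fn main() {
    assert!(is_high_surprise(0.9, 0.0));  // confident prediction, failed
    assert!(is_high_surprise(0.2, 0.9));  // expected failure, succeeded — also surprising
    assert!(!is_high_surprise(0.8, 0.7)); // outcome matched expectation
}
```

Because surprise is symmetric, unexpected successes reorganize substrats just as aggressively as unexpected failures.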


Dream Consolidation

DreamConsolidator (dream.rs, 473 lines, 10 tests)

9-phase background consolidation cycle (phases 7-9 require fractal-brain):

  1. Neuron merging — merge neurons with fingerprint similarity >0.92
  2. Synapse pruning — remove synapses with weight < 0.05
  3. Weak tree deletion — delete trees with 0% correct + quarantined
  4. Rehabilitation — reset quarantine for old trees
  5. Event pruning — keep only 100 most recent events per tree
  6. Branch pruning — remove worst-performing children
  7. Mitosis detection — detect low-coherence clusters → split
  8. Substrat clustering — assign trees to substrat clusters
  9. Method neuron birth — 5+ exemplars in a substrat → birth method neuron

DreamScheduler (dream_scheduler.rs, 216 lines, 4 tests)

Background timer for dream cycles. Default interval: 5 minutes (configurable via MORPHEE_DREAM_INTERVAL_SECS).

Lifecycle (lifecycle.rs, 280 lines, 6 tests)

  • MitosisDetector — monitors organism coherence (threshold: 200 neurons, 0.4 coherence, 10 min cluster size), triggers split
  • DecayPolicy — configurable decay for edges and synapses

Telemetry Infrastructure

BrainTelemetry (telemetry.rs, 364 lines, 6 tests)

Every brain decision is captured as structured telemetry:

```rust
pub struct BrainTelemetry {
    substrat_id: Option<String>,
    recognition_result: String,        // recognized/familiar/novel
    tree_id: Option<String>,
    recall_type: String,               // exact/variation/novel
    execution_path: String,            // neuron_exact/neuron_variation/knowledge/llm
    llm_calls: u32,
    working_memory_size: u32,
    candidate_count: u32,
    predicted_confidence: Option<f32>,
    surprise: Option<f32>,
    timing: BrainTiming,               // recognition_ms, recall_ms, total_brain_ms
}
```

Three-layer measurement: SQLite (persistent, per-problem decisions), Prometheus (live gauges, 11 metrics), CLI + Dashboard (analysis and visualization). See brain-telemetry.md.


Multi-Space Management

SpaceOrganismRegistry (space_registry.rs, 380 lines, 11 tests)

Manages multiple independent SpaceOrganisms per group:

```rust
pub struct SpaceOrganismRegistry {
    organisms: HashMap<OrganismId, Arc<RwLock<SpaceOrganism>>>,
    executor: Arc<SignalGraphExecutor>,
    dream_handles: HashMap<OrganismId, DreamHandle>,
}
```

  • create_space() — creates a new SpaceOrganism with its own NeuronStore
  • send_signal() — routes signal to the correct space
  • add_cross_space_edge() — connects spaces via EdgeKind::CrossOrganism
  • export_space() / import_space() — bundle spaces for marketplace sharing

Cross-Space Edges

Spaces connect via EdgeKind::CrossOrganism edges:

  • A "Math" space has an edge to a "Calculator" space
  • When Math gets a Novel signal, it propagates through the edge to Calculator
  • Calculator processes and returns response signals
  • Math's edge weight to Calculator is updated by learn() — independent of Calculator's internal state

NeuronStore Implementations

NeuronStore trait (store.rs, 459 lines, 11 tests)

```rust
pub trait NeuronStore: Send + Sync {
    async fn store_tree(&self, tree: &NeuronTree) -> Result<()>;
    async fn get_tree(&self, id: &str) -> Result<Option<NeuronTree>>;
    async fn find_similar(&self, fingerprint: &SparseFingerprint, limit: usize) -> Result<Vec<NeuronTree>>;
    async fn list_trees(&self) -> Result<Vec<NeuronTree>>;
    async fn delete_tree(&self, id: &str) -> Result<()>;
}
```

Three implementations:

  • InMemoryNeuronStore — for testing, HashMap-backed
  • FileNeuronStore — persistent, JSON files in {data_dir}/neurons/{space_id}/
  • SqliteNeuronStore — persistent, SQLite database with FTS for fingerprint search

Per-space isolation is handled at construction time — each SpaceOrganism gets its own store instance.


gRPC Proto Definitions

organism.proto (212 lines)

7 RPCs for organism communication:

| RPC | Direction | Purpose |
|---|---|---|
| Send | server streaming | Signal propagation through organism graph |
| Learn | unary | Reward feedback to organism |
| Observe | server streaming | Live signal stream for frontend visualization |
| GetOrganism | unary | Inspect organism state |
| ListOrganisms | unary | Enumerate organisms |
| TriggerDream | unary | On-demand consolidation |
| Chat | server streaming | Text chat (replaces SSE /v1/chat) |

Proto ↔ Rust conversions in proto_convert.rs (408 lines, 8 tests), gated behind #[cfg(feature = "grpc")].


Pipeline → Organism Mapping

| Pipeline component | Organism equivalent | How |
|---|---|---|
| Embedder.embed() | SubstratEncoder.encode() | BertSubstrat wraps Embedder |
| Router.route() | Edge weights + AdaptiveFilter | Signal follows strongest edges |
| Strategy.process() | Recursive receive() | Signal propagation through depth |
| Executor.execute() | WasmOrganism child | WASM module as organism |
| Inferencer.generate() | LlmOrganism child | LLM as organism |
| Scorer.score() | Implicit in edge weights | Learned, not computed |
| FeedbackLoop | learn() + Hebbian edges | Reward propagates through trace |
| MiddlewareChain | AdaptiveFilter on edges | Filters learn what to pass |
| EventBus | SignalPropagated events | Observer of signal flow |

Events

Brain-related events emitted to EventBus:

| Event | Source | Purpose |
|---|---|---|
| NeuronTreeBuilt | recall | New neuron tree stored |
| NeuronRecalled | recall | Existing tree matched (Exact/Variation) |
| OrganismEdgeFormed | space | New edge between organisms |
| OrganismMitosis | lifecycle | Organism split detected |
| OrganismFusion | lifecycle | Organisms merged |
| DreamCycleCompleted | dream | Background consolidation finished |
| SignalPropagated | executor | Signal hop recorded |

Strategy Chain (bench-cli)

```
DirectToolStrategy          (pattern match → tools, fastest)
  → NeuronRecallStrategy    (fractal-brain, 3-tier recognition)
  → KnowledgeRecallStrategy (flat cosine, experience store)
  → AdaptiveStrategy        (teaching → solver_first → code_execution → single_shot)
```

Three-Tier Recognition Stack (NeuronRecallStrategy)

  1. Substrat Recognition — sentence-level Gaussian clustering. Classifies the problem TYPE (e.g., "number theory / GCD"). Narrows the search space.
  2. Tree Recall — fingerprint-based matching within the recognized substrat. Finds structurally similar past experiences. Returns Exact/Variation/Novel.
  3. Method Neuron Execution — if a mature method neuron exists for this substrat, use its procedure directly (parameterized code or solution template).

Each tier short-circuits: if substrat recognition finds a Cerebellum-stage method neuron, zero LLM calls. If tree recall hits Exact, zero LLM calls. Only Novel queries fall through to LLM.


Key Files

Brain module files (26 files, ~10,000 lines, 233 tests)

| File | Tests | Purpose |
|---|---|---|
| mod.rs | — | Module registration + feature gates |
| neuron.rs | 11 | Neuron, Synapse, NeuronTree, Hebbian update, merge |
| fingerprint.rs | 7 | SparseFingerprint, FingerprintIndex |
| segmenter.rs | 9 | TrajectorySegmenter (recursive tree builder) |
| store.rs | 19 | NeuronStore trait + InMemory + File + SQLite |
| recall.rs | 17 | TreeMatch, compare_trees, NeuronMemory |
| reward.rs | 14 | TreeRewardLedger, quarantine, BranchBlame |
| dream.rs | 10 | DreamConsolidator (9-phase cycle) |
| substrat_index.rs | 36 | SubstratCluster, SubstratIndex, RecognitionResult, WorkingMemory |
| method_neuron.rs | 10 | MethodNeuron, NeuronStage, Procedure |
| prediction.rs | 10 | PredictionTracker, surprise-driven learning |
| telemetry.rs | 6 | BrainTelemetry, structured decision capture |
| organism.rs | 3 | Universal Organism trait, Scale, Health |
| signal.rs | 5 | BrainSignal, Activation, Modality |
| edge.rs | 7 | Edge, EdgeKind, AdaptiveFilter |
| substrat.rs | 4 | SubstratEncoder, BertSubstrat |
| grammar/mod.rs | 2 | Grammar trait, Token types |
| grammar/text.rs | 3 | TextGrammar |
| executor.rs | 12 | SignalGraphExecutor, SignalTrace |
| space_organism.rs | 16 | SpaceOrganism (impl Organism) |
| lifecycle.rs | 6 | MitosisDetector, DecayPolicy |
| llm_organism.rs | 6 | LlmOrganism |
| wasm_organism.rs | 4 | WasmOrganism |
| space_registry.rs | 11 | SpaceOrganismRegistry |
| dream_scheduler.rs | 4 | DreamScheduler |
| proto_convert.rs | 8 | Proto ↔ Rust conversions |

Proto definitions

| File | Lines | Purpose |
|---|---|---|
| proto/organism.proto | 212 | gRPC schema (7 RPCs) |

Modified files

| File | Change |
|---|---|
| Cargo.toml | `fractal-brain = []` + `grpc` features |
| lib.rs | `#[cfg(feature = "fractal-brain")] pub mod brain;` |
| traits/embedder.rs | TokenActivation + embed_tokens() |
| providers/embeddings.rs | embed_tokens_sync() in candle_impl |
| providers/candle_embedder.rs | Override embed_tokens() |
| providers/tokenizer.rs | reverse_vocab + id_to_token() |
| events/types.rs | 7 brain-related events |
| providers/rl_policy/state.rs | NeuronContext (6 dims) |
| space.rs | edges, substrats, centroid fields |

Bench Infrastructure — Brain Development Toolkit

The bench CLI and dashboard provide the full toolkit for growing, testing, and analyzing brain performance. This is the primary development loop for AIMO competition work.

CLI Commands (bench brain)

| Command | Purpose |
|---|---|
| `bench brain report [--run ID]` | Full run report (accuracy, recognition, execution paths, LLM savings) |
| `bench brain compare --runs A,B` | Side-by-side comparison of two runs |
| `bench brain curve [--run ID]` | ASCII learning curve (brain improvement over time) |
| `bench brain explain --run ID --problem PID` | Single-problem decision trace |
| `bench brain substrats [--run ID]` | Substrat topology breakdown |

Benchmark Runner

```bash
# Local run with brain telemetry
bench bench --model qwen2.5-math-1.5b --suite math-dataset --limit 500 \
  --brain_db brain.db --snapshot_every 50 --dream

# Remote run with dashboard heartbeat
bench bench --model qwen2.5-math-7b --suite aime --limit 100 \
  --dashboard_url http://dashboard:3939 --brain_db brain.db --dream
```

Docker Infrastructure

| File | Purpose |
|---|---|
| bench/Dockerfile | Rust bench-cli build (2-stage, debian-slim) |
| bench/Dockerfile.dashboard | Node.js dashboard + Rust API (3-stage) |
| bench/docker-compose.bench.yml | Run benchmarks (8GB mem, 4 CPU) |
| bench/docker-compose.coolify.yml | Production dashboard (Coolify-ready) |
| bench/scripts/run-remote.sh | Remote runner script (auto model download) |

Dashboard Pages

  • Brain — 4 stat cards (trees, substrats, method neurons, recognition rate) + learning curve chart + substrat accuracy chart + dream events table
  • Runners — Remote runner monitoring (status, progress, brain topology, system metrics, 5s auto-refresh)

SQLite Brain Store (bench/cli/src/brain_store.rs, 859 lines)

Separate brain.db with 3 tables:

  • brain_events — per-problem decision capture
  • brain_snapshots — periodic topology snapshots
  • dream_events — consolidation stats

Design Decisions

  1. Feature-gated: fractal-brain keeps the brain module optional — no impact on builds that don't need it
  2. Last BERT layer only: Simpler, validated in Python. Neuron struct is layer-aware for future multi-layer
  3. Three store implementations: InMemory (testing), File (desktop/offline), SQLite (server/production)
  4. Graceful degradation: If embed_tokens() is not supported, NeuronRecallStrategy silently delegates to fallback
  5. Content-addressable IDs: NeuronId = SHA-256 of activation vector — deterministic, collision-resistant
  6. Safety bounds on executor: Prevents runaway signal propagation (depth, budget, fanout limits)
  7. Independent space learning: Each SpaceOrganism learns independently, cross-space edges are opt-in
  8. Pipeline fallback: SpaceOrganism falls back to Pipeline for Novel queries during transition period