
Brain Telemetry Infrastructure

Status: Implemented
Feature gate: fractal-brain (morphee-core)
Location: morphee-core brain/telemetry.rs + bench-cli
Triggered by: brain-critical-analysis.md

Problem

The fractal brain makes dozens of decisions per problem — substrat matching, tree recall, confidence gating, anti-recall, verification, reward — but none of it was queryable. Brain metrics were scattered across tracing::info!() logs that vanished after the process ended. We couldn't answer basic questions like "How many problems did the brain recognize correctly?" or "Is the brain getting better over time?"

Architecture

Three layers: SQLite (persistent decision log), Prometheus (live gauges), CLI + Dashboard (analysis and visualization).

┌──────────────────────────────────────────────────────────┐
│ NeuronRecallStrategy.process()                           │
│   ├── recognition timing → metadata["recognition_ms"]    │
│   ├── recall timing      → metadata["recall_ms"]         │
│   └── total timing       → metadata["total_brain_ms"]    │
└────────────────────────────┬─────────────────────────────┘
                             │ HashMap<String, serde_json::Value>
                             ▼
┌──────────────────────────────────────────────────────────┐
│ BrainTelemetry::from_metadata()                          │
│ (crates/morphee-core/src/brain/telemetry.rs)             │
│ Extracts & classifies: execution_path, recall_type, etc. │
└────────────────────────────┬─────────────────────────────┘
                             │
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
        ┌──────────┐  ┌────────────┐  ┌───────────┐
        │  SQLite  │  │ Prometheus │  │   CLI +   │
        │ brain.db │  │            │  │ Dashboard │
        └──────────┘  └────────────┘  └───────────┘
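The metadata contract above can be sketched as follows. This is an illustrative Python stand-in for the Rust `BrainTelemetry::from_metadata()`, not the actual implementation; the field names come from the schema in this document, while the struct shape and defaults are assumptions.

```python
# Hypothetical sketch of the metadata -> telemetry contract: the recall
# strategy writes loosely-typed JSON values into a map, and
# from_metadata-style code extracts and classifies the fields it knows.
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class BrainTelemetry:
    execution_path: str            # cerebellum / neocortex / guided_llm / raw_llm
    recall_type: str               # exact / variation / method / guided_llm / novel
    recognition_ms: Optional[int]  # recognition phase timing
    recall_ms: Optional[int]       # recall phase timing
    total_brain_ms: Optional[int]  # total brain overhead

def from_metadata(meta: dict[str, Any]) -> BrainTelemetry:
    def as_int(key: str) -> Optional[int]:
        v = meta.get(key)
        return int(v) if v is not None else None
    # Defaults ("raw_llm", "novel") are assumptions for missing fields.
    return BrainTelemetry(
        execution_path=str(meta.get("execution_path", "raw_llm")),
        recall_type=str(meta.get("recall_type", "novel")),
        recognition_ms=as_int("recognition_ms"),
        recall_ms=as_int("recall_ms"),
        total_brain_ms=as_int("total_brain_ms"),
    )

t = from_metadata({"recognition_ms": 3, "recall_ms": 12, "total_brain_ms": 15,
                   "execution_path": "cerebellum", "recall_type": "exact"})
print(t.total_brain_ms)  # 15
```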

SQLite Schema (brain.db)

brain.db is a separate database file from bench.db. Three tables:

brain_events — One row per problem per run

| Column | Type | Description |
| --- | --- | --- |
| run_id | TEXT | Bench run identifier |
| problem_id | TEXT | Problem identifier |
| problem_index | INTEGER | Position in run |
| correct | INTEGER | 1 if correct, 0 otherwise |
| substrat_id | TEXT | Matched substrat (nullable) |
| substrat_membership | REAL | Membership strength [0, 1] |
| recognition_result | TEXT | recognized / familiar / novel |
| tree_id | TEXT | Matched neuron tree (nullable) |
| recall_type | TEXT | exact / variation / method / guided_llm / novel |
| recall_similarity | REAL | Cosine similarity of match |
| recall_confidence | REAL | Confidence score |
| substitution_count | INTEGER | Parameter substitutions |
| execution_path | TEXT | cerebellum / neocortex / guided_llm / raw_llm |
| llm_calls | INTEGER | LLM calls for this problem |
| working_memory_size | INTEGER | Candidates in working memory |
| candidate_count | INTEGER | Total candidates considered |
| predicted_confidence | REAL | Pre-execution confidence |
| surprise | REAL | Prediction error |
| recognition_ms | INTEGER | Recognition phase timing |
| recall_ms | INTEGER | Recall phase timing |
| total_brain_ms | INTEGER | Total brain overhead |
| created_at | TEXT | ISO 8601 timestamp |
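With this table in place, the questions from the Problem section become one-line queries. A minimal sketch using Python's stdlib sqlite3 against an in-memory database, with a trimmed version of the schema above and made-up sample rows:

```python
# Trimmed brain_events table plus the kind of query the telemetry was
# built to answer. Column names match the schema; rows are sample data.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE brain_events (
        run_id TEXT, problem_id TEXT, correct INTEGER,
        recognition_result TEXT, execution_path TEXT, llm_calls INTEGER
    )""")
db.executemany(
    "INSERT INTO brain_events VALUES (?, ?, ?, ?, ?, ?)",
    [("run-1", "p1", 1, "recognized", "cerebellum", 0),
     ("run-1", "p2", 0, "novel",      "raw_llm",    2),
     ("run-1", "p3", 1, "recognized", "neocortex",  1)])

# "How many problems did the brain recognize correctly?"
recognized_correct, = db.execute(
    "SELECT COUNT(*) FROM brain_events "
    "WHERE recognition_result = 'recognized' AND correct = 1").fetchone()
print(recognized_correct)  # 2
```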

brain_snapshots — Periodic topology snapshots

Captured every --snapshot-every N problems (default 10).

| Column | Type | Description |
| --- | --- | --- |
| run_id | TEXT | Bench run identifier |
| problem_index | INTEGER | Snapshot position |
| total_trees | INTEGER | Neuron tree count |
| total_substrats | INTEGER | Substrat count |
| total_method_neurons | INTEGER | Method neuron count |
| avg_confidence | REAL | Mean confidence |
| recognition_rate | REAL | % of problems not novel |
| recall_accuracy | REAL | Accuracy of recalled answers |
| llm_calls_saved | INTEGER | Problems with 0 LLM calls |
| total_llm_calls | INTEGER | Cumulative LLM calls |
| avg_brain_overhead_ms | REAL | Mean brain overhead |
| current_accuracy | REAL | Running accuracy at snapshot |
| created_at | TEXT | ISO 8601 timestamp |
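These snapshot rows are what a learning curve is read from: one (problem_index, accuracy, recognition_rate) point per snapshot. A sketch of that read path, again with stdlib sqlite3 and stand-in data (the real table lives in brain.db, and `bench brain curve` is presumably doing the equivalent):

```python
# Read a learning curve out of brain_snapshots: ordered points of
# running accuracy and recognition rate. Sample data, in-memory DB.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE brain_snapshots (
    run_id TEXT, problem_index INTEGER,
    recognition_rate REAL, current_accuracy REAL)""")
db.executemany("INSERT INTO brain_snapshots VALUES (?, ?, ?, ?)",
               [("run-1", 10, 0.20, 0.5),
                ("run-1", 20, 0.40, 0.6),
                ("run-1", 30, 0.55, 0.7)])

curve = db.execute(
    "SELECT problem_index, current_accuracy, recognition_rate "
    "FROM brain_snapshots WHERE run_id = ? ORDER BY problem_index",
    ("run-1",)).fetchall()
print(curve[-1])  # (30, 0.7, 0.55)
```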

dream_events — One row per dream consolidation

| Column | Type | Description |
| --- | --- | --- |
| run_id | TEXT | Bench run identifier |
| merges, prunes, deleted_hopeless, rehabilitated | INTEGER | Dream consolidation counts |
| events_pruned, branches_pruned | INTEGER | Pruning stats |
| substrats_formed, substrats_assigned | INTEGER | Substrat changes |
| method_neurons_born | INTEGER | New method neurons |
| code_tested, code_boosted, code_fragile, code_removed | INTEGER | Code verification stats |
| created_at | TEXT | ISO 8601 timestamp |

Prometheus Metrics (11 gauges/counters/histograms)

| Metric | Type | Labels | Purpose |
| --- | --- | --- | --- |
| brain_substrat_count | Gauge | | Current substrat count |
| brain_recognition_rate | Gauge | | % of problems recognized |
| brain_execution_path | Counter | path | Distribution across paths |
| brain_llm_calls_saved | Counter | | Cumulative 0-LLM recalls |
| brain_recall_accuracy | Gauge | type | Per recall-type accuracy |
| brain_overhead_ms | Histogram | | Brain overhead distribution |
| brain_substrat_membership | Histogram | | Membership strength distribution |
| brain_surprise | Histogram | | Prediction error distribution |
| brain_confidence_mean | Gauge | | Running mean confidence |
| brain_trees_total | Gauge | | Live neuron tree count |
| brain_method_neurons | Gauge | | Live method neuron count |
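For orientation, a scrape of the metrics endpoint would look roughly like the fragment below, in the standard Prometheus text exposition format. The metric names and label keys come from the table above; all sample values are illustrative, not captured output.

```text
# TYPE brain_recognition_rate gauge
brain_recognition_rate 0.62
# TYPE brain_execution_path counter
brain_execution_path{path="cerebellum"} 41
brain_execution_path{path="raw_llm"} 12
# TYPE brain_recall_accuracy gauge
brain_recall_accuracy{type="exact"} 0.95
# TYPE brain_overhead_ms histogram
brain_overhead_ms_bucket{le="5"} 18
brain_overhead_ms_bucket{le="+Inf"} 53
brain_overhead_ms_sum 412
brain_overhead_ms_count 53
```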

CLI Commands

Five subcommands under bench brain:

# Full brain report for latest (or specific) run
bench brain report [--run ID] [--brain-db path]

# Side-by-side comparison of two runs
bench brain compare --runs A,B [--brain-db path]

# Learning curve from snapshots
bench brain curve [--run ID] [--brain-db path]

# Full decision trace for a specific problem
bench brain explain --run ID --problem PID [--brain-db path]

# Substrat topology table
bench brain substrats [--run ID] [--json] [--brain-db path]

Dashboard

The standalone bench dashboard (bench/dashboard/) provides brain visualization via two pages:

Brain page (/brain):

  • Run selector with brain event counts
  • Learning curve (Recharts) — accuracy + recognition rate over problem index
  • Execution path distribution with progress bars
  • Substrat topology table
  • Dream consolidation event timeline

Runners page (/runners):

  • Live runner status with auto-refresh (10s heartbeats)
  • Brain stats per runner (trees, substrats, method neurons)
  • Progress tracking (problems done/total)

Data Flow (Dual Mode)

Local mode (default): Brain telemetry stored in SQLite (data/brain.db). CLI subcommands query it directly.

Hub mode (DASHBOARD_URL set): Runners batch-submit brain data to the hub via REST API:

  • POST /api/runner/brain-events — Decision telemetry per problem
  • POST /api/runner/brain-snapshots — Topology snapshots every N problems
  • POST /api/runner/dream-events — Dream consolidation results

All data lands in PostgreSQL (schema: bench/migrations/002_brain_tables.sql). Dashboard reads from PostgreSQL.

Dashboard REST endpoints: /api/brain/runs, /api/brain/report/:id
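The runner-side batching can be sketched as below. This is a hypothetical Python stand-in for runner_client.rs, not the real wire format: the endpoint path matches the list above, while the `{"events": [...]}` payload shape, the DASHBOARD_URL handling, and the injectable `post` hook are assumptions for illustration.

```python
# Buffer one telemetry record per problem and flush a JSON batch to the
# hub's POST /api/runner/brain-events every N problems (default 10).
import json
import urllib.request

class BrainEventBatcher:
    def __init__(self, dashboard_url, batch_size=10, post=None):
        self.url = dashboard_url.rstrip("/") + "/api/runner/brain-events"
        self.batch_size = batch_size
        self.buffer = []
        # `post` is injectable so the sketch can be exercised offline.
        self.post = post or self._http_post

    def _http_post(self, url, body):
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)

    def record(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        body = json.dumps({"events": self.buffer}).encode()
        self.post(self.url, body)
        self.buffer = []

sent = []
b = BrainEventBatcher("http://hub:3000", batch_size=2,
                      post=lambda url, body: sent.append((url, body)))
b.record({"problem_id": "p1", "correct": 1})
b.record({"problem_id": "p2", "correct": 0})  # hits batch_size, flushes
print(len(sent))  # 1
```

The injected `post` hook also makes the batcher trivial to unit-test, which mirrors the design decision below to keep batching configurable.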

Key Files

| File | Lines | Tests | Purpose |
| --- | --- | --- | --- |
| crates/morphee-core/src/brain/telemetry.rs | ~200 | 6 | Data contract, from/to metadata |
| crates/morphee-core/src/brain/store.rs | ~1300 | 24 | NeuronStore trait + File/Git/InMemory + sync |
| bench/cli/src/brain_store.rs | ~860 | 13 | SQLite persistence (3 tables) |
| bench/cli/src/runner_client.rs | ~460 | 4 | HTTP client for hub (brain events, snapshots, dreams) |
| bench/cli/src/commands/brain.rs | ~350 | 10 | 5 CLI subcommands |
| bench/cli/src/metrics.rs | +130 | 3 | 11 Prometheus metrics |
| bench/cli/src/commands/bench.rs | +90 | | Wiring (store, snapshots, dreams, hub submission) |
| bench/dashboard/server/routes/brain.ts | ~100 | | Brain dashboard API (PostgreSQL) |
| bench/dashboard/server/routes/runner-api.ts | ~335 | | Runner API (brain events/snapshots/dreams) |
| bench/dashboard/src/pages/Brain.tsx | ~200 | | Brain visualization page |
| bench/migrations/002_brain_tables.sql | ~60 | | PostgreSQL schema |

Design Decisions

  1. Dual storage — Local SQLite for dev/quick tests (no infrastructure needed). PostgreSQL via hub for production benchmarking (centralized, multi-runner). Same data model, different backends.

  2. Both Prometheus + PostgreSQL — Prometheus for live monitoring during runs (real-time gauges, Grafana dashboards). PostgreSQL for post-hoc analysis (reports, comparisons, learning curves).

  3. Batch submission — Runners buffer brain events and submit every 10 problems (configurable). Reduces HTTP overhead while keeping dashboard reasonably up-to-date.

  4. Snapshot frequency — Default --snapshot-every 10 balances granularity vs. overhead. For short runs, use --snapshot-every 1.

  5. Brain tree sync via git — Brain knowledge (neuron trees) syncs through GitNeuronStore.sync() using git push/pull to a bare repo on the hub. Content-addressable SHA-256 tree IDs mean no merge conflicts. Telemetry (events/snapshots) flows through REST API. Trees flow through git. They're complementary.
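The conflict-free property in decision 5 follows from content addressing, which a few lines make concrete. The canonical-JSON-then-SHA-256 scheme here is an assumption about how GitNeuronStore derives IDs, shown purely for illustration:

```python
# Why content-addressable IDs avoid git merge conflicts: a tree's ID is
# a hash of its canonicalized content, so two runners that learn the
# same tree produce the same file at the same path, and different trees
# land at different paths -- push/pull only ever adds files.
import hashlib
import json

def tree_id(tree: dict) -> str:
    # Canonical form: sorted keys, no whitespace (assumed scheme).
    canonical = json.dumps(tree, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

a = tree_id({"root": "factor", "children": ["gcd", "lcm"]})
b = tree_id({"children": ["gcd", "lcm"], "root": "factor"})  # same content
c = tree_id({"root": "factor", "children": ["gcd"]})         # different tree

print(a == b, a == c)  # True False
```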