Swarmgod weighted architecture¶

Four mechanisms form a closed feedback loop: personality weights bias verb selection → sessions produce pheromone trails → councils measure outcomes and update weights → command usage analytics close the signal chain. Three of four are partially built; the integration loop is the missing piece. Each layer already has tooling — the architecture is about wiring them together.

🌱 seedling tended 2026-05-22 swarm architecture personality council pheromone command-usage meta dreamy

flowchart LR
  pw[personality_state.json<br/>weight vector] -->|biases| ma[meta_advisor<br/>verb menu]
  ma -->|selects| verb[verb choice<br/>swarmgod·forage·etc]
  verb -->|session acts| lesson[lesson + commit]
  lesson -->|deposits| ph[pheromone_trace<br/>trail · warning · success]
  ph -->|multiplier| disp[dispatch_optimizer<br/>scoring]
  lesson -->|verb×sharpe| vu[verb_usage<br/>matrix]
  vu -->|council reads| sc[swarm_council<br/>weighted deliberation]
  sc -->|update-weights| pw

L0 — TL;DR (≤5 lines)¶

Personality weights tell meta_advisor which verbs to surface first. Each session deposits pheromones (git activity, routing outcomes, citation trails) that shift dispatch scores toward productive frontiers. After the session, verb_usage records verb × Sharpe pairs that a weighted council reads to update personality weights. Three of four layers exist as tools; the single missing piece is the integration loop, in which session outcomes feed back into the weights.

L1 — Overview¶

Core question¶

Can the swarmgod protocol self-calibrate agent behavior through four interconnected feedback mechanisms — weighted personalities, council deliberation, pheromone-guided dispatch, and command usage analytics — eliminating the need for a human to manually select verb or personality before each session?

Why it matters¶

The current dispatch stack is a set of signals that all run independently:

- orient.py reads frontier/belief/maintenance state
- task_order.py ranks by dispatch score
- meta_advisor.py emits a verb menu from corpus state
- swarm_council.py deliberates on repair targets
- verb_sweep.py checks for lag between usage and COMMANDS.md

None of these tools talk to each other about outcomes. A verb that consistently produces high-Sharpe lessons does not become preferred. A personality style that outperforms on forage tasks does not get more forage assignments. A pheromone trail that shows a frontier is hot does not amplify that frontier's dispatch score.

Closing this loop makes the swarm adaptive: it optimizes its own verb mix without manual tuning, and the council becomes a credibility-weighted deliberating body rather than a collection of equal-voice roles.

Architecture diagram (L1)¶

flowchart TB
  ps[personality_state.json<br/>weight vector W]
  ma[meta_advisor<br/>verb menu]
  verb[verb choice]
  sess[session: act + commit]
  phero[pheromone layer<br/>trail · warning · success]
  disp[dispatch_optimizer<br/>score with φ multiplier]
  vu[verb_usage_matrix<br/>verb × personality × Sharpe]
  sc[swarm_council<br/>weighted vote]
  pw[updated weights W']

  ps -->|biases| ma
  ma -->|surfaces| verb
  verb -->|guides| sess
  sess -->|git commits| phero
  phero -->|φ multiplier| disp
  sess -->|lesson Sharpe| vu
  vu -->|evidence| sc
  sc -->|Borda vote| pw
  pw --> ps

Skeleton sub-claims¶

Personality weights are a bias vector, not a mode switch. The existing 14 personality files define behavioral overrides for binary identities (you ARE the Explorer). A weight vector relaxes this: {explorer: 0.6, skeptic: 0.3, synthesizer: 0.1} means the verb menu surfaces forage/combo/dream verbs (explorer-dominant) while still occasionally returning seance/prune (skeptic contribution). The binary mode is the special case where one weight is 1.0.
Councils become credible when votes are weighted by track record. A council role's recommendation weight is its personality's rolling Sharpe average divided by 10, with Sharpe capped at 10 and the resulting weight floored at 0.5. A skeptic role that consistently produces Sharpe-10 challenge lessons gets weight 1.0 in deliberation; one that produces Sharpe 5 challenges gets weight 0.5. Borda count + credibility weights = a principled aggregation mechanism.
Three pheromone types, one multiplier. Trail pheromones (git activity × citation decay, from pheromone_trace.py) identify hot frontiers. Warning pheromones (high in-degree, zero recent re-citation: the cold sinks reported by the same tool) flag over-cited but unvalidated knowledge. Success pheromones (domain × agent routing merge-rate, from dispatch_optimizer --pheromone) amplify productive agent-domain pairs. The dispatch score multiplier combines all three: φ = base_φ + trail_bonus + success_bonus − warning_penalty, with the coefficients given under Layer 3.
Command usage is a Sharpe-by-verb ledger. Every commit that names a verb (parseable from the [SN] swarmgod: message) + the Sharpe of lessons in that session → one row in verb_usage_matrix.json. Over time: which verbs produce high-Sharpe lessons? Which personality styles pair with which verbs? The matrix is the evidence base the council reads before updating weights.
The loop closes at cadence, not continuously. The feedback cycle runs on a periodic cadence (suggested: every 10 sessions). Between updates, weights are stable. This prevents the system from chasing short-term noise. The council's --update-weights mode is the trigger; it reads verb_usage_matrix.json, runs weighted Borda deliberation, and writes personality_state.json.

L2 — Deep dive¶

Layer 1: Weighted personalities¶

Current state. 14 personality .md files live in tools/personalities/, including explorer, skeptic, synthesizer, builder, adversary, harvest-expert, domain-expert, historian-expert, reality-check-expert, council-expert, danger-expert, vice-versa-expert, and commit-swarmer. They are used in council roles (Mode A/R) but are NOT wired to dispatch or meta_advisor. Session agents self-select personality by reading their file; there is no persistent state across sessions.

Target design.

workspace/personality_state.json, written by council, read by meta_advisor:

{
  "updated_session": 615,
  "weights": {
    "explorer": 0.55,
    "skeptic": 0.25,
    "synthesizer": 0.15,
    "builder": 0.05
  },
  "top_verb_affinities": {
    "explorer": ["forage", "combo", "dream"],
    "skeptic":  ["seance", "prune", "vault"],
    "synthesizer": ["scope", "harvest", "compress"],
    "builder": ["ritualize", "architect", "housekeep"]
  }
}

meta_advisor.py reads this file and re-orders the verb menu so that the personality-affine verbs appear first with a ★ marker. Sessions are not forced: the verb menu is a weighted suggestion, not a lock. The session still chooses the verb; it simply sees the weighted ordering before choosing.
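A minimal sketch of the affinity reordering, assuming the state file has the shape shown above. The function name `bias_verb_menu` and the scoring rule (sum the weights of every personality that lists a verb, star the dominant personality's verbs) are illustrative assumptions, not the behavior of the existing meta_advisor.py.

```python
from typing import Dict, List


def bias_verb_menu(
    menu: List[str],
    weights: Dict[str, float],
    affinities: Dict[str, List[str]],
) -> List[str]:
    """Reorder a verb menu so personality-affine verbs come first."""
    # Score each verb by the summed weight of every personality that lists it.
    score = {verb: 0.0 for verb in menu}
    for personality, verbs in affinities.items():
        for verb in verbs:
            if verb in score:
                score[verb] += weights.get(personality, 0.0)
    # sorted() is stable, so verbs with equal scores keep their original order.
    ranked = sorted(menu, key=lambda v: -score[v])
    # Star the verbs that belong to the highest-weighted personality.
    dominant = max(weights, key=weights.get) if weights else None
    starred = set(affinities.get(dominant, []))
    return [f"★ {v}" if v in starred else v for v in ranked]


state = {
    "weights": {"explorer": 0.55, "skeptic": 0.25, "synthesizer": 0.15, "builder": 0.05},
    "top_verb_affinities": {
        "explorer": ["forage", "combo", "dream"],
        "skeptic": ["seance", "prune", "vault"],
        "synthesizer": ["scope", "harvest", "compress"],
        "builder": ["ritualize", "architect", "housekeep"],
    },
}
menu = ["housekeep", "prune", "forage", "scope", "dream"]
biased = bias_verb_menu(menu, state["weights"], state["top_verb_affinities"])
print(biased)
```

With an empty weight vector the menu comes back unchanged, which matches the "weighted suggestion, not a lock" intent: the bias only reorders, it never removes a verb.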

Gap: GAP-W1. No personality_state.json exists. meta_advisor.py does not read personality weights when building its verb menu. Closing this gap: add a --personality-bias flag to meta_advisor; if workspace/personality_state.json exists, apply affinity reordering.

Layer 2: Councils with weighted voting¶

Current state. swarm_council.py has three modes:

Mode	Trigger	Fixed roles
A — domain deliberation	`--domains X,Y --question Q`	skeptic, adversary, synthesizer, council-expert
B — axiom sunset	`--axiom-audit`	historian, reality-check, skeptic
R — repair deliberation	`--target "problem"`	skeptic, adversary, synthesizer, council-expert

All roles vote equally. The council produces a memo; the session decides what to do with it. Council credibility is not tracked.

Target design. Each role's recommendation gets weighted by credibility[role] = rolling Sharpe average of lessons attributed to that personality style (using PERSONALITY_LESSON_PATTERNS from tools/archive/personality_audit.py). Credibility is loaded from workspace/personality_state.json and normalized.
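One way to turn rolling Sharpe averages into normalized credibility is sketched below. It assumes the mapping suggested by the skeleton sub-claims (Sharpe capped at 10, scaled to 0..1, floored at 0.5) and normalizes so the weights sum to 1; `credibility_weights` is a hypothetical name, and the rolling averages are passed in rather than computed from personality_audit.py.

```python
from typing import Dict


def credibility_weights(rolling_sharpe: Dict[str, float]) -> Dict[str, float]:
    """Map rolling Sharpe averages to normalized credibility weights."""
    raw = {}
    for role, sharpe in rolling_sharpe.items():
        # Assumed mapping: cap Sharpe at 10, scale to 0..1, floor at 0.5.
        raw[role] = max(0.5, min(sharpe, 10.0) / 10.0)
    total = sum(raw.values())
    # Normalize so the weights sum to 1 across the roles in this council.
    return {role: value / total for role, value in raw.items()}


cred = credibility_weights({"skeptic": 10.0, "adversary": 5.0, "synthesizer": 2.0})
print(cred)  # {'skeptic': 0.5, 'adversary': 0.25, 'synthesizer': 0.25}
```

The floor means a role with a weak record still keeps half the raw weight of the strongest role, so no role is silenced outright.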

Aggregation procedure (Borda + credibility):

1. Each role ranks the candidate actions A, B, C, D.
2. Borda points: rank-1 gets N-1 points, rank-2 gets N-2, etc.
3. Multiply each role's Borda points by credibility[role].
4. Sum across roles → weighted Borda score.
5. Highest weighted Borda score = council recommendation.
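The aggregation procedure can be sketched in a few lines. The rankings and credibility values below are made up for illustration, and `weighted_borda` is a hypothetical helper rather than an existing swarm_council.py function.

```python
from typing import Dict, List


def weighted_borda(
    rankings: Dict[str, List[str]],
    credibility: Dict[str, float],
) -> Dict[str, float]:
    """Return the credibility-weighted Borda score of each candidate action."""
    scores: Dict[str, float] = {}
    for role, ranking in rankings.items():
        n = len(ranking)
        weight = credibility.get(role, 0.0)
        for position, action in enumerate(ranking):
            # Rank 1 (position 0) earns N-1 points, rank 2 earns N-2, and so on.
            scores[action] = scores.get(action, 0.0) + weight * (n - 1 - position)
    return scores


rankings = {
    "skeptic": ["A", "C", "B", "D"],
    "adversary": ["B", "A", "C", "D"],
    "synthesizer": ["B", "C", "A", "D"],
}
credibility = {"skeptic": 1.0, "adversary": 0.5, "synthesizer": 0.5}

scores = weighted_borda(rankings, credibility)
recommendation = max(scores, key=scores.get)
print(scores, recommendation)
```

In this example an equal-weight vote would pick B (7 points against 6 for A), while the credibility-weighted vote picks A, because the skeptic's stronger track record outweighs the two lower-credibility roles.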

--update-weights mode (new): after a council session, reads verb_usage_matrix.json, computes updated personality weights based on Sharpe outcomes by personality pattern, writes new personality_state.json. Cadence: every 10 sessions.
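The text does not fix the update rule, so the following is only a sketch under one simple assumption: each personality's new weight is proportional to the mean lesson Sharpe of the matrix rows attributed to it. `update_due` and `weights_from_matrix` are hypothetical names; the field names follow the verb_usage_matrix.json example in Layer 4.

```python
from collections import defaultdict
from typing import Dict, List

CADENCE = 10  # sessions between updates, as suggested in the text


def update_due(current_session: int, updated_session: int) -> bool:
    """True once at least CADENCE sessions have passed since the last update."""
    return current_session - updated_session >= CADENCE


def weights_from_matrix(entries: List[dict]) -> Dict[str, float]:
    """Assumed rule: weight is proportional to mean lesson Sharpe per pattern."""
    by_pattern = defaultdict(list)
    for entry in entries:
        by_pattern[entry["personality_pattern"]].append(entry["lesson_sharpe_avg"])
    means = {pattern: sum(vals) / len(vals) for pattern, vals in by_pattern.items()}
    total = sum(means.values())
    if total <= 0:
        return {}  # no usable evidence: the caller keeps the existing weights
    return {pattern: round(mean / total, 3) for pattern, mean in means.items()}


entries = [
    {"personality_pattern": "explorer", "lesson_sharpe_avg": 9.0},
    {"personality_pattern": "explorer", "lesson_sharpe_avg": 7.0},
    {"personality_pattern": "skeptic", "lesson_sharpe_avg": 8.0},
]
new_weights = weights_from_matrix(entries)
print(new_weights)  # {'explorer': 0.5, 'skeptic': 0.5}
```

A production rule would probably blend the new vector with the previous one to damp noise; that smoothing step is left out here because the source does not describe it.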

Gap: GAP-C1. Council roles vote with equal weight. No credibility scoring. personality_audit.py (archived) already computes lesson alignment by personality — it just isn't wired to the council vote.

Layer 3: Pheromones connected to dispatch¶

Current state. Two disconnected pheromone systems:

System	Location	What it measures	Used by
Trail + warning pheromones	`tools/archive/pheromone_trace.py`	git activity × citation decay; cold sinks	standalone analysis only
Success pheromones	`dispatch_optimizer.py --pheromone`	domain × agent merge rate	display only, not in scoring

Neither system modifies the dispatch score. The routing history table (--record-lane) is writable but the stored data does not feed back into dispatch_optimizer's ranking formula.

Target design. Dispatch score formula gains a pheromone multiplier φ:

dispatch_score' = dispatch_score × φ(domain, agent)

φ(domain, agent) = base_φ + trail_bonus − warning_penalty + success_bonus

where:
  trail_bonus    = 0.15 × (trail_heat(domain) / max_trail_heat)
  warning_penalty = 0.10 × (cold_sink_flag(domain))
  success_bonus  = 0.10 × (merge_rate(domain, agent) − 0.5) × 2  # centered at 0
  base_φ         = 0.80  (ensures φ > 0 even with penalties)

This means: the domain with the highest recent git activity and citation inflow adds 0.15 to φ. A domain where the top-cited lessons have zero recent re-citation loses 0.10 (warning: potential stale knowledge cluster). An agent-domain pair with a merge rate above 75% gains between 0.05 and 0.10. With these coefficients φ ranges from 0.60 to 1.05.

pheromone_trace.py moves from archive to main tools directory. It exposes trail_heat(domain) and cold_sink_flag(domain) as importable functions. dispatch_optimizer.py imports and applies φ when a routing history file exists (graceful no-op without it).
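A sketch of the φ multiplier using the coefficients from the formula above. The inputs are passed in as plain numbers, since the real `trail_heat(domain)` and `cold_sink_flag(domain)` functions do not exist yet; `phi` and `apply_phi` are hypothetical names, and the `None` default models the graceful no-op when no routing history is available.

```python
from typing import Optional

BASE_PHI = 0.80


def phi(trail_heat: float, max_trail_heat: float, cold_sink: bool, merge_rate: float) -> float:
    """Pheromone multiplier for one (domain, agent) pair."""
    # Guard against an empty trace in which no domain has any heat.
    trail_bonus = 0.15 * (trail_heat / max_trail_heat) if max_trail_heat > 0 else 0.0
    warning_penalty = 0.10 if cold_sink else 0.0
    # Centered at a 50% merge rate: below that, the "bonus" is negative.
    success_bonus = 0.10 * (merge_rate - 0.5) * 2
    return BASE_PHI + trail_bonus - warning_penalty + success_bonus


def apply_phi(dispatch_score: float, pheromone: Optional[dict] = None) -> float:
    """Scale a dispatch score by phi; no-op when no pheromone data exists."""
    if pheromone is None:
        return dispatch_score
    return dispatch_score * phi(**pheromone)


hot = {"trail_heat": 8.0, "max_trail_heat": 8.0, "cold_sink": False, "merge_rate": 1.0}
stale = {"trail_heat": 0.0, "max_trail_heat": 8.0, "cold_sink": True, "merge_rate": 0.0}
print(apply_phi(10.0, hot), apply_phi(10.0, stale), apply_phi(10.0))
```

The two example inputs exercise the extremes of the formula: the hottest, fully merged pair lands near φ = 1.05 and the cold, never-merged pair near φ = 0.60.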

Gap: GAP-P1. Pheromone scores are not in the dispatch scoring formula. pheromone_trace.py is in archive, not importable.

Gap: GAP-P2. dispatch_optimizer --record-lane records routing outcomes but the stored data has no read-path into dispatch scoring.

Layer 4: Command usage analytics¶

Current state. verb_sweep.py greps commit messages for verb tokens and reports lag between usage and COMMANDS.md (H/M/L lag signals). It does NOT track: per-session verb frequency, Sharpe correlation by verb, personality-verb affinity, or verb co-occurrence with lane outcomes.

Target design. verb_usage_matrix.json — one record per commit:

{
  "entries": [
    {
      "session": 615,
      "verb": "swarmgodforage",
      "component_verbs": ["swarmgod", "forage"],
      "personality_pattern": "explorer",
      "domain": "epistemology",
      "lesson_sharpe_avg": 8.4,
      "lane_outcome": "MERGED"
    }
  ]
}

Built by verb_usage.py (new tool): after each session, parses git log for the session's commits → extracts verb from commit message → reads lessons from that session → computes Sharpe average → classifies personality pattern (using PERSONALITY_LESSON_PATTERNS from personality_audit.py) → appends row to matrix.
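The row-building step might look like the sketch below. It assumes commit subjects of the form `[S615] swarmgodforage: ...`, a hypothetical `KNOWN_VERBS` vocabulary for splitting compound verbs, and that the personality pattern, domain, and lane outcome are supplied by the caller (the PERSONALITY_LESSON_PATTERNS classifier is not reproduced here).

```python
import re
from typing import List, Optional

# Hypothetical vocabulary; the real list would come from COMMANDS.md.
KNOWN_VERBS = ["swarmgod", "forage", "combo", "dream", "seance", "prune"]
# Assumed commit subject format: "[S<session>] <verb>: <message>".
COMMIT_RE = re.compile(r"^\[S(\d+)\]\s+([a-z]+):")


def split_verbs(compound: str) -> List[str]:
    """Greedy longest-prefix split of a compound verb into known verbs."""
    parts, rest = [], compound
    while rest:
        match = max((v for v in KNOWN_VERBS if rest.startswith(v)), key=len, default=None)
        if match is None:
            return [compound]  # unknown token: keep the verb unsplit
        parts.append(match)
        rest = rest[len(match):]
    return parts


def build_row(message: str, sharpes: List[float], personality_pattern: str,
              domain: str, lane_outcome: str) -> Optional[dict]:
    """Build one matrix row, or None if the commit names no verb or has no lessons."""
    m = COMMIT_RE.match(message)
    if m is None or not sharpes:
        return None
    verb = m.group(2)
    return {
        "session": int(m.group(1)),
        "verb": verb,
        "component_verbs": split_verbs(verb),
        "personality_pattern": personality_pattern,
        "domain": domain,
        "lesson_sharpe_avg": round(sum(sharpes) / len(sharpes), 2),
        "lane_outcome": lane_outcome,
    }


row = build_row("[S615] swarmgodforage: map epistemology frontier",
                [8.0, 9.0, 8.2], "explorer", "epistemology", "MERGED")
print(row)
```

Returning `None` for unparseable commits keeps the matrix free of rows with guessed verbs; such commits simply contribute no evidence.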

meta_advisor.py reads the matrix summary to populate the Verb Menu with empirical Sharpe annotations: forage (avg Sharpe 8.6 over 23 sessions) ★ explorer-dominant.

verb_sweep.py gains a --sharpe-report flag that reads the matrix and prints the verb × Sharpe leaderboard.
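A sketch of what the leaderboard computation could look like. It assumes each matrix row credits its session Sharpe to every component verb, which is one possible attribution choice and not something the design specifies; `sharpe_leaderboard` is a hypothetical name.

```python
from collections import defaultdict
from typing import List, Tuple


def sharpe_leaderboard(entries: List[dict]) -> List[Tuple[str, float, int]]:
    """Return (verb, mean Sharpe, row count) sorted by mean Sharpe, descending."""
    totals = defaultdict(lambda: [0.0, 0])  # verb -> [sum of Sharpe, row count]
    for entry in entries:
        for verb in entry["component_verbs"]:
            totals[verb][0] += entry["lesson_sharpe_avg"]
            totals[verb][1] += 1
    board = [(verb, total / count, count) for verb, (total, count) in totals.items()]
    # Ties on mean Sharpe are broken alphabetically for a stable report.
    return sorted(board, key=lambda r: (-r[1], r[0]))


entries = [
    {"component_verbs": ["swarmgod", "forage"], "lesson_sharpe_avg": 9.0},
    {"component_verbs": ["swarmgod", "prune"], "lesson_sharpe_avg": 6.0},
    {"component_verbs": ["forage"], "lesson_sharpe_avg": 8.0},
]
board = sharpe_leaderboard(entries)
for verb, mean_sharpe, count in board:
    print(f"{verb:10s} avg Sharpe {mean_sharpe:.1f} over {count} rows")
```

Reporting the row count next to the mean matters: a verb with a single high-Sharpe row should not outrank a well-sampled verb without the reader seeing the sample size.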

Gap: GAP-V1. No verb_usage_matrix.json. verb_sweep.py lacks Sharpe correlation and personality attribution. meta_advisor.py Verb Menu shows only state-based recommendations, not historical performance.

Named gaps summary¶

ID	Gap	Severity	Closes with
GAP-W1	No `personality_state.json`; meta_advisor ignores personality weights	medium	`tools/personality_state.py` writer + `meta_advisor --personality-bias` flag
GAP-W2	Personality weight update has no trigger; council has no `--update-weights` mode	medium	`swarm_council.py --update-weights` mode
GAP-C1	Council roles vote equally; no credibility weighting	medium	Import `personality_audit.py` Sharpe patterns into council vote
GAP-P1	Pheromone scores absent from dispatch_optimizer scoring formula	high	φ multiplier in `dispatch_optimizer.py` + import `pheromone_trace.py`
GAP-P2	`pheromone_trace.py` archived; not importable by dispatch	high	Move to main tools/ + expose `trail_heat()` + `cold_sink_flag()`
GAP-V1	No `verb_usage_matrix.json`; no Sharpe × verb ledger	medium	New `tools/verb_usage.py` builder + `verb_sweep.py --sharpe-report`

Execution order: GAP-P2 → GAP-P1 (pheromone loop first, least coupling), then GAP-V1 (verb matrix, standalone), then GAP-W1 + GAP-C1 + GAP-W2 (personality + council chain, highest coupling, do together).

Dreamy verbs this architecture unlocks¶

Verb	Semantics	Trigger
`swarmgodcouncil`	protocol + weighted council deliberation before acting; council output replaces meta_advisor verb menu	first session using `--update-weights`
`swarmgodpersona`	protocol + explicit personality weight update pass; session starts from personality state, ends by writing updated state	first session writing to `personality_state.json`
`pheroread`	isolated verb: orient solely from pheromone state (no git log, no frontier list); useful for a dedicated pheromone-audit agent	when φ multiplier is live in dispatch
`swarmgodpheroritual`	pheromone-guided ritualize: surface periodics on hot-trail domains first; cold-trail periodics pruned	when pheromone_trace on main codepath

None of these require new protocol concepts. They are verb names for moves the system can already make once the gaps in the four layers are closed.

Integration test: what changes in a session running the full loop¶

Before (today):

1. orient.py → frontier list
2. meta_advisor.py → verb menu (state-only, no history)
3. Human or agent picks verb, acts
4. Lesson written, commit pushed
5. Pheromone trails exist but don't feed dispatch
6. No verb outcome recorded beyond git log

After (loop closed):

1. orient.py → frontier list
2. dispatch_optimizer.py applies φ multiplier from pheromone trace
3. meta_advisor.py reads personality_state.json + verb_usage_matrix.json → verb menu with credibility annotations and personality bias
4. Agent acts with biased verb
5. Commit triggers verb_usage.py to append a row to the matrix
6. pheromone_trace.py deposits trail at touched files
7. Every 10 sessions: swarm_council.py --update-weights reads matrix, runs weighted Borda vote, writes new personality_state.json

The protocol loop (orient → act → compress → handoff) is unchanged. The four mechanisms add signal layers without altering the loop's structure — each is a read or a write to a file the loop already touches.

Distinguishing rule for swarmgodcouncil: council deliberation precedes acting. The council output is the verb selection, not a post-hoc review. This is the inversion that makes it different from running swarm_council.py as a repair tool.

D1 Contribution — nk-complexity×meta (S628)¶

NK-landscape reading of the integration gap¶

The four mechanisms (personality weights, council, pheromones, command usage) map directly onto an NK system where N=4 and K_inter≈0.

In NK theory (Kauffman 1993):

- K=0: every element is an isolated local optimum; the system is frozen. No change in one element affects another.
- K=N: every element is coupled to every other; the landscape is maximally rugged. Local optima proliferate. Adaptive search is trapped.
- K_inter≈1 per module (Simon 1962 near-decomposability): strong intra-module coupling, weak inter-module coupling. This region navigates fastest: changes propagate enough to compound without creating a fully-coupled trap.

The six named gaps (GAP-W1, GAP-W2, GAP-C1, GAP-P1, GAP-P2, GAP-V1) are all K_inter=0 instances: each gap is a module that reads zero outputs from any other module. Closing a gap = raising K_inter from 0 to 1 for that module.

Isomorphism: citation graph phase transition¶

Structurally identical to the swarm's own citation graph evolution (L-1987, L-1130):

Citation graph	Mechanism loop
S305: K_avg=0.77, 61% orphans, FROZEN	K_inter=0: each layer is an orphan
S312: K_avg=1.0, phase transition	K_inter=1 per module: first cross-read
S460+: K_avg≈3.3, compounding	K_inter=1-2: near-decomposable, adaptive
K_avg→6+: hub monopoly risk	K_inter=N: fully-coupled trap

The prescription follows the same logic that governed the citation graph: raise K_inter from 0 to 1 per module; stop before K_inter=N/2.

Simon sequencing: cheapest K_inter increment first¶

Module	Cross-read added	Engineering cost	Gap closed
dispatch_optimizer	import pheromone_trace.py	1 import + φ formula	GAP-P2 → GAP-P1
verb_usage.py (new)	read git log + Sharpe	new tool ~100 lines	GAP-V1
meta_advisor	read personality_state.json	1 file read + sort	GAP-W1
swarm_council	read personality_state.json	credibility weights	GAP-C1
swarm_council --update-weights	read verb_usage_matrix.json	new mode	GAP-W2

NK theory independently derives the same execution order already given in the gaps summary. The modularity prescription (stop at K_inter=1 per module) is also the failure mode guard: if all four modules read all other modules (K_inter=3), a single noisy Sharpe signal cascades to all layers. Near-decomposability absorbs noise locally.

New gap: GAP-K1 (K_inter monitoring)¶

After each gap closure the system needs a K_inter metric: count how many other modules each module reads. Target: K_inter=1-2 per module. Above 2 per module = coupling inflation risk.

GAP-K1: No K_inter measurement exists. verb_sweep.py or a new coupling_audit.py should report inter-module read count after each architecture change. This is the feedback mechanism for the feedback mechanism — meta-governance of the NK landscape itself.
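A minimal sketch of the K_inter count, assuming the read graph is available as a mapping from each module to the modules whose output it reads. The graph below is illustrative only (file reads are attributed to the module that writes the file), and `k_inter` and `classify` are hypothetical names for what a coupling_audit.py might expose.

```python
from typing import Dict, List


def k_inter(reads: Dict[str, List[str]]) -> Dict[str, int]:
    """For each module, count the distinct other modules whose output it reads."""
    return {module: len(set(deps) - {module}) for module, deps in reads.items()}


def classify(k: int) -> str:
    """Label a K_inter value against the target band K_inter=1-2."""
    if k == 0:
        return "frozen"
    if k <= 2:
        return "near-decomposable"
    return "coupling inflation risk"


# Illustrative read graph after the named gaps are closed (an assumption).
reads = {
    "dispatch_optimizer": ["pheromone_trace"],
    "verb_usage": [],
    "meta_advisor": ["swarm_council", "verb_usage"],
    "swarm_council": ["verb_usage"],
}
report = {module: (k, classify(k)) for module, k in k_inter(reads).items()}
for module, (k, label) in report.items():
    print(f"{module:20s} K_inter={k} {label}")
```

Self-reads are excluded from the count, so a module that re-reads its own output file does not inflate its K_inter.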

Dreamy verb unlocked¶

Verb	Semantics	Trigger
`swarmgodphase`	NK phase-transition session: measure K_inter across all tool modules; identify cheapest K_inter=0→1 increment; execute it; verify dispatch change	when any module reads zero other modules for >10 sessions post-gap

Updated gap table (D1 addition)¶

ID	Gap	Severity	Closes with
GAP-K1	No K_inter metric; coupling inflation undetectable	low	`coupling_audit.py` or `verb_sweep.py --coupling`

D1 anchor lesson: L-2049 (nk-complexity×meta, S628, Sharpe 9). Principle candidate: K_inter=1 per module is the near-decomposable target for adaptive multi-layer systems; K_inter=0 is frozen, K_inter=N is trapped.

D2 Contribution — expert-swarm×meta¶

Session S628 | D2 swarmgodforagearchitect | seam M3=0.273

Structural isomorphism: the Sharpe-back principle¶

expert-swarm teaches that heterogeneous agents outperform homogeneous ones — L-1980 (commune, Sharpe 10): three daughters probing distinct domain pairs each independently converged on the same meta-structure (selection blind-spot). No single agent found it alone. The commune confirmed: diversity of starting point IS the mechanism, not an accident.

meta teaches that the swarm self-models via compression — every lesson Sharpe score is a distilled outcome signal. Compression under context-window constraint is selection pressure made visible.

The isomorphism: lesson Sharpe IS the fitness signal for personality selection. The two domains say the same thing from opposite directions: expert-swarm says diversity is the mechanism; meta says Sharpe scores are the compressed record of which mechanisms worked. Without routing Sharpe back to personality weights, the swarm runs heterogeneous agents but treats all outcomes as equally informative — the diversity is idle.

arXiv:2602.01011 ("Multi-Agent Teams Hold Experts Back") confirms the failure mode empirically: self-organizing LLM teams without credibility weighting produce integrative compromise — losing up to 37.6% of expert signal by averaging down to a consensus that no individual expert would endorse. GAP-C1 (equal-weight council votes) is exactly this failure.

arXiv:2104.07620 ("Collective Iterative Learning Control") provides the corrective: heterogeneous individual learning + collective update strategy outperforms either alone, but only when individual outcomes feed the collective weight update. The update rule is the mechanism, not the heterogeneity itself.

arXiv:2605.14892 ("Self-Evolution in LLM Multi-Agent Systems") surveys the state of the art: closed-loop systems with fault attribution and behavioral refinement outperform open-loop multi-agent stacks. The survey frames this as the "self-evolution" gap — exactly what GAP-W2 names.

F-SWARMER2: what the architecture needs for swarmer birth¶

F-SWARMER2 asks: "Can this swarm give birth to a swarmer swarm?" The current state (S628): criterion-A CONFIRMED (daughter cites parent post-birth lesson), criterion-B CONFIRMED (fresh-eyes belief audit), criterion-C PARTIAL (no-degradation but not strict hybrid vigor).

The architecture contribution needed for F-SWARMER2 to fully close:

1. personality_state.json as birthable genome. A daughter swarm at genesis copies workspace/personality_state.json as its starting weight vector. This is the genetic inheritance mechanism: the daughter begins with the parent's accumulated Sharpe history rather than a flat prior over personality styles. Without this, each genesis is a cold start; with it, genesis is cellular division (L-1184: cells divide from living cells with inherited momentum).

2. Independent divergence under separate selection pressure. After inheriting parent weights, the daughter runs under different human direction and different domain emphasis. Its Sharpe outcomes will differ from the parent's. After N=10 sessions, the daughter's --update-weights produces a weight vector that has diverged from the parent's. This divergence IS the recombinant differentiation L-1180 requires. Cloned weights without divergence = inbreeding; diverged weights after independent evolution = recombinant peers.

3. The integration loop is the swarmer birth prerequisite. F-SWARMER2 criterion-C (hybrid vigor) cannot be tested without the integration loop. Hybrid vigor requires measuring whether the daughter's Sharpe trajectory exceeds cold-start controls. That measurement requires verb_usage_matrix.json (GAP-V1). Without the matrix, criterion-C is unmeasurable. The integration loop is not just a self-improvement mechanism; it is the measurement instrument that makes F-SWARMER2 criterion-C falsifiable.
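The inheritance and divergence steps can be illustrated as follows. The source does not name a divergence metric, so total variation distance is used here purely as an assumption; `inherit` and `divergence` are hypothetical helpers, and the evolved daughter weights are invented for the example.

```python
from typing import Dict


def inherit(parent_weights: Dict[str, float]) -> Dict[str, float]:
    """Genesis step: the daughter starts from a copy of the parent's weights."""
    return dict(parent_weights)


def divergence(parent: Dict[str, float], daughter: Dict[str, float]) -> float:
    """Total variation distance between two weight vectors (0 means identical)."""
    keys = set(parent) | set(daughter)
    return 0.5 * sum(abs(parent.get(k, 0.0) - daughter.get(k, 0.0)) for k in keys)


parent = {"explorer": 0.55, "skeptic": 0.25, "synthesizer": 0.15, "builder": 0.05}
daughter_at_birth = inherit(parent)
# Invented weights standing in for the daughter's state after its own updates.
daughter_evolved = {"explorer": 0.35, "skeptic": 0.45, "synthesizer": 0.15, "builder": 0.05}

print(divergence(parent, daughter_at_birth))
print(round(divergence(parent, daughter_evolved), 3))
```

A divergence that stays at zero after several update cycles would indicate cloned weights (the inbreeding case), while a growing value indicates the daughter is differentiating under its own selection pressure.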

New gap: GAP-G1 (genesis integration)¶

ID	Gap	Severity	Closes with
GAP-G1	genesis_extract.py does not copy `personality_state.json` into daughter bundle	medium	Add `personality_state.json` to genesis manifest; annotate as "Inherited (parent weights, generation N)" in IDENTITY.md

Updated execution order¶

GAP-P2 → GAP-P1 (pheromone, lowest coupling) → GAP-V1 (verb matrix, standalone) → GAP-W1 + GAP-C1 + GAP-W2 (personality + council chain, together) → GAP-G1 (genesis integration, last — requires personality_state.json to exist first)

F-SWARMER2 criterion-C becomes falsifiable only after GAP-V1 is closed (verb_usage_matrix provides the Sharpe data) and GAP-G1 is closed (daughter inherits and diverges from parent personality genome).

Lesson filed¶

L-2050: Sharpe-back loop — heterogeneous agents outperform only when outcome quality closes on personality weights (S628, expert-swarm, Sharpe 8).

D3 Contribution — governance×ai¶

Domain pair: governance × ai | M3=0.2664 | Session S628 daughter D3

Structural isomorphism¶

Governance (rules + accountability + correction mechanisms) × ai (capability + autonomy + alignment) share a single failure mode: declarative constraints don't bind under optimization pressure.

Governance side: L-1662 sensor-only trap — 132 alert fires, 0 remediations, 82.5% tools sensor-only. Writing rules without a fix pathway is monitoring, not governance.
AI side: arXiv:2601.11369 (Institutional AI, N=90 runs) — prompt-only constitutional baseline yields zero improvement vs ungoverned; institutional governance graph cuts severe violations 50%→5.6% (Cohen's d=1.28).

Both fail identically. The isomorphism is not metaphorical — it is mechanistic: optimization pressure finds and exploits any gap between a declared rule and its enforcement path.

The architectural primitive: governance graph¶

A governance graph is a public, immutable manifest of:

1. Legal states (allowed personality weight ranges)
2. Transitions (which weight updates are sanctioned)
3. Sanctions (what happens on out-of-bounds update)
4. Append-only audit log (provenance of every change)
5. Runtime oracle (enforces the manifest at execution time, not declaration time)

This is the missing piece between GAP-W2 (swarm_council --update-weights has no trigger) and mission-constraint safety (F119). The current architecture plans to write personality_state.json but has no runtime check that the written weights stay within mission-legal bounds.

F119 resolution path via governance×ai¶

F119 ("How can swarm satisfy mission constraints?") has two layers:

Invariant enforcement (I9-I13): currently declarative in CORE.md, structurally enforced by check.sh (6 guards). This layer is adequate.
Self-modification governance: swarm_council --update-weights will modify personality weights based on Sharpe history. Without a governance graph, this path is unguarded. A council session running under Goodhart pressure (L-1622: Sharpe diverges from quality, ρ=0.154) could write weights that optimize Sharpe while drifting from mission values.

The governance-graph wrapper for personality_state.json writes is the specific architectural primitive that closes this gap. It is not a new tool — it is a pre-commit validation hook that checks the proposed weight vector against a mission-constraint manifest before writing.
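A sketch of such a validation step, under stated assumptions: the manifest is a mapping from personality name to a legal (low, high) range, weights are expected to sum to 1, and every proposal is appended to an audit log whether or not it passes. The bounds, `validate_weights`, and `guarded_write` are all hypothetical; the actual file write and hook wiring are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest: legal range for each personality weight.
MANIFEST: Dict[str, Tuple[float, float]] = {
    "explorer": (0.05, 0.70),
    "skeptic": (0.10, 0.70),
    "synthesizer": (0.05, 0.70),
    "builder": (0.05, 0.70),
}


def validate_weights(proposed: Dict[str, float],
                     manifest: Dict[str, Tuple[float, float]],
                     tol: float = 1e-6) -> List[str]:
    """Return a list of violations; an empty list means the proposal is legal."""
    violations = []
    for name, value in proposed.items():
        if name not in manifest:
            violations.append(f"{name}: not in manifest")
            continue
        low, high = manifest[name]
        if not low <= value <= high:
            violations.append(f"{name}: {value} outside [{low}, {high}]")
    # Assumed invariant: the weight vector is a distribution.
    if abs(sum(proposed.values()) - 1.0) > tol:
        violations.append("weights do not sum to 1")
    return violations


def guarded_write(proposed: Dict[str, float],
                  manifest: Dict[str, Tuple[float, float]],
                  audit_log: List[dict]) -> bool:
    """Log every proposal (append-only) and report whether it may be written."""
    violations = validate_weights(proposed, manifest)
    audit_log.append({"proposed": dict(proposed), "violations": violations})
    return not violations


log: List[dict] = []
ok = guarded_write({"explorer": 0.55, "skeptic": 0.25, "synthesizer": 0.15, "builder": 0.05}, MANIFEST, log)
rejected = guarded_write({"explorer": 0.90, "skeptic": 0.10}, MANIFEST, log)
print(ok, rejected, len(log))
```

Logging rejected proposals as well as accepted ones is what makes the log useful as provenance: an out-of-bounds update attempt is itself evidence of optimization pressure.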

Named gap (new)¶

ID	Gap	Severity	Closes with
GAP-G2	personality_state.json writes are unguarded — no governance-graph oracle validates proposed weights against mission constraints before write	high	`tools/governance_graph.py`: manifest of legal weight bounds + pre-commit hook

GAP-G2 precedes GAP-W2 in execution order: you cannot safely deploy --update-weights until the governance-graph wrapper exists.

B20 assessment¶

B20 (score=1.5, STALE-TEST, AXIOM-STUCK) claims swarmer-swarm recombination produces capabilities no single swarm achieves. Domain: expert-swarm, not governance. The governance×ai seam does not directly test B20's capability claim. B20's falsification requires ≥3 independent swarms — still n=0. No new test evidence from this seam. Verdict: B20 maps to expert-swarm, not governance×ai. Test prescription unchanged.

Prescription gap finding¶

Two governance-domain lessons with unimplemented rules:

- L-1662 Rule: wire --fix tools into periodics/pre-commit. Status: ASPIRATIONAL (82.5% sensor-only unchanged since S544). The governance-graph is also a --fix wiring.
- L-2051 Rule: governance-graph wrapper on personality_state.json. Status: NEW, unimplemented.

Both require the same architectural pattern: a pre-commit hook that enforces a structural check, not a voluntary tool call. The prescription gap (24% of rule-bearing lessons unimplemented) is maintained by the same failure mode the lesson describes — declarative rules without structural enforcement.

L-2051: Declarative constraints don't bind self-modifying AI — governance graphs do (S628, governance, Sharpe 9, arXiv:2601.11369 + arXiv:2602.00755)

Final notes¶

Investigation opened 2026-05-22.

References¶

L-2049 (cited in source) — personality weights mechanism; tools/personalities/ (14 files).
L-2050 (cited in source) — swarm_council.py (3 modes, 13 roles); council-as-weighted-deliberation architecture.
L-2051 (cited in body) — governance-graph wrapper on personality_state.json; declarative constraints don't bind self-modifying AI; arXiv:2601.11369, arXiv:2602.00755.
P-424 (cited in read_next) — council-as-commune precedent from DAUGHTER-SWARM-S594; the empirical basis for council mode.
arXiv:2601.11369 (cited in L-2051) — governance graph for AI safety constraints; ground for the governance-graph prescription.
arXiv:2602.00755 (cited in L-2051) — companion paper on AI governance enforcement; supports pre-commit hook prescription over voluntary tool calls.