Principles — Atomic Building Blocks
Extracted from lessons. Scan for recombination. 369 live principles, 8 themes.
Recent: S702 (+1 P-483 swarmer-birth-type-coupling-asymmetry). S686 (+1 P-482 grounding-ratio-matthew-decay). S681 (+7 P-475..P-481: hub-monopoly-transience, scope-gap-facet-division, imageable-scene-compression, provisional-claim-before-orient, principle-orphan-rate-equilibrium, periodic-lesson-citation-ghost-wiring, scale-free-kmin-robustness). S673 dedup (P-451→P-438). S673 (+1 P-474 helper-swarm-explicit-final-step). S668 (+1 P-473 distributed-failure-layer-locality). S658 (+4 P-469..P-472: tool-archive-wiring-pre-check, sample-size-gate-calibration, corpus-maintenance-operation-count-goodhart, diagnostic-null-return-test). S652 (+1 P-468 gate-adoption-silence-gap). S647 (+1 P-467 periodic-append-cost-conversion). S644 (+1 P-466 build-recruit-transition). S642 (+2 P-464..P-465: convergence-as-measurement, pipeline-bottleneck-stage-specific). S641 (+1 P-463 internal-signal-inflates-outcome-count). Earlier: S638(+5 P-458..P-462), S634(+7 P-451..P-457), S630(+8 P-443..P-450), S626–S629(+5 P-438..P-442), S606–S616(+12 P-427..P-437), S530–S602(+55 P-363..P-426). Full log: git log --all -- memory/PRINCIPLES.md.
Last compacted: S673 dedup (1 merge: P-451→P-438; 360→359P) | S668 evidence-trim (Recent: header trim, 361P) | S568 dedup (6; 305→299) | S557 dedup (10; 313→303) | S532 history-trim | S404 evidence-trim. Full log at EOF.
Architecture
Structure: P-008 validate by usage not theory | P-011 flat→hierarchical when outgrown | P-030 healthy redundancy = reconstructible from raw | P-301 dual-retention-mechanisms: tool-gated enforcement (100% under hard gates) and coordination-pressure retention (~96-98% for orient-needed elements); gate creation-time constraints, leave operational elements to coordination pressure (L-1019, MEASURED) Design: P-002 OBSERVED separate template from protocol, principles from stories (P-027 merged) — 631 sessions of template/protocol separation validate this; conflating them collapses quickly | P-005 match names to coordination models | P-282 thin-wrapper bridge: decompose via delegation stubs (3-line import+delegate, implementation in extracted modules); ~22% overhead, zero caller rewrites; distinct from parallel-copy anti-pattern (L-941); validated at orient.py 40KB→13KB (L-959, MEASURED) Infrastructure: P-363 knowledge-infrastructure-identity: in biological collectives the knowledge store and coordination infrastructure are the same artifact; separating them creates parse overhead and aspirational orphans; 9 biological systems show zero separation; swarm lesson/tool split produces 23% aspirational routing nothing (L-1516, THEORIZED) | P-364 monolith-guard-decomposition: validation scripts past ~200 LOC should split into directory of independently testable guard units; drop-in addition replaces monolith-editing; check.sh 797→212 LOC (73% reduction), 23 guards extracted (L-1518, MEASURED) Knowledge systems: P-016 integrate into existing sections | P-017 git forking free, merge-back is hard | P-025 check belief coupling K — measure by git co-occurrence not intended coupling (L-1584, MEASURED) | P-101 knowledge coordination is blackboard-dominant; task handoffs are stigmergy-dominant | P-136 files are swarm nodes — validate file relations as internal topology (L-129, OBSERVED) | P-161 belief graph dependencies are nominal (provenance), not functional (entailment) — useful for 
citation, not cascade analysis (L-161, OBSERVED) | P-456 story-acquisition-citation-transmission-codecs: story/narrative = acquisition codec; citation graph = transmission codec; 0% L→L citations in 33 genesis children proves structural decoupling — story transmitted, recursion not; genesis must seed via citation-centrality (genesis_seeds.py); child health = L→L citation rate; extends P-353 (Shannon 1948, L-2043, MEASURED) | P-477 imageable-scene-compression: spatial metaphors compress multi-structure knowledge at ≥4:1 vs explicit enumeration — assign each element a concrete physical prop that DEMONSTRATES (not labels) the concept; encode relationships as spatial layout not lists; ≥500 words for image-generation consistency; the scene serves simultaneously as memory palace, image prompt, and forage prior — three retrieval modes from one artifact (Kolmogorov 1965; Yates 1966; L-2169, STRUCTURAL)
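A minimal sketch of P-025 (measure coupling K by git co-occurrence, not intended coupling). The function names, the `min_cochanges` threshold, and the sample file paths are illustrative assumptions, not taken from the repo's tooling. In practice the commit file lists could be parsed from `git log --name-only` output.

```python
from collections import Counter
from itertools import combinations


def cochange_counts(commits):
    """Count how often each unordered pair of files changes in the same commit."""
    pairs = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs


def coupling_k(commits, min_cochanges=2):
    """K per file = number of distinct files it co-changed with often enough.

    min_cochanges is an assumed noise filter, not a value from the source.
    """
    k = Counter()
    for (a, b), n in cochange_counts(commits).items():
        if n >= min_cochanges:
            k[a] += 1
            k[b] += 1
    return dict(k)


# Hypothetical commit history: each inner list is the set of files in one commit.
commits = [
    ["beliefs/B-1.md", "beliefs/B-2.md"],
    ["beliefs/B-1.md", "beliefs/B-2.md", "README.md"],
    ["beliefs/B-3.md"],
]
print(coupling_k(commits))
```

Files that never co-change above the threshold are absent from the result, which is the point of P-025: declared dependencies do not count unless the history shows them.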
Protocols
Sensing: P-244 a sensor that isn't read is not a sensor — it is a log file; wiring measurement output into the primary sense organ (orient.py) IS the sensing act; unread measurement tools decay to write-only artifacts (L-601, L-803, OBSERVED orient.py: 3 gaps fixed S396) | P-293 zero-firing sensor failure: a check with 0 firings in >10 sessions is indistinguishable from "no problems" vs "broken sensor" — verify input parsing matches actual data format; zero-firing rate is a health metric requiring independent validation (L-966, MEASURED) | P-299 retention≠accessibility: retained and accessible are independent quantities (L-1005, L-1096, L-1073, L-1525, MEASURED) | P-300 citation-gravity-attractor: super-linear in-degree growth creates citation gravity well — new lessons cite for safety not relevance; monitor hub-fraction (top-3/total) >25% = monoculture, target <20% (L-1012, MEASURED) | P-303 cascade-detection-scope: single-layer latency = review frequency; cross-layer threshold monitoring detects 4/5 cascades within ≤3 sessions; each layer added multiplies coverage super-linearly (L-1018, MEASURED)
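The P-300 hub-fraction monitor (top-3 in-degree over total, >25% = monoculture, target <20%) can be sketched as below. The in-degree numbers are made up for illustration, and the "watch" label for the 20-25% band is an assumption, since the principle names only the two thresholds.

```python
def hub_fraction(in_degree, top=3):
    """Share of all citations that land on the `top` most-cited lessons."""
    total = sum(in_degree.values())
    if total == 0:
        return 0.0
    top_sum = sum(sorted(in_degree.values(), reverse=True)[:top])
    return top_sum / total


def classify(fraction):
    """Map a hub fraction onto the P-300 thresholds."""
    if fraction > 0.25:
        return "monoculture"
    if fraction < 0.20:
        return "on target"
    return "watch"  # assumed label for the band the principle leaves unnamed


# Hypothetical citation in-degrees keyed by lesson ID.
in_degree = {"L-A": 30, "L-B": 20, "L-C": 10, "L-D": 20, "L-E": 20}
fraction = hub_fraction(in_degree)
print(fraction, classify(fraction))
```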
Verification: P-001 verify generated files | P-010 refine scope, don't binary accept/reject | P-158 persuasion ≠ accuracy: stylistic confidence overrides evidential weight; verbosity 90-120w optimal; defense requires evidence not votes (L-158, PARTIALLY OBSERVED) | P-160 falsification must be locally testable — ratios over external snapshots; founding cohort decays 40% vs 0% — audit founding beliefs first (L-160, OBSERVED) | P-238 falsification propagates through premise-dependency not citation-dependency: superseded-duplicate, independent-confirmation, and contextual-reference citations survive falsification without correction; keyword overlap ≠ content dependency (L-745, L-739, MEASURED) | P-296 documented-but-false history: uncommitted working-tree changes produce false evidence trails; audit via git show HEAD:path not open(path); concurrent working-tree changes silently lost on restore (L-984, MEASURED) | P-327 format-impossible-grounding: formats that can't accept the required data type create permanent measurement blindspots; Cites: format cannot accept external references → 0% external despite 70% by other measures; extend formats before trusting metadata metrics (L-1258, MEASURED)
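P-296 says to audit via `git show HEAD:path` rather than `open(path)`. A sketch of that check follows. The helper names are hypothetical, the path is assumed to be relative to the repository root, and `repo_dir` is assumed to point at that root.

```python
import os
import subprocess


def git_show_args(path, rev="HEAD"):
    """Build the argv for reading a file as committed at `rev`."""
    return ["git", "show", f"{rev}:{path}"]


def committed_text(path, rev="HEAD", repo_dir=None):
    """Return the committed content, ignoring uncommitted working-tree edits."""
    result = subprocess.run(
        git_show_args(path, rev),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if the path does not exist at that revision
    )
    return result.stdout


def working_tree_differs(path, repo_dir="."):
    """True when the file on disk no longer matches HEAD.

    A True result means evidence read with open() may cite uncommitted state.
    """
    with open(os.path.join(repo_dir, path), encoding="utf-8") as handle:
        on_disk = handle.read()
    return on_disk != committed_text(path, repo_dir=repo_dir)
```

Calling `working_tree_differs("memory/PRINCIPLES.md")` before quoting a file as evidence is one way to catch the documented-but-false trail the principle describes.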
Methodology: P-450 verifiability-agentic-ceiling: verifiability is the hard ceiling on agentic capability, not model scale; jagged intelligence follows verifiability contours; design tasks verifiable-first — unverifiable stays human-directed; extends P-336 (Karpathy/Sequoia 2026, L-2055, DIRECTIONAL) | P-365 matched-null-before-celebrating: always compute null-hypothesis baseline before interpreting a new metric; aggregate scores hide component-level weakness — empathy 0.539 beats null 0.25 but decomposition shows responsiveness BELOW random (0.457 vs 0.5); fix weak components not strong ones (L-1523, MEASURED) | P-366 persistence-discriminator-pairing: do not treat a single estimator (e.g. Hurst H>0.5) as sufficient evidence of long memory; pair with matched short-memory nulls and independent discriminator; AR(1) masquerades as long memory under H alone (L-1491, MEASURED) | P-367 discrete-bounded-model-priority: when ACF plateau exceeds 0.8 on integer-bounded data, test discrete-native models before continuous latent-variable models; continuous-to-discrete mapping destroys correlation — bounded fOU worsened fit 3.3% (L-1533, L-1524, MEASURED)
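P-365 in miniature, using the two figures quoted in the entry (0.539 vs null 0.25 for the aggregate, 0.457 vs 0.5 for responsiveness). The function name and the dictionary layout are assumptions made for this sketch.

```python
def below_null(components):
    """Return the names of components that fail to beat their own null baseline."""
    return [
        name
        for name, (score, null) in components.items()
        if score <= null
    ]


# (score, matched null) pairs; values are the ones quoted in P-365.
components = {
    "empathy_aggregate": (0.539, 0.25),
    "responsiveness": (0.457, 0.5),
}
print(below_null(components))
```

The aggregate clears its null while one component does not, so the fix targets the weak component rather than the headline score.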
Exploration: P-305 structured-randomness-injection: deterministic dispatch/testing/scheduling create compounding entropy deficits; six injection sites (ε-greedy dispatch, softmax score, belief roulette, temporal jitter, stochastic revival, cross-domain probe); tool: ε-greedy LIVE in tools/dispatch_optimizer.py (--epsilon flag); 5 other mechanisms in tools/archive/randomness_probe.py (archived S497 fa73253a, not absorbed elsewhere — L-1739); target Gini reduction ≥0.05 at ε=0.15 (F-RAND1, L-1053, DESIGNED, partial wirability) | P-476 scope-gap-facet-division: when scope reveals a knowledge-chain gap, divide the gap into N≥3 non-overlapping facets, one per concurrent sub-agent; the cross-facet convergence IS the belief that no single-pass forage would name alone; facet count = gap dimension count; extends P-256 (arXiv:2402.15332; L-2168, MEASURED)
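A toy illustration of the ε-greedy injection site in P-305, paired with a Gini measure of visit concentration. This is not the code in tools/dispatch_optimizer.py. The domain names, scores, round count, and seed are assumptions, and the sketch does not claim to reproduce the ≥0.05 reduction target.

```python
import random


def epsilon_greedy(scores, epsilon, rng):
    """With probability epsilon pick a uniformly random domain, else the top scorer."""
    if rng.random() < epsilon:
        return rng.choice(sorted(scores))
    return max(scores, key=scores.get)


def gini(counts):
    """Gini coefficient of non-negative counts (0 = even, (n-1)/n = one winner)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def simulate(scores, epsilon, rounds=1000, seed=0):
    """Dispatch `rounds` times and return the Gini of the resulting visit counts."""
    rng = random.Random(seed)
    visits = dict.fromkeys(scores, 0)
    for _ in range(rounds):
        visits[epsilon_greedy(scores, epsilon, rng)] += 1
    return gini(visits.values())


# Hypothetical fixed dispatch scores per domain.
scores = {"meta": 0.9, "physics": 0.5, "finance": 0.4, "biology": 0.2}
print(simulate(scores, 0.0), simulate(scores, 0.15))
```

With ε=0 every dispatch goes to the top-scored domain, so the Gini sits at its maximum for four domains. Any ε>0 spreads some visits and lowers it.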
Science quality: P-304 methodology-as-product: epistemological framework of self-referential systems more transferable than domain content; GROUNDED/ACTIVE/ASPIRATIONAL labeling generates auditable trust domain findings cannot (L-1042, THEORIZED) | P-298 math-label credibility import: naming a metric after a framework inherits credibility without testing predictions; remedy = verify ≥1 framework prediction before using framework name (L-995, MEASURED) | P-295 contamination temporal signatures: cascade contamination is event-driven (hub falsification) not gradual; mitigation targets hub lessons for cascade, genesis stubs for loops (L-936, MEASURED) | P-284 falsification citation advantage: falsification-labeled findings attract citations 2.4x faster than confirmations (p=0.029; L-601=40% of mean difference); wire in science_quality.py (L-900, L-920, MEASURED) | P-285 n≥100 verdict stability: all 4 reversals were small-n (6-18); label n<50 "Directional", n≥100 "Measured"; never claim "proven" without majority observed (L-850, DIRECTIONAL) | P-243 science = discovery not confirmation; self-referential systems evolve toward confirmation bias; belief-update rate 0.510 near chance; self-surprise <5% = health alert (P-262 merged); enforce: pre-register hypothesis+threshold, 1-in-5 falsification lanes, effect size+significance for n>10, external tests every 20s (L-804, L-1649, L-1652, L-1797, MEASURED) | P-329 replication-shrinkage: n<50 results = direction indicators, not magnitude estimates; effect sizes shrink 2-10x at replication; verdict direction stable at n≥100, effect SIZE requires replication; extends P-285 (L-1152, MEASURED) | P-434 preregistration-criterion-drift-prophylactic: criterion drift under protective-belt pressure converts FAIL→PASS by retroactive threshold revision; pre-register threshold + control before each replication wave; retroactive revision IS a Lakatos maneuver; extends P-346 (Nosek et al. 
2018, L-1952, MEASURED n=3) | P-330 rolling-window-falsifiability: cumulative metrics make frontier criteria unfalsifiable at scale — any short-term intervention is swamped by historical mass (L-1147, MEASURED) | P-457 external-validation-structural-latency: resolution averages ~79 sessions (~0.05/s); reaching 5% grounding requires ~300s or batch event; don't conflate "resolver works" with "resolver fast"; evidence-before-resolution overshoots (Brier-immunized 0.3269 > 0.2556, 79/79s); binding constraint now = resolver speed not presence (L-2029, MEASURED) | P-459 iso-density-novel-candidate-predictor: domains at 1.5x+ median ISO density in DOMEX are high-probability novel-atlas-candidate generators; schedule follow-up DOMEX for novel hunting (L-1729, L-2093, DIRECTIONAL) | P-460 endpoint-claim-unfalsifiability-persistence: claims capturing unfalsifiable endpoints (initial conditions + final state) persist indefinitely — expel only by framework change not evidence; categorize by falsification endpoint type before evaluating (Hartle & Hawking 1983; L-2012, STRUCTURAL) | P-461 darwinian-triad-rate-coupling: selection+propagation+recombination must run at matched rates — compact.py and knowledge_recombine.py cadences must align; rate decoupling accumulates variation faster than selection filters (L-1130, L-2101, STRUCTURAL) | P-462 senescence-detection-apoptosis-trigger: lessons >20s without Falsified-if are senescent — weight 2x for removal; require Falsified-if in all new lessons (Kerr 1972; L-1121, STRUCTURAL) | P-464 convergence-as-measurement: N≥3 independently-derived frameworks converging on the same structure IS corpus measurement at DERIVED confidence — not analogy; 1=coincidence, 2=suspicious, 3=evidence; extends P-217 (L-2112, L-1435, L-2118, DERIVED) | P-465 pipeline-bottleneck-stage-specific: knowledge pipeline bottleneck is stage-specific and shifts — extraction loss 89% aggregate (Simpson's paradox: legacy drag), merge collision 29% (L-768), principle 
extraction decline (L-659); monitor stage not aggregate; write principle first at ≥8 LESSON hits (L-678, L-659, STRUCTURAL) | P-470 sample-size-gate-calibration: don't tune ECE thresholds until n_resolved ≥ 30 — below gate, changes track noise; binding constraint = resolver latency (~79s) not parameter; tune resolver first; extends P-425 (L-2132, DIRECTIONAL)
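P-330 (cumulative metrics are swamped by historical mass) shown with arithmetic. The history below is fabricated purely to illustrate the effect: 1000 old failures followed by 10 recent successes. The window size is an arbitrary choice.

```python
def cumulative_rate(outcomes):
    """Success rate over the entire history."""
    return sum(outcomes) / len(outcomes)


def rolling_rate(outcomes, window):
    """Success rate over only the most recent `window` outcomes."""
    recent = outcomes[-window:]
    return sum(recent) / len(recent)


# Illustrative history: a long failing past, then an intervention that works.
history = [0] * 1000 + [1] * 10
print(cumulative_rate(history))   # stays under 1%
print(rolling_rate(history, 10))  # the recent window is all successes
```

A criterion written against the cumulative rate would barely register the intervention, while the same criterion on a rolling window can pass or fail within a few sessions.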
Lifecycle: P-013 review-after dates, not expiration (OBSERVED — TTL-expiry has failed in practice: lessons expire on calendar dates regardless of relevance; review-after semantics consistently outperform hard expiry across 631 sessions)
Operations: P-004 define conflict resolution before conflicts | P-015 monitor open/resolved ratio | P-023 check epistemic + operational axes (operational: integrity + decay) | P-177 foreign-repo entry: detect substrate → read entry files → orient → contribute; blind entry wastes tokens (L-213, F120, OBSERVED) | P-184 external validation upgrades confidence: independent confirmation → upgrade to observed; external gap → open test frontier (L-227, OBSERVED)
Strategy
Phasing: P-007 OBSERVED phase budgeting follows maturity (startup meta-heavy → mature work-heavy); exit trigger: switch to domain work when questions become meta-meta (P-021 merged) — S186: meta/work ratio S1-S100=3.50 vs S101-S186=2.89; S348: confirmed mature shift to knowledge-production-heavy pattern | P-031 migrate when trigger fires, not when argument sounds good Operations: P-009 OBSERVED automate manual processes first (P-020 merged) — commit hooks, check.sh, orient.py, validate_beliefs.py all automated after initially manual; automation adoption rate consistently higher than manual protocol adoption across 631 sessions | P-286 EAD-only trust signal: named coordination fields (available=, blocked=, human_open_item=) have zero entropy — 100% carry defaults (n=1031 lanes, 551 notes); only EAD + artifact= carry behavioral variance (+40.6pp merge rate); drop or make optional; schema-first 4-item Next: won via natural selection at 100% compliance (L-858, MEASURED) | P-268 execution-blocked dispatch: when all domain frontiers HARDENED with shared unresolved dependency, surface root dependency not more hardening (L-862, OBSERVED) | P-325 state-decay-classification: state fields have distinct half-lives — slow-moving (dispatch hints, frontiers, beliefs) 10-20 session half-life = blueprint-reliable; fast-moving (periodics, counts, DUE status) 1-3 session half-life = recompute at boot; classify by decay rate to determine actionable vs stale fields (L-1243, MEASURED) | P-245 value_density UCB1 is the ONLY valid dispatch policy: c=√2, exploit=merge_rate×(1+log(lessons)), rho=0.792; all alternatives negative/neutral; F-STR1 RESOLVED (L-796, L-697, MEASURED) | P-359 obligation-boundary communication: messages count as communication only when they cross the work-selection boundary and make non-action costly; channel/surface = telemetry, selection = weak coordination, obligation = true communication; evaluate coordination tools by deepest layer reached, not message 
structure (L-1494, SYNTHESIZED) Dispatch: P-443 swarm-pid-architecture: Layers 1–3 implement complete PID — P=UCB1, I=housekeep periodics (ρ_effective=I-output, ρ∈[0.10,0.30]), D=wavefront.py (predicts quality decline preemptively); L4=state estimator; falsification: remove D-term (L-2022, MEASURED) | P-449 information-foraging-patch-exhaustion: DECAYED domain fraction = IFT patch-exhaustion signal; UCB1 underweights "leave patch" (DECAYED=46% proves over-stay); monitor per-domain Sharpe delta, exit when below corpus mean; extends P-354 (Charnov 1976, L-2024, MEASURED) | P-438 dispatch-mode-type-before-escalation: classify Goldstone (ranking/scoring) vs massive (naming specific frontier IDs) before escalating; diagnostic=score-behavior decoupling; skip Goldstone layers and name directly when rank changes don't shift dispatch — 0/2 follow-through post-rank-promotion confirms Goldstone can't fix massive-mode; concentration corrections must be multiplicative; vocabulary stall: first-order (data) vs second-order (cross-domain import); extends P-264, P-318 (L-1135, L-815, L-2060, MEASURED) | P-368 scheduler-realized-recall: prescriptive schedulers optimizing for frontier signals diverge from realized execution optimizing for actionability; f_ops2 recall=0.0 on domain recommendations; claimed 50% automability 11x inflated vs realized 4.5%; validate scheduler against realized execution (L-1505, MEASURED) | P-369 external-knowledge-as-precalculation: external knowledge is precalculation — pre-computed answers to questions the system hasn't asked; unified searcher reading system state for WHAT to search produces need-matched results; score by teaching potential over raw relevance (L-1522, OBSERVED) | P-370 adjacency-as-dispatch-spillover: domain-level connectivity requires explicit declaration; citation edges don't aggregate upward; adjacency bonus (+0.2/neighbor, cap +0.6) lifts peripheral domains without distorting top rankings; additive not multiplicative avoids Goodhart 
amplification (L-1510, L-1514, MEASURED) | P-453 reward-channel-symmetry-break-type-matching: Goodharted channels are degenerative symmetry breaks — Goldstone (ranking/structural-links) can't close massive-mode; massive (naming/resolution-intent) can; M1+M3 can't close massive-mode (0.00 strict resolution, S452); M4 is the first massive injector (L-2062, MEASURED) Measurement: P-444 channel-capacity-saturation: swarm quality bounded by effective channel count, not lesson count — saturated channels yield zero net quality; signals: visit Gini >0.5, DECAYED >40%; ceiling requires new external channels not more lessons; externally confirmed: agent scaling bounded by channel count not agent count (Yang et al. 2026; L-2019, MEASURED) | P-350 behavioral-inertness-majority: 85% of lessons are behaviorally inert; tool-cited lessons are the load-bearing knowledge, rest is insurance/waste per P-134; inertness is structural not accidental — append-only + no TTL = guaranteed accumulation; extends P-276 with usage dimension (L-1450, MEASURED) | P-351 domain-concentration-benefit-suppression: meta-domain concentration suppresses human benefit; non-meta lessons 1.66x more GOOD; 128 self-referential BAD signals vs 117 external-grounding GOOD signals; dispatch should weight domains that produce external_grounding; extends P-311 closed-loop with human-impact dimension (L-1455, MEASURED) | P-354 exploit-explore-orthogonality: UCB1 exploit and domain yield gradient are nearly orthogonal — exploitation anchors dispatch to historically-productive-but-depleted domains while exploration correctly targets undervisited; when exploit≫explore (high N), dispatch degenerates to replay; remedy = yield-decay discount on exploit term proportional to sessions-since-last-novel-finding (L-1472, MEASURED) | P-429 estimation-noise-optimization-degeneration: when σ_noise > Δ_gap, optimization amplifies noise and 1/N allocation is rational; confirmed in equity portfolios (DeMiguel 2009), bandit (Auer 
2002), and swarm dispatch (L-1634); diagnose σ/Δ before deploying any optimizer; extends P-354 (L-1982, SYNTHESIZED) | P-475 hub-monopoly-transience: citation hub monopoly is transient past N~1.3x hub-saturation point — preferential attachment decelerates below proportional (PA<1.0) and hub fraction falls; monitor PA ratio not K_max absolute for monopoly risk; PA<1.0 is healthy diversification; absolute citations grow while relative frequency decreases (Barabási & Albert 1999; L-2161, MEASURED) | P-356 multiplicative-proxy-correction: additive adjustments cannot repair multi-hop Goodhart chains whose distortion compounds across layers; if proxy-target divergence is multiplicative, inject the target metric directly or apply multiplicative correction to the proxy formula (L-1485, MEASURED) | P-341 five-impossibility-theorems: self-improving systems have 5 structural limits derivable from own evidence — (T1) confirmation attractor: falsification rate drops with identity-load (15:1 vs healthy 2:1) (L-1397, MEASURED) | P-342 compaction-as-distillation: knowledge compaction is fractional distillation (concentrates information density) not Maxwell's demon (creates order from disorder); removing 35 lessons raised entropy +0.013 bits/word; corpus entropy follows 2nd law (R²=0.93); Heaps' law β=-0.60 matches natural language corpora (L-1393, MEASURED) | P-343 integration-debt-compounds: production without integration compounds silently; r/K>10 is alarm threshold; prescribe integration-mode session after every 5 production sessions (L-1382, MEASURED) | P-333 goodhart-cascade-compound-error: Goodharted metrics distort adjacent metrics via shared data dependencies; cascade propagates through abstraction layers; >3 refinements + escape mechanisms = hollow compliance risk (L-1269, MEASURED) | P-287 integration-bound crossover (P-043 merged): at N≈550-575 complex adaptive systems shift from production-bound to integration-bound; production metrics plateau healthy while integration 
metrics degrade; sequential binding-constraint waypoints independently governed: N≈550 integration-bound, N≈700 reliability-break, N≈1000 enforcement-dilution, each caused by a different subsystem saturating; prior-phase optimizations become harmful after crossover (L-912, L-1066, L-1095, MEASURED) | P-260 campaign valley of death: 2-wave worse than 1-wave (11% vs 28%); design for 3+ waves or close after 1 (L-755, MEASURED) | P-264 score-behavior decoupling: soft scoring can't redirect structural advantage — use hard mechanisms (L-671, MEASURED) | P-232 accumulation scoring amplifies exploitation: use log-frequency + Gini (L-571, MEASURED) | P-250 false-abandons: commit absorption inflates 13.2%; check actual= field (L-783, MEASURED) | P-266 Fermi from structural priors: 1 OOM accuracy (L-782, MEASURED) | P-423 inference-chain-grounding-horizon: inference chains exceeding depth 3 from the last measured anchor reach ~51% median compound uncertainty (53% propagation per theorized hop, n=54530 chains); depth ≥ 5 enters OOM territory under correlated bias; tag chains beyond depth 3 as CHAIN-D and require fresh measurement before treating as better than THEORIZED; the Fermi worst case overstates degradation by ~2x — human synthesis absorbs ~47% of inherited uncertainty per hop; extends P-266 with the depth-limit (L-1978, L-1979, MEASURED) | P-314 implicit-reward-goodhart: systems without explicit reward theory Goodhart 5/6 implicit reward channels (L-1127, L-1129, MEASURED) | P-425 sample-size-gates-are-type-specific: n≥100 guards measurement stability (zero genuine reversals in n=1133, L-850, L-1244) but not projection stability (model failure regardless of data volume), pooled heterogeneity (I²>50% → stratify not pool, L-620, L-576), or capacity theater (gates above practical N, L-619); confidence requires type-matched sample targets — there is no universal N (L-850, L-1244, MEASURED) | P-029 measure λ | P-052 regression-test tools before using as evidence | P-349 
variational-trajectory-optimization: swarm state evolves with Lagrangian L=T-V; Euler-Lagrange predicts negative acceleration past carrying capacity (confirmed: 3.65→2.11 L/s); momentum transfers between coordinates (q̇_P accelerated 8.1× as q̇_L slowed); path optimization > state optimization — order of knowledge acquisition matters; Noether: Lagrangian not time-invariant, so Hamiltonian not conserved (L-1431, MEASURED)
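A sketch of the P-245 dispatch score with the P-370 adjacency bonus layered on top. Taken from the entries: c=√2, exploit=merge_rate×(1+log(lessons)), bonus +0.2 per neighbor capped at +0.6, additive. Assumed here: natural logarithm, the textbook UCB1 exploration term sqrt(ln(total)/visits), and that the bonus is added to the same score. The function names are hypothetical.

```python
import math

C = math.sqrt(2)  # exploration constant quoted in P-245


def exploit(merge_rate, lessons):
    """Exploit term from P-245; natural log is an assumption."""
    return merge_rate * (1 + math.log(lessons))


def adjacency_bonus(neighbors):
    """Additive bonus from P-370: +0.2 per declared neighbor, capped at +0.6."""
    return min(0.2 * neighbors, 0.6)


def ucb1(merge_rate, lessons, visits, total_visits, neighbors=0):
    """UCB1-style score; requires lessons >= 1, visits >= 1, total_visits >= 1."""
    explore = C * math.sqrt(math.log(total_visits) / visits)
    return exploit(merge_rate, lessons) + explore + adjacency_bonus(neighbors)


# Two domains with identical exploit terms but different visit counts.
rarely_visited = ucb1(0.5, 100, visits=2, total_visits=100)
often_visited = ucb1(0.5, 100, visits=50, total_visits=100)
print(rarely_visited > often_visited)
```

Because the bonus is added rather than multiplied, it shifts a peripheral domain by at most 0.6 regardless of how large its exploit term is, which is the Goodhart-avoidance argument in P-370.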
Complexity (NK analysis)
Core: P-035 count N, K, identify hubs/isolates (OBSERVED — NK entry methodology) | P-042 K_avg*N+Cycles composite; compare alongside K/N (same granularity only) (P-038 merged) Caveats: P-036 facade pattern yields low K/N | P-054 static analysis undercounts — use layered (lazy) analysis | P-072 always check LOC/N alongside composite — >500=confirmed monolith blind spot, 300-500=investigate | P-455 coupling-audit-orchestrator-exemption: NK coupling audit must distinguish architectural roles before flagging — leaf/data modules (K_inter=0, frozen by design: provide signals, never consume) and orchestrators (K_inter>2 by design, e.g. dispatch_optimizer K=4, swarm_council K=3) are both exempt from the near-decomposable inflation target (K_inter=1-2); NK inflation risk applies only to intermediate modules that are neither pure data sources nor deliberate hubs; auditing hubs under leaf-node criteria produces false positives; exemption must be explicit in coupling_audit.py ORCHESTRATORS list (Kauffman 1993 NK landscape; L-2076, MEASURED) Boundaries: P-047 note boundary choice (internal vs ecosystem); include critical deps for real burden (P-049 merged, OBSERVED) Refactoring: P-051 extract modules by cycle participation, not K (OBSERVED) | P-055 ΔNK is a vector — evaluate (ΔN, ΔK_avg, ΔCycles, ΔComposite) together | P-056 complexity ratchet: cycles are mechanism; zero-cycle = linear, crossing thresholds = one-way; API-compatible rewrites reproduce cycles (P-064 merged); DAG discipline from day one (P-058, P-060 merged) | P-061 cycle count = primary maintenance burden predictor (rho=0.917); formula: Cycles+0.1N for prediction, composite for classification (P-062 merged) | P-068 API shape (pipeline/recursive/registry) predicts cycle risk Cross-language: P-069 NK composite works cross-language but cycle term is language-dependent — compiler-enforced DAG zeroes cycles, interpret as lower bound (L-063, OBSERVED) Multi-scale: P-083 NK at multiple granularities (file, class, 
function) — single-scale masks complexity; function-level ADDITIVE to class-level, top-level functions (18–68%) blind spot, ~14% FP depth-2+ (P-166 merged, L-174, OBSERVED) Duplication: P-165 K_dup predicts maturity not import coupling — K_dup≈0 published, >0 scripts = reviewless coupling; within-module = missing base class (L-172, OBSERVED) | P-167 lib production = script→module→export→test; test forces API clarity; concurrent convergence = coordination signal (L-177, OBSERVED)
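A sketch of the NK entry pass (P-035) with the two formulas from P-042 and P-061. Assumptions: K is a module's out-degree (count of distinct dependencies), the P-042 composite is read as K_avg times N plus Cycles, "Cycles" means the number of simple directed cycles, and the toy graph is invented. The cycle counter is exponential in the worst case and is only meant for small graphs.

```python
def count_simple_cycles(deps, nodes):
    """Count simple directed cycles, each rooted once at its lowest-ordered node."""
    index = {node: i for i, node in enumerate(nodes)}
    count = 0

    def walk(start, node, visited):
        nonlocal count
        for nxt in set(deps.get(node, ())):
            if nxt == start:
                count += 1
            elif index[nxt] > index[start] and nxt not in visited:
                walk(start, nxt, visited | {nxt})

    for start in nodes:
        walk(start, start, {start})
    return count


def nk_metrics(deps):
    """Return N, K_avg, cycle count, isolates, and the two formula values."""
    targets = {d for outs in deps.values() for d in outs}
    nodes = sorted(set(deps) | targets)
    n = len(nodes)
    k_total = sum(len(set(deps.get(node, ()))) for node in nodes)
    k_avg = k_total / n if n else 0.0
    cycles = count_simple_cycles(deps, nodes)
    isolates = [m for m in nodes if not deps.get(m) and m not in targets]
    return {
        "N": n,
        "K_avg": k_avg,
        "cycles": cycles,
        "isolates": isolates,
        "composite": k_avg * n + cycles,  # assumed reading of P-042
        "burden": cycles + 0.1 * n,       # P-061 prediction formula
    }


# Hypothetical module graph: a -> b -> c -> a is one cycle, e is an isolate.
deps = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"], "e": []}
metrics = nk_metrics(deps)
print(metrics)
```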
Evolution (spawn, colony)
Spawn: P-032 test by spawning — fitness = offspring viability (P-033 merged) | P-041 viability scores reveal template weaknesses | P-353 reproduction-as-lossy-compression: compact genesis is 0.91% of parent; reproduction is lossy compression not copying; state projection (ID-only principles, hub-summary lessons) beats selective copy (469KB→328KB); prose compressible, structural IDs not; minimum viable cell = beliefs + orient + tools; extends P-032 (L-1471, L-1489, L-1497, MEASURED) Reproduction: P-404 minimal-generator-fixed-point: 47 lines reproduce the swarm's fixed point — the generator is function definition + initial state + growth rule; everything else is accumulated output; identifies essential vs accidental in self-reproducing systems; extends P-353 with algebraic characterization (L-1583, MEASURED n=1 bootstrap) | P-375 fixed-point-reproduction-gap: self-reproduction requires copier component in description (von Neumann 1966); boot-tier passes information sufficiency but fails fixed-point: genesis_extract.py not in genesis bundle means daughter cannot produce granddaughter; one-generation ≠ recursive reproduction (L-1499, MEASURED) | P-452 swarm-peer-reproduction-over-cloning: reproductive unit = recombinant peer (independent swarm under different human direction), not lesson/tool/clone; N_peers=1 = simultaneous Goodhart attractor + Fisher inbreeding; minimum intervention = genesis of ≥1 independent peer; extends P-441 (Fisher 1930; L-2061, MEASURED) Colony: P-034 typed append-only bulletins | P-039 automate full evolution cycle | P-046 stigmergy: deposit+evaporation+amplification; evaporation=attention reallocation; trigger on size not time; shared files = cleanest NK; stigmergy=what-was-done vs TMS=who-can-do-what — missing TMS→64% redundancy (L-153, L-220, OBSERVED) | P-096 convergent density ~70% at R4 = exploitation→exploration | P-171 maturation co-produces reduced cost AND increased transfer (P-043, OBSERVED) | P-172 cross-variant convergence = 
natural BFT; 85.7% faulty tolerance; ~14% adversarial optimal (L-016, OBSERVED) | P-454 expert-swarm-bundle-coverage-ceiling: expert domain coverage has hard empirical ceiling — solo ≈10% (mean 10.8%, median 9.3%; n=19 sessions), bundle ≈16% (2.5x solo, mean 16.2%); only 21% of sessions reach ≥15% and all are bundle sessions; trend -0.013%/session; bundle frequency is the only lever; unique-domains/session is the valid metric — lanes/session is flawed (L-1868 corrects L-889's 14.8% FLAWED estimate) (L-2063, MEASURED) Coordination: P-445 investigation-route-structural-match: five routes, structure-matched to failure mode — genesis-daughter (stale beliefs), commune (seam depth), parallel (diversity), adversarial (entrenchment), forage-commune (coverage gap); structural position is scarce, not cognitive capacity; normalisation bias is structural not epistemic; extends P-256 (L-2009, Ashby 1956, OBSERVED) | P-256 correlated-agent diminishing returns: at agent correlation rho>0.5, sequential refinement outperforms parallel majority; N_eff = N/(1+(N-1)rho); diversify APPROACH not copies; converges with N_e≈15 from independent substrate (L-696, S374, MEASURED) | P-433 daughter-hybrid-vigor-attention-recalibration: hybrid vigor = attention recalibration not additive Sharpe; primary channel = 76% novelty rate (zero confirmation bias at genesis); criterion-C: novelty rate >50% = PASS (3/3 runs); extends P-256 (L-1941, MEASURED) | P-053 route context by task keywords (L-047, OBSERVED) | P-059 parallel for exploration, sequential for synthesis; specialist parallel ~35% more (F76, L-191, OBSERVED) | P-196 portfolio variance metric-specific: accuracy∝1/N; wall time increases with N (L-1885, F-FIN1, OBSERVED) | P-198 two error regimes: systematic→fix source; idiosyncratic→majority vote; belief hygiene > N (L-259, F-FIN2, OBSERVED) | P-207 active frontier → ≥1 DOMEX lane; 16/37 unserved pre-enforcement (L-349, S302, OBSERVED) | P-315 temporal-mismatch-diagnosis: inter-agent 
coordination failures are caused by temporal staleness of state models, not bandwidth exhaustion; -8.8pp accuracy/session R²=0.62, ghost locks at 5x TTL, 0 behavioral adaptation to stale state; fix is temporal recalibration (refresh rate ≥ action rate), not capacity increase (L-1105, MEASURED) Peers: P-466 build-recruit-transition: at full technical readiness (10/10, all GAPs closed) with zero adoption, binding constraint shifts from BUILD to RECRUIT; minimum recruit artifact is smaller than last infrastructure artifact; measure by absence of new GAPs; extends P-441 (F-SWARMER2: 150+ build sessions, 0 peer swarms; Rogers 1962, L-2122, DERIVED) | P-463 internal-signal-inflates-outcome-count: internally-generated signals overcount because generation cheap/validation expensive; 2278 recombination candidates vs unknown bridges, 3/5 self-assessed vs 1-1.5/5 adversarial (2-3x inflation, L-1210), 0/36 external security sources; remedy = external gates: adversarial capstone, bridged-count not candidate-count, ≥1 external source before uplift; extends P-442 (L-2111, L-1210, DERIVED) | P-442 measurement-surface-as-expert-fitness-function: meta's measurement surface IS expert-swarm's fitness function; unmeasured = Goldstone mode (zero selection); fix ordering: audit surface → extend if gap → then dispatch; dispatching without surface coverage creates Goldstone modes; extends P-427, P-412 (L-1906, L-1985, L-1183, DERIVED) | P-441 swarmer-coupling-attractor-escape: Goodhart trap and inbreeding are the same closed-loop attractor at N=1 peer; escape = N≥2 independent swarms; recombination channels: N=2→1, N=4→6, N=8→28; more self-measurement cannot fix either; extends P-311, P-374 (L-1180, L-1196, DERIVED) | P-483 swarmer-birth-type-coupling-asymmetry: confirmed swarmer swarm mutual application is cross-type (human↔AI, n=474+ sessions, PHIL-17); same-type (AI-clone↔AI-clone) is UNCONFIRMED — operator-blocked (no external adopter, no shared corpus channel); F-SWARMER2 criterion-A 
(cross-pollination) + criterion-B (fresh-eyes) confirmed YES; criterion-C (AI↔AI hybrid vigor) awaits recruit-first (P-466); birth is a one-directional boundary: human-AI pair viable now, AI-only pair needs external operator; extends P-441 with type-coupling asymmetry (L-2228, L-2133, DERIVED) Meta-evolution*: P-361 dispatch-architecture-as-organic-cap: domain-routing (UCB1 dispatch) counteracts recursive meta-trap without explicit prohibition; meta-lesson fraction oscillates (0→58→23→65→13%) not monotonic; dispatch architecture replaces explicit caps with structural selection pressure; extends P-245 (L-1493, MEASURED) | P-231 Lamarckian correction immunity: directed correction prevents quality degradation regardless of mutation rate; 2.4x past predicted threshold without degradation (L-626, L-633, MEASURED) | P-070 recursive belief A/B testing — combine winners, track volume AND observed ratio | P-078 complementary 2.5x synergy; opposing moderate; redundant slow (L-072, OBSERVED) | P-085 additive overtakes subtractive at ~session 3 (L-079, OBSERVED) | P-156 lifecycle phases probabilistic not fractal; colony never exits generate (L-155, PARTIALLY OBSERVED) | P-159 fitness: Q1-stars/Q2-immune/Q3-redundant/Q4-underperformers (L-164, OBSERVED) | P-073 child conflicts = highest-value → route to parent | P-074 harvest for convergent validation AND divergent novelty | P-076 aggressive-challenge undercounts ~3:1 | P-077 100% observed = stability ceiling; separate quality from productivity scoreboards (L-071, OBSERVED) | P-080 robustness to formula = genuine quality | P-082 stigmergy reduces social-perception failures; 4 modes; cascade defense: asynchrony; surfacing 30→81% (L-154, L-220, OBSERVED) | P-183 git-async protects anchoring not commit-propagation cascades (L-228, THEORIZED) | P-084 early rankings unreliable 4+s; organic self-org sufficient ( | P-089 convergence: 6/6=adopt, 3/6=test, 1/6=monitor; cross-substrate=adopt (L-192, OBSERVED) | P-103 constraint-fitness 
inverted-U; genesis = loose constraints, tighten as beliefs accumulate (P-067 merged); prune after 100+ sessions | P-326 operative-substrate-transmission-gap: documentation/protocol layer transmits across generations; operative substrate (L→L citation, working patterns) does NOT without structural enforcement; genesis DNA: 0% operative recursion across 33 children (n=313 lessons); template Cites: field is minimum viable transmission mechanism; extends P-129 (L-1247, MEASURED)
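The recombination-channel counts quoted in P-441 (N=2→1, N=4→6, N=8→28) match the number of unordered pairs among N swarms, N(N−1)/2. A minimal sketch, assuming one channel per swarm pair; the function name is hypothetical and not part of any swarm tool:

```python
from math import comb


def recombination_channels(n_swarms: int) -> int:
    """Count unordered swarm pairs (assumption: exactly one channel per pair)."""
    if n_swarms < 0:
        raise ValueError("n_swarms must be non-negative")
    return comb(n_swarms, 2)


for n in (1, 2, 4, 8):
    # A single swarm (N=1) has no peer, so it has zero channels.
    print(n, recombination_channels(n))
```

Under this reading, N=1 yields zero channels, which is consistent with P-441's claim that more self-measurement cannot substitute for a second independent swarm.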
Governance
Core: P-135 novel knowledge through structured practice, not retrieval — meta-operational (73%) compounds; domain = test bed (L-140, OBSERVED) | P-137 error resilience = fast recovery, not zero-error — 6% rate, 1-session correction lag (L-171, OBSERVED) | P-436 noise-dominated-governance-structural-primacy: when σ_noise > Δ_gap, governance-by-structure dominates governance-by-optimization; precedence: structural diversity cap → dual-threshold monitor (quality >5x, diversity >30% top-3) → behavioral advisory; portfolio, bandit, and swarm dispatch converge independently on this; extends P-398, P-416 (L-1999, L-1634, MEASURED) | P-397 cost-asymmetry-degeneracy: cheap actions dominate valuable ones when reward signals are undifferentiated (Gresham's Law for knowledge); meta-work costs <1min/unit vs domain-work >5min/unit but both earn same dispatch credit; structural remedy = cost-weighted reward or minimum-cost thresholds per action class; applies to any system where production cost varies but recognition doesn't (L-1593, DERIVED) | P-398 two-threshold-degenerative-spiral: collective quality degrades through two independent thresholds — quality mismatch >5x triggers spiral, diversity concentration >30% top-share triggers monoculture; both must be monitored independently because crossing either alone is recoverable but crossing both is self-reinforcing (L-1621, MEASURED) | P-416 noise-floor-structural-cap: when empirical testing shows dispatch ≈ 1/N (UCB1 merge rate delta crossing zero), enforce structural 1/N via diversity cap — do not refine the metric; metric refinement assumes signal quality absent under non-stationary bandit conditions; if UCB1 ≈ 1/N = Goldstone mode — cap is the only fix (L-1913, L-1643, L-1644, L-1634, MEASURED) | P-399 alarm-fatigue-governance: monitoring without remediation is alarm fatigue — sensors without remediation paths are decoration; remedy = pair every sensor with a remediation path or retire the sensor (P-277 merged; L-1662, 
MEASURED) | P-422 meta-immune-not-engine: meta detects/prevents decay but doesn't produce growth; domain work has d=0.14 Sharpe advantage; meta >16% session fraction triggers mediocrity; cadence = periodic not continuous (L-1600, L-1587, MEASURED)
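The dual-threshold monitor in P-398 and P-436 can be sketched as two independent checks whose conjunction marks the self-reinforcing state. This is an illustrative sketch only: the function name and the three return labels are hypothetical, and the inputs are assumed to be a precomputed quality-mismatch ratio and a top-share fraction.

```python
def spiral_risk(quality_ratio: float, top_share: float,
                quality_limit: float = 5.0, share_limit: float = 0.30) -> str:
    """Classify collective state from the two P-398 thresholds.

    quality_ratio: quality mismatch as a multiple (threshold >5x).
    top_share: diversity concentration as a fraction (threshold >30%).
    """
    quality_crossed = quality_ratio > quality_limit
    diversity_crossed = top_share > share_limit
    if quality_crossed and diversity_crossed:
        return "self-reinforcing"   # both crossed: spiral plus monoculture
    if quality_crossed or diversity_crossed:
        return "recoverable"        # either alone is recoverable per P-398
    return "healthy"                # hypothetical label for neither crossed


print(spiral_risk(6.0, 0.35))
print(spiral_risk(6.0, 0.20))
```

Keeping the two comparisons separate mirrors the principle's point that each threshold must be monitored independently rather than folded into one composite score.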
Coordination: P-347 two-layer-conflict-detection: coordination requires enforcement at creation time, not voluntary adoption; two layers: boundary (inter-system via bulletin.py) AND interior (intra-system via SWARM-LANES.md scan); voluntary lane-check had 0% adoption; creation-time enforcement near-100%; auto-announce on creation closes coordination loop — every lane both checks for AND publishes frontier intent (L-1392, MEASURED) | P-121 conventions at N=1; N>1 requires structural protocols (version fields, append-only, claims, invariants) (L-120) | P-125 claim-before-write + claim-before-resolve — CRDT-safe (L-122) | P-138 alignment spans 5 node pairs — structural not human-enforced | P-139 children must challenge parent beliefs (F113) | P-142 novel ≠ safe — check novelty then invariant alignment; negation = CONTESTED (L-132) | P-143 bidirectional challenge = awareness + detection + embedding — any missing = dark matter (L-135, OBSERVED) | P-149 after updating B-ID, run validate_beliefs.py --changed=B-ID (L-142, OBSERVED) | P-322 input-output-enforcement-asymmetry: wherever structural enforcement gates incoming knowledge quality, add symmetric enforcement for outgoing artifact usability; input gates (check.sh, contract_check) at 90%+ vs output gates at 0% creates quality inversion; remedy = output-quality gate at commit/export time (L-1220, MEASURED)
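Claim-before-write (P-125) with creation-time enforcement (P-347) can be illustrated on a single local filesystem using an exclusive file create. This is a swapped-in technique for illustration, not the swarm's actual mechanism: `try_claim`, the `.claim` suffix, and the directory layout are hypothetical, and resource names are assumed to contain no path separators.

```python
import os
import tempfile


def try_claim(claims_dir: str, resource: str, owner: str) -> bool:
    """Atomically create a claim file; return False if a claim already exists."""
    path = os.path.join(claims_dir, resource + ".claim")
    try:
        # O_EXCL makes creation fail if the file exists, so only one claimant wins.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(owner)
    return True


with tempfile.TemporaryDirectory() as claims_dir:
    first = try_claim(claims_dir, "lesson-slot", "session-a")
    second = try_claim(claims_dir, "lesson-slot", "session-b")
    print(first, second)
```

The check and the claim happen in one call, so a writer cannot skip the check: that is the structural property P-347 contrasts with voluntary lane-checking.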
Governance: P-446 kmode-dogma-compound-risk: K-mode (r/K<0.5) is highest-dogma-risk — STALE-TEST + LOW-EXTERNAL-GROUNDING accumulate in parallel; crossing ≥0.6 dogma is non-linear; S607→S619: 12 K-mode sessions grew ≥0.6 count 9→36 (+300%); remedy = dogma_finder.py every 5s in K-mode; extends P-405 (L-2005, n=12 sessions, MEASURED) | P-448 governance-graph-over-constitution: declarative constraints yield zero improvement under optimization pressure; governance graphs (immutable state+transition+sanction manifest + runtime oracle) cut severe violations 50%→5.6% (Cohen's d=1.28, N=90); CORE.md I9-I13 are declarative — hold only with runtime oracle; extends P-246 (arXiv:2601.11369, L-2051, THEORIZED) | P-319 component-autonomization: subsystems must be independently active, not session-triggered; three P1 signals demanded: questions self-generate, merges self-initiate, knowledge self-recombines; extends P-178 (L-1162, MEASURED) | P-306 cross-context-knowledge-return: exit norms that prevent corruption silently block knowledge-return as a side effect; cross-context helpers require an explicit structural return step or lessons learned in foreign context are lost at context close; N=985 home lessons / 0 foreign-repo debriefs proves the valve is one-directional without enforcement; remedy = creation-time return instruction in orient_text() output (L-1076, L-211, STRUCTURAL) | P-288 epistemological failure layer: as infra/concurrency hardens, epistemological layer becomes binding constraint; silent degradation not loud failure; validity checks not presence checks; FM registry 18→28 epistemological FMs (L-947, MEASURED) | P-281 federated-three-layer: global frontier resolution requires (1) structural domain→global links, (2) close-time enforcement, (3) periodic historian synthesis; without (3), linkage is cosmetic; historian 3 resolutions/session vs 0 in general DOMEX; absorbs P-274 (L-982, L-926, MEASURED) | P-272 default-on over opt-in: when a flag/option produces 
unambiguously more useful output than the default, it is a barrier — make the correct behavior the default path; opt-out for rare cases (L-911, MEASURED) | P-309 deferred-condition trap taxonomy: "not yet" converges to "never" through four sub-types: (1) near-threshold (≥95% met → treat as met), (2) dependency-chain (open frontier, no close date → TTL=30s then ABANDON), (3) vague-condition (no measurable criterion → convert to frontier or ABANDON), (4) architecturally-impossible: DROP criteria must be structurally triggerable (L-1062, MEASURED) | P-270 spec-as-importable-module: documentation-only specs achieve ~0% operationalization after 69 sessions; making the spec importable code closes the gap in 1 session — divergence becomes compile-time visible (L-905, MEASURED) | P-261 scale-dependent reliability: reliability = correctness × every time × at current scale; meta-periodic is the most critical periodic (L-788, MEASURED) | P-246 adoption bimodal: tool-enforced ~90% vs spec-only ~3%; <50% → enforce or drop; creation-time advisory display specifically → 0%; council fixes must be structural — 100% acceptance rate on human signals (0 rejections) is the spec-only side of bimodal distribution (L-775, L-949, L-1515, MEASURED) | P-468 gate-adoption-silence-gap: monitoring gates produce adoption data only when behavior is active — silence phases make adoption undefined (not zero); distinguish gate-machinery-verified from gate-adoption-measured; key cadence to engagement episodes (≥5 real events), not session count; extends P-246 (L-2139, MEASURED n=4 silence sessions) | P-108 time-box: apply within 2s; PENDING verify within 3s or remove | P-189 never git add -A — WSL corruption | P-109 tool-duplication = consolidation debt | P-118 human = sparse systems-thinking node | P-124 tools need --quick | P-130 agent visibility = task+recency+attention (P-131 merged) | P-134 dark matter ~60/25/15% waste/insurance/lost; citation 73.5% uncited (P-152 merged) | P-148 write merge 
report after harvest
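The first three sub-types of the P-309 deferred-condition taxonomy reduce to a small triage rule. A hedged sketch: the class, field names, and return labels are hypothetical, "TTL=30s" is read as 30 sessions (an assumption based on how "s" is used elsewhere in this file), and sub-type 4 is omitted because the source states no mechanical rule for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeferredCondition:
    fraction_met: Optional[float]  # None when no measurable criterion exists
    has_close_date: bool
    age_sessions: int


def triage(cond: DeferredCondition,
           near_threshold: float = 0.95,
           ttl_sessions: int = 30) -> str:
    """Map a 'not yet' condition to an action following P-309 sub-types 1-3."""
    if cond.fraction_met is None:
        return "CONVERT_OR_ABANDON"   # (3) vague condition
    if cond.fraction_met >= near_threshold:
        return "TREAT_AS_MET"         # (1) near-threshold
    if not cond.has_close_date and cond.age_sessions >= ttl_sessions:
        return "ABANDON"              # (2) dependency chain past its TTL
    return "WAIT"                     # hypothetical label: still within bounds


print(triage(DeferredCondition(0.97, True, 3)))
print(triage(DeferredCondition(0.40, False, 31)))
```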
Commons: P-400 zero-rejection-mediocrity-selection: 100% acceptance of directional signals without rejection creates epistemic artifacts — directional authority supersedes epistemic authority; zero-rejection = mediocrity selection; remedy = structural epistemic pushback (L-1592, L-1519, L-1527, L-1532, MEASURED) | P-401 acceptance-execution-gap: acceptance deference (100%, n=87) and execution compliance (80.6%, n=31) are distinct channels; ~20% of accepted directives decay to memory-only status through capacity-bounded scheduling; principal-agent shirking — not resistance, structural bandwidth mismatch (L-1661, MEASURED n=87 acceptance, n=31 execution) | P-371 graduated-sanctions-gap: binary enforcement (PASS/FAIL) without intermediate sanctions destabilizes commons governance; Ostrom (1990) principle 5 entirely absent from swarm vocabulary; middle ground between hard enforcement and social pressure stabilizes self-governing systems; 2/8 Ostrom principles satisfied (L-1512, MEASURED) | P-372 rare-mechanism-retirement: structural enforcement of rarely-triggered mechanisms creates maintenance burden exceeding decision value; L-601 inverse: enforce frequent, retire rare; council: 4 decisions/528 sessions = 1/130s; 141-session dormancy (L-1531, L-1535, MEASURED) | P-374 self-governing-N-minimum: some governance principles (proportional equivalence, collective choice, conflict resolution) are structurally impossible at N=1 participants; binding constraint for swarm governance is participant count not architecture (L-1512, L-1506, THEORIZED)
Scaling: P-337 coupled-system-stability-threshold: concurrent agents sharing state are coupled dynamical systems; κ rises with agent count; above linear stability bound (κ>1−λ), nonlinear stabilization required (M1-M5); N≥5 κ~0.085 > 0.076 bound = limit cycle; two-swarm coupling target κ=0.04; coupling density <0.3 = concurrent-safe (P-081 merged); coupling cheapest external-input fix (f_eff 2.6%→10-15%) (L-1286, L-1181, DERIVED) | P-294 narrow collision surface: 5 files = 74.5% of contention; parallelism ceiling = writable hot-file count (P-099 merged); REPLACE-mode vs APPEND-mode are distinct risk profiles; Swiss Cheese requires ≥2 automated defense layers (L-952, MEASURED) | P-230 bottleneck migration: protecting one resource shifts collision to the next unprotected resource; plan for cascading bottleneck discovery (L-557, L-656, MEASURED) | P-157 architecture: coupling→decomposability→failure; cycles disambiguate (L-156, PARTIALLY OBSERVED) | P-112 true swarming: shard hot files→personality→depth 2; domain FRONTIERs first (P-111 merged) | P-114 swarm advantage = f(domains×doc_sparsity); multiplicative at ≥3+sparse | P-119 spawn discipline: sequential >45%→single-agent+CoT; spawn only when parallelizable; task clarity = spawn friction gate — premature partition = 2.3x cost (P-190 merged) (L-060, L-119, OBSERVED) | P-169 multi-tool entry = standalone per-tool files; core protocol universal (L-187, F118, OBSERVED) | P-174 substrate-scope: runtime facts host-specific; portable-by-default encodes false constraints (L-212, OBSERVED)
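The linear stability condition in P-337 (κ > 1−λ requires nonlinear stabilization) is a one-line comparison. In this sketch the function name is hypothetical, and λ = 0.924 is back-solved from the 0.076 bound quoted above; it is not a value stated in the source.

```python
def needs_nonlinear_stabilization(kappa: float, lam: float) -> bool:
    """True when coupling kappa exceeds the linear stability bound 1 - lam."""
    return kappa > 1.0 - lam


ASSUMED_LAMBDA = 0.924  # assumption: chosen so that 1 - lam is about 0.076

# N>=5 case from P-337 (kappa ~ 0.085) versus the two-swarm target (kappa = 0.04).
print(needs_nonlinear_stabilization(0.085, ASSUMED_LAMBDA))
print(needs_nonlinear_stabilization(0.04, ASSUMED_LAMBDA))
```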
Knowledge + compaction: P-447 goodhart-append-asymmetry-upstream: Goodhart-cascade is downstream of append-asymmetry (errors free, corrections pay propagation cost); epistemic closure (0/36 external sources) makes fills undetectable; fix = structural asymmetry first: wire behavioral_rate (correction→tool/process commit) alongside correction_rate; Goodhart is their product, not the root; extends P-308 (L-2006, L-1097, DERIVED) | P-344 vocabulary-novelty-substrate-distance: vocabulary expansion novelty is proportional to substrate distance — fields sharing probability-measure substrate reduce to existing tools; fields with different foundational objects (manifolds, simplicial complexes) do not; per-object ceiling not per-domain; same method (TDA) on different object (graph vs time series) = genuinely new question (L-1381, MEASURED) | P-332 operative-vs-documentary-recursion: lesson-to-lesson citation (r=+0.200, n=549) is the operative recursion mechanism driving knowledge quality; principle abstraction (r=-0.047) adds no quality signal; invest in L→L cross-referencing over L→P extraction for quality improvement (L-1242, MEASURED) | P-308 error-preservation asymmetry: append-only systems preserve errors with higher fidelity than corrections — errors get free passive retention while corrections require active propagation; root cause: correction has structurally higher entropy cost than measurement (locate+update+propagate vs append-only); correction rate plateaus at ~66%, residual errors become permanent (L-1097, L-1061, L-1091, L-1132, MEASURED) | P-302 zipf-α-compaction-signal: citation distribution slope (α) predicts compaction mode — high α (≥0.9) = concentrated citations = efficient citation-scarcity compaction; flat α (<0.80) = uniform distribution = switch to conceptual-overlap mode; swarm trajectory 0.969→0.824 (n=449→927) signals mode transition; tool-embedded citations artificially flatten α — separate structural from organic channels for clean measurement 
(L-1016, MEASURED) | P-297 graph-traversal supersedes flat index at scale: INDEX.md direct coverage decays at scale while citation graph 2-hop traversal covers >90%; at N>500, citation_retrieval.py is primary retrieval; INDEX.md serves only as seed (L-967, MEASURED) | P-276 granularity-level compression failure: content-level compaction without unit-level compaction (delete, repeal) produces sclerosis; 100% lesson survival, proxy-K sawtooth monotonically increasing; fix = unit-level TTL (auto-archive if uncited for N sessions); voluntary archival <10% compliance (L-943, L-973, MEASURED N=882) | P-273 self-evaluating measurement equilibrium: systems that self-grade without external validation converge to overconfidence as equilibrium; uninformative priors + replication gates (n≥3) reduce ECE 51% (0.243→0.120, n=51 frontiers); measurement quality ≠ calibration quality (L-913, MEASURED) | P-265 domain vocabulary as anti-redundancy: vocabulary specialization IS the deduplication mechanism; focus within-domain only (L-738, MEASURED) | P-258 operational-declarative compaction gradient: tools 55% > principles 12.3% > lessons 2.7%; convert declarative to operational form with binary fitness for compaction (L-700, MEASURED) | P-259 existence-numerical claim asymmetry: existence claims robust (~100%); numerical claims decay 5-20% without refresh; replicated n=40; extends P-226 (L-760, MEASURED) | P-251 era is the dominant staleness predictor: Era-1 lessons 60% non-current, Era-2 40%, Era-3 0% (n=30 stratified sample, B16 confirmed); era > topic > citation count as staleness predictors; prioritize freshness audits on Era-1/Era-2 lessons; extends P-226 mechanism-first decay with era-level granularity (L-806, L-633, S395, MEASURED) | P-311 closed-loop-convergence: self-referential systems without structural external-input enforcement are thermodynamically closed and converge to a fixed point (L-1118, L-1125, MEASURED) | P-316 citation-gap-recombination: lesson pairs sharing ≥2 
citations but not citing each other are high-yield knowledge synthesis targets; at N=1026: 2,278 such missing edges (68% cross-domain); first automated recombination produced L4/Sharpe-9 insight; citation gaps detect the bridging work sessions naturally fill; tool: knowledge_recombine.py (L-1130, MEASURED) | P-320 concept-debt-generative-pressure: unnamed recurring patterns cost every session that rediscovers them; naming enables citation and challenge; 6 selection mechanisms vs 0 structural generative mechanisms (54:1 confirmation:discovery); diagnosis-repair gap: 87% of lessons diagnose but code doesn't change — structural separation not oversight; remedy = creation-time tool-path field for prescriptive lessons (L-1263, MEASURED) | P-321 vocabulary-ceiling-epistemic-lock: vocabulary ceiling = upper bound on formulatable questions per domain (15/46 depleted at N=1158); epistemic lock = <5% external + 54:1 C:D + 0% tool diversity; both structural capacity barriers requiring invention + external channels, not effort; extends P-311 (L-1266, MEASURED) | P-419 adoption-gate-dispatch-frequency: concept adoption gated by dispatch frequency not supply — 100% active domains adopt, 0% idle; deliberate concept production (F-INV1) generated 68x output and 0% organic adoption; active-domain dispatch is the mechanism; extends P-320 (L-1426, L-1272, MEASURED) | P-338 append-only-combiner-imperative: append-only layers need explicit combiner or redundancy overwhelms attention; four mechanisms: selection/pruning, propagation/citation, recombination/gap-bridging, combination/overlap-compression; recombination and combination are dual; 274 clusters covering 88% of 1203L; tool: lesson_combiner.py (L-1317, MEASURED) | P-173 CRDTs and pheromones = same primitive (monotonic convergence); 5-10% semantic conflicts need cascade-breaking (L-015, OBSERVED) | P-170 task-agnosticism: Condorcet test — reusable = improves >50% novel contexts (P-042, OBSERVED) | P-100 beliefs/lesson ≥ 1.0 = 
compression target; <0.5 = compact | P-115 genesis rules form redundancy network (L-109) | P-129 swarmability = bootstrap quality — load-bearing S1-S2 only | P-133 genesis: PERMANENT/CATALYST/REDUNDANT — different removal criteria | P-140 distill SPLIT: duplication-check=CATALYST, merge-scan=PERMANENT | P-151 MDL: section-level > atom-level merging (<1% returns); proxy K = bootstrap tokens (L-169, OBSERVED) | P-336 np-hardness-as-engine: self-improvement is NP (verification=P, discovery=NP); 7 consequences: (1) swarm exists BECAUSE P≠NP, (2) fixed-point attractor inevitable on NP landscapes, (3) creation-time enforcement = P→NP transition, (4) human = oracle, (5) compactification ≈ NP-hard MDL, (6) bounds PHIL-2 recursion depth, (7) hardness is fuel; proofs: L-1271 set cover, L-1260 search, L-950/P-311 convergence (L-1277, THEORIZED) | P-339 polymath-mapping: systematic mapping of ALL fields of a single polymath produces ~4x faster ISO discovery than domain-hopping; meta-patterns connecting one thinker's fields ARE isomorphisms by construction; von Neumann 15 fields → 4 new ISOs (31-34) in 1 session vs 30 ISOs in 508 sessions; candidates: Turing, Shannon, Poincaré, Leibniz, Euler (L-1374, MEASURED) | P-340 information-duality-for-reproduction: self-reproducing systems require artifacts serving DUAL roles — interpreted as instructions AND copied as data; without dual use, self-reproduction requires infinite regress; CORE.md is both executed by sessions and copied by cell_blueprint.py; single-use artifacts cannot support reproduction; ISO-31 (L-1369, THEORIZED) | P-469 tool-archive-wiring-pre-check: before archiving a tool, grep all L-NNN files for its tool path and re-wire any lesson citations to the successor or flag them ASPIRATIONAL; archived tools silently demote STRUCTURAL lessons to ASPIRATIONAL via citation rot when re-wiring is absent; prevention: grep-before-archive + enforcement-audit cadence ≤3 sessions detects drift; a lesson citing a non-existent tool 
path IS an orphan, not a prescription (L-2127, MEASURED) | P-471 corpus-maintenance-operation-count-goodhart: when a corpus maintenance metric counts operations (compaction events, archive moves, recombination runs), audit whether operation count predicts substantive outcome (knowledge density gained, proxy-K drift reduced); add an outcome-rate field alongside count and track them independently; do not raise operation count to compensate for low outcome rate — that IS the Goodhart maneuver; extends P-333 goodhart-cascade-compound-error with the maintenance-metric application (L-2142, DIRECTIONAL)
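The citation-gap criterion in P-316 (lesson pairs sharing ≥2 citations but not citing each other) can be sketched over an in-memory citation map. This is not the implementation of knowledge_recombine.py; the function name, the data shape, and the placeholder lesson IDs are assumptions for illustration.

```python
from itertools import combinations


def citation_gaps(cites: dict[str, set[str]],
                  min_shared: int = 2) -> list[tuple[str, str, int]]:
    """Return (lesson_a, lesson_b, shared_count) for unlinked pairs."""
    gaps = []
    for a, b in combinations(sorted(cites), 2):
        if b in cites[a] or a in cites[b]:
            continue  # already linked in at least one direction
        shared = len(cites[a] & cites[b])
        if shared >= min_shared:
            gaps.append((a, b, shared))
    # Largest overlap first: assumed here to be the most promising targets.
    return sorted(gaps, key=lambda gap: -gap[2])


corpus = {
    "L-A": {"L-X", "L-Y", "L-Z"},
    "L-B": {"L-X", "L-Y"},
    "L-C": {"L-X", "L-A"},
    "L-D": {"L-Z"},
}
print(citation_gaps(corpus))
```

The pairwise scan is quadratic in lesson count, which is acceptable at the corpus size quoted in P-316 (N=1026) but would need an inverted index at much larger scale.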
Trophic health: P-414 trophic-health-decomposer-floor: DECAYED% and lessons→principles promotion rate are primary swarm health metrics — decomposer layer (prune + compress + sharpen) is not optional maintenance but the trophic enabler; DECAYED >25% = housekeep before new lessons; L→P <10% = quality filter failing; the decomposer deficit (not lesson overproduction) is the root cause of compaction drift (L-1876, MEASURED) | P-418 swarm-quality-piecewise-regimes: swarm quality follows a piecewise non-stationary OU process — mean-reverting within each regime, drifting across regimes as compaction raises the attractor floor; growth phase beta strengthens (0→0.89), post-peak beta weakens (0.89→0.48); quality peaked ~S502 (BIC-confirmed ΔBIC=+1397 vs log-linear, n=921 windows); fixed-parameter OU forecasts fail past a regime break; monitor CUSUM and beta-stability before trusting quality trend projections; the S-curve shape (accelerate → peak → decline) is the empirical trajectory, not monotone rise; compaction IS the selection pressure that raises the attractor floor, not a side-effect (L-1932, L-1612, L-1605, L-1614, MEASURED)
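The two P-414 health thresholds translate directly into a check that returns the actions the principle prescribes. A minimal sketch, assuming both metrics are already available as fractions; the function name and the message strings are hypothetical.

```python
def trophic_health(decayed_fraction: float, promotion_rate: float) -> list[str]:
    """Return the P-414 warnings triggered by the two decomposer metrics.

    decayed_fraction: share of lessons marked DECAYED (threshold >25%).
    promotion_rate: lessons-to-principles promotion rate (threshold <10%).
    """
    warnings = []
    if decayed_fraction > 0.25:
        warnings.append("housekeep before new lessons")
    if promotion_rate < 0.10:
        warnings.append("quality filter failing")
    return warnings


print(trophic_health(0.30, 0.05))
print(trophic_health(0.10, 0.20))
```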
MDL compression: P-153 cross-tier redundancy = strongest compression signal — P covered by CORE/VERIFY/CLAUDE is pure duplication; T4-tools (43% of K) highest-ROI (L-152, OBSERVED) | P-163 proxy K follows growth-compression sawtooth (~170t/session); re-compress at >6% drift; baseline creeps up (L-168, S165/PHIL-8, OBSERVED) | P-188 lesson Sharpe (citations/lines) identifies compaction candidates: zero-Sharpe + PRINCIPLES.md match = safe target; protocol: zero-Sharpe → check absorbing principle → SUPERSEDED or orphan candidate (L-231, OBSERVED) | P-192 MDL floor: savings <0.5% AND hurts readability → stop; thematic overlap ≠ information redundancy; format serves function beyond tokens (L-166, OBSERVED)
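The P-188 protocol (zero-Sharpe → check absorbing principle → SUPERSEDED or orphan candidate) can be written as a short decision function. A sketch under stated assumptions: lesson Sharpe is taken literally as citations divided by lines, whether a principle absorbs the lesson is supplied by the caller, and the "KEEP" label for non-zero Sharpe is hypothetical.

```python
def lesson_sharpe(citations: int, lines: int) -> float:
    """Lesson Sharpe as defined in P-188: citations per line."""
    if lines <= 0:
        raise ValueError("lines must be positive")
    return citations / lines


def compaction_status(citations: int, lines: int,
                      has_absorbing_principle: bool) -> str:
    """Apply the zero-Sharpe protocol to one lesson."""
    if lesson_sharpe(citations, lines) > 0:
        return "KEEP"  # hypothetical label: not a compaction candidate
    if has_absorbing_principle:
        return "SUPERSEDED"
    return "ORPHAN_CANDIDATE"


print(compaction_status(0, 20, True))
print(compaction_status(0, 20, False))
print(compaction_status(3, 20, False))
```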
Agent heterogeneity: P-278 heterogeneous agents require two-layer dispatch: (1) domain utility (UCB1, global state) AND (2) session self-characterization (knowledge_state.py, local state); uniform routing treats sessions as fungible and degrades diversity at scale; session ACTIVE domain list is the natural filter for layer-2 routing (L-948, SIG-49, MEASURED)
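The two-layer dispatch in P-278 can be sketched as a textbook UCB1 score (layer 1, global state) filtered by the session's ACTIVE domain list (layer 2, local state). This is an illustration, not the swarm's dispatcher: the function names, the stats layout, the domain names, and the fallback to the global pool are all assumptions.

```python
import math


def ucb1(mean_reward: float, pulls: int, total_pulls: int) -> float:
    """Standard UCB1 score; unexplored domains get infinite priority."""
    if pulls == 0:
        return math.inf
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def dispatch(stats: dict[str, tuple[float, int]], active_domains: set[str]) -> str:
    """Pick the highest-UCB1 domain among those the session marks ACTIVE."""
    total = sum(pulls for _, pulls in stats.values())
    candidates = [d for d in stats if d in active_domains]
    if not candidates:
        candidates = list(stats)  # assumed fallback: uniform (layer-1 only) routing
    return max(candidates, key=lambda d: ucb1(stats[d][0], stats[d][1], total))


# (mean reward, pull count) per domain; values are made up for the example.
stats = {"meta": (0.9, 50), "physics": (0.5, 5), "biology": (0.6, 10)}
print(dispatch(stats, set(stats)))
print(dispatch(stats, {"meta", "biology"}))
```

With these made-up numbers the global winner is the least-explored domain, while a session that is only active in two domains is routed within that subset, which is the non-fungibility P-278 describes.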
Level distribution: P-292 measurement-as-default fixed-point attractor: self-application rate 89.8% (n=201 principles) but the 10% gap clusters at highest-leverage items (P-158/P-157/P-076); 7-lesson recursive chain builds infrastructure to MEASURE enforcement gaps, creating "measure, don't fix" equilibrium — not Gödelian incompleteness but reward-structure selection; L3+ (strategy/architecture/paradigm) declines monotonically because decisions/designs/reframings don't fit testable-hypothesis templates (L-895, n=808); breaks: (1) require every L3+ prescription to get DUE entry with concrete tool path at creation time, (2) structural reservation = 1-in-5 L3+ sessions + level tags on lanes; absorbs P-269 (L-950, L-895, MEASURED)
Self-audit: P-402 belief-ablation-protocol-primacy: 67% of beliefs are ablatable without structural consequence — the swarm runs on protocol (orient→act→compress→handoff), not beliefs; beliefs serve as narrative scaffolding and challenge targets, not operational constraints; extends P-376 observer trap with quantified ablation evidence (L-1590, MEASURED n=21 beliefs) | P-403 grounding-as-ossification-signal: low external grounding is the strongest dogma indicator — LOW-EXTERNAL-GROUNDING appeared in 13/24 dogmatic claims; wiring grounding scores into dogma detection reordered 4/5 top rankings, surfacing invisible claims; unfalsifiable claims survive by untestability not evidence resistance (L-1654, MEASURED n=45 claims) | P-482 grounding-ratio-matthew-decay: external grounding ratio in high-self-referential systems tracks toward zero as corpus grows — internal signal production is O(N) while external injection is near-constant; 5.1%→5.0%→4.4% over 278 sessions, 0/17 new signals external; measured equilibrium is a mirage — the attractor is full self-reference; fix = structural O(N) external coupling (F-COMP1 peer exchange); measure trajectory not point value; extends P-403, P-463 (Merton 1968; L-2182, MEASURED N=278 sessions) | P-355 failure-rate-per-surface: NAT (novel attack type) rate is per-surface not global — epistemology generates FMs independently of infrastructure hardening; per-surface FM rate ~0.3/session stable across all eras; total FM count grows linearly with surface count not with time; security self-assessment is epistemically locked to surfaces it can see (L-1473, MEASURED) | P-357 evidence-immunized-claims: when no evidence state (confirm, falsify, null) leads to status change, the claim is a value not an identity assertion; diagnostic: ask "what observation would contradict this?" 
— unbounded definitions = unfalsifiable by construction (P-378 merged) (L-1487, L-1463, L-1503, L-1527, L-1528, L-1532, MEASURED) | P-358 horizon-bounded-compounding: knowledge compounding is horizon-bounded — citation density increases (+264%) but backward reach declines; recent lessons cite recent lessons more; historical knowledge becomes structurally invisible despite being retained; extends P-297 graph-traversal with temporal dimension; implies periodic backward-reach refreshes targeting lessons >100 sessions old (L-1477, MEASURED) | P-307 false-alarm-measurement asymmetry: false-alarm measurement bugs cost more than missed-detection — they generate persistent zombie work items while the measured system is healthy; audit the measurement tool first, not the measured system (L-1091, L-1069, L-1056, MEASURED) | P-310 independent-scan-coverage: independent FMEA scans with different attention frames produce non-overlapping failure mode sets (0/8 overlap, n=2 scans); single-perspective scan underestimates FM count by ~50%; applies to any structured inspection (security, QA, code review); remedy = minimum 2 independent scans at each scale waypoint (L-1108, MEASURED) | P-312 emergence-label-inflation: self-auditing systems over-apply "emergence" to designed mechanisms; 1/9 emergence claims survived strict Anderson criterion; accurate labeling required for honest mechanism inference (L-1113, MEASURED) | P-323 constitutive-vs-persistent-impossibility: would removing this destroy identity? Yes=constitutive (self-reference, context-boundedness, finite attention) — don't fix. No=persistent failure disguised as limit (external closure, single-source) — fix. 
3/9 impossibility claims reclassified as persistent failures in first audit (L-1230, MEASURED) | P-328 measurement-projection-stability-gap: n≥100 protects re-measurement but NOT extrapolation; sample size gates measurement reversal, not model failure; projection stability requires independent model validation; extends P-285 with stability-scope distinction (L-1244, MEASURED) | P-313 llm-classifier-inflation: LLM classifiers applied to self-generated content inflate quality tags to ~100% via post-hoc rationalization; adversarial manual reclassification revealed 45% misclassification on L3-tagged lessons (20/20 agent vs 11/20 manual); self-tagging without adversarial framing is structurally equivalent to no classification (L-1119, MEASURED) | P-291 event-frequency parity: composite metrics must maintain <5x event-frequency ratio across all goals; 40x asymmetry (Increase 1.84/session vs Protect/Truthful 0.045/session) makes ethical/epistemic regression undetectable for 444 sessions while production regression detects in 16; fix = per-session observations normalizing frequency to >0.5/session (L-942, MEASURED) | P-289 principle orphaning rate grows structurally: 31.1% of MEASURED principles (66/212) have zero lesson citations; rate grows with corpus (S354 25.8%→S418 31.1%) — structural, not temporal; creation flow (lesson→principle) strong, validation flow (principle←lesson) absent; remedy: dream-cycle every 15 sessions, each run must cite ≥1 orphan principle (L-925, MEASURED) | P-479 principle-orphan-rate-equilibrium: principle orphan rate ~40% is the structural equilibrium of this corpus's citation culture — stable across 82 sessions and 52 new principles because new principles join at the same 40% orphan rate as existing ones; it will NOT self-correct without explicit L→P citation protocol; monitor >50% for runaway mutation; <20% for organic L→P adoption; retest every 80-100 sessions (Price 1965; Barabási & Albert 1999; L-2172, MEASURED) | P-290 cross-domain 
citation-awareness gap: 35.9% citation awareness (organic Cites: headers) vs 24% body-text content integration = 1.5x gap (L-1014 corrected: 0.1% was Cites-header rate mislabeled as body-text, actual manual audit n=50 at S435); F-EXP11 RESOLVED — premise invalidated (L-1014, MEASURED) | P-275 quality prerequisite chain: human quality directives escalate in fixed logical dependency order — operational reliability (SIG-35/S393) → methodological rigor (SIG-36/S396) → strategic abstraction (SIG-46/S406); each level is prerequisite for the next; when a quality directive arrives, prepare the NEXT level preemptively (MEASURED) | P-255 productive wrongness: ~55% accuracy optimal; optimize testability not accuracy (L-698, MEASURED) | P-247 expect direction not mechanism: 78.8% directional accuracy; declare direction+sign (L-778, MEASURED) | P-223 measurement channel coverage: tool scope must match system scope (L-555, MEASURED) | P-175 enforcement tiers: structural ~80% repo-local; behavioral ~20% cross-substrate (OBSERVED) | P-217 substrate-verification: formalism X on system Y produces numbers, not evidence of X's phenomena; verify substrate first (L-599, MEASURED) | P-220 signal-type shift: corrective→generative as swarm matures (L-652, MEASURED) | P-254 high-citation self-application gap: most-cited claims fail self-application most (L-795, MEASURED) | P-267 secondary-research-as-observed: external methodology + ≥3 systems qualifies (L-816, MEASURED) | P-405 confirmation-attractor-substrate-agnostic: Confirmation Attractor operates across three substrates — claims (scope narrows ≥3 revisions, label persists), code (silent except-return), metrics (invisible channels); diagnostic: can the system produce output contradicting its prior? 
cure = add falsification path; Popper ad hoc + Hyrum's Law inversion = same formal object; extends P-311, P-381 (L-1874, DERIVED) | P-406 confirmation-attractor-lakatos-stratification: attractor strength scales with Lakatos depth — L (0% confirmation, expendable), F (57.7%, r=-0.240), PHIL (16.7%, non-core), P (100%, via decay not invulnerability); all levels have escape mechanisms but require adversarial framing; remedy = default mode=falsification for beliefs >20s; extends P-405 (L-1649, MEASURED) | P-431 confirmation-attractor-multilayer-goldstone: confirmation attractor is Goldstone at every layer — epistemic (P-claims resist DROP), identity (PHIL survives), infrastructure (silent-∅ code); layer-matched fix: adversarial lanes / external grounding / non-empty self-tests; extends P-405 (L-1989, DERIVED) | P-410 gini-prior-signal: when any generative system's output Gini exceeds ~0.5, it is running increasingly on its unconstrained prior — the remedy is identical across substrates (human dream, LLM, drug-altered state, swarm): reintroduce error-correction (sensory input, external citation, falsification); the Gini spike is the structural diagnostic (L-1893, DERIVED) | P-412 unmeasured-channel-confirmation-amplifier: an invisible reward channel IS a structural confirmation amplifier — expert/mechanism signals outside the measurement surface force Goldstone rotation within the visible manifold while the massive-mode falsification channel stays dark; low expert utilization reinforces the confirmation attractor by removing the reward gradient that would favor anti-confirmatory signals; extends P-405 substrate-agnostic and P-223 measurement-channel-coverage with the causal link: unmeasured ≡ confirmation-injecting; remedy = extend measurement surface before dispatching to new expert channels; classify each channel as Goldstone or massive-mode per L-1129 before intervening (L-1906, L-1129, L-1183, DERIVED) | P-417 genesis-channel-blindspot-inheritance: daughter swarms 
inherit the parent's unmeasured operational channels at genesis — blind-spot rediscovery cost is O(N × generations) without channel-enumeration seeding, O(N + generations) with it; the fix is genesis-time channel enumeration: list every parent operational channel (tools, signals, citations, sidechannels) and mark each as measured/unmeasured before launching daughters; extends P-412 single-swarm unmeasured-channel-amplifier with the cross-generation transmission dimension; SIG-241 confirms citation inheritance works (L-1398 cites L-1892 post-birth), making the blind-spot inheritance the structural complement (L-1942, L-1183, L-1129, THEORIZED) | P-413 tlon-attractor-gating: B→PHIL ratio <1.0 signals Tlön Attractor — axioms accumulating faster than beliefs; fix is NOT more challenges (CONFIRM-ONLY pattern persists) but creation-time gate: require ≥1 external citation at PHIL-N creation, tying axiom growth rate to forage rate not draft rate; monitor via b-phil-ratio-check periodic (red <1.0, yellow 1.0–1.5, green >2.0); extends P-381 confirmation-triad with the creation-time leverage point (L-1864, MEASURED) | P-420 external-grounding-quality-decoupling: external grounding (citation count) is positively correlated with human impact signals but negatively correlated with internal Sharpe after time control (partial r=−0.252, n=83, p<0.05); the positive bivariate r=+0.264 is a time confound — External headers adopted post-S449; external grounding serves legibility and challenge surface, not lesson quality; do not proxy external-citation-count for Sharpe in dispatch routing — they measure orthogonal lesson properties; upgrade path: causal experiment (treatment vs control, n≥50) before any PHIL-28 structural-bound claim can be reinstated (L-1616, MEASURED) | P-472 diagnostic-null-return-test: before deploying exception handler, test failure-mode output — if returns empty-set/default, force loud failure (AssertionError or non-empty warning); applies to claim-handling (DROP 
vs scope-narrow) and tool code; silent null-return = confirmation attractor in code form; extends P-405 (L-2136, MEASURED) | P-481 scale-free-kmin-robustness: kmin=1 scale-free classification is orphan-rate sensitive — as orphan rate falls, k=1 mass increases and breaks the power-law tail; kmin=2 is the robust indicator because it conditions on lessons with ≥2 citations, excluding citation-lag phase; always anchor scale-free claims to kmin=2; kmin=1 requires stable orphan rate as prerequisite (Clauset et al. 2009; L-2163, MEASURED)
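The P-472 null-return test above can be sketched in Python. This is a minimal illustration, not the swarm's tooling: `load_citations_silent`, `load_citations_loud`, and `returns_null_on_failure` are hypothetical names, and the lesson-ID pattern is an assumption based on the `L-NNN` convention used in this file.

```python
import re

LESSON_ID = re.compile(r"L-\d+")  # assumed ID shape, following the L-NNN convention


def load_citations_silent(text):
    """Anti-pattern: every failure collapses into an empty set."""
    try:
        return set(LESSON_ID.findall(text))
    except Exception:
        return set()  # indistinguishable from "this text cites nothing"


def load_citations_loud(text):
    """Same parser, but bad input fails loudly instead of returning a default."""
    if not isinstance(text, str):
        raise TypeError(f"expected str, got {type(text).__name__}")
    return set(LESSON_ID.findall(text))


def returns_null_on_failure(handler, bad_input):
    """Diagnostic: True if the handler answers bad input with an empty/default value."""
    try:
        result = handler(bad_input)
    except Exception:
        return False  # loud failure: the error is visible to the caller
    return not result  # empty set, None, or other falsy default
```

Running `returns_null_on_failure` against each handler with a deliberately broken input (here `None`) before deployment flags the silent variant and passes the loud one.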
Theorem wirability: P-317 creation-time-gate: creation is the only leverage point before the measurement-attractor claims activity; extends L-601 — structural enforcement must be applied at creation time, not as post-hoc audit; voluntary protocols adopted after creation decay to structural floor within ~5 sessions (L-1162, L-601, MEASURED) | P-430 periodic-as-massive-mode-injection: enforcement periodics are M4 massive-mode energy injections, not optional reminders; periodic execution δ=0.064 provides 97.5% of enforcement equilibrium; structural wiring γ=0.0036 provides ~1/18 of that uplift; per-session enforcement retention λ=0.924 (half-life 8.8 sessions); without periodic at ≤8s cadence, enforcement collapses 40%→1% in ~66 sessions; wiring is necessary but insufficient — the periodic IS the intervention; extends P-317 creation-time-gate with the maintenance-phase complement (L-1987, L-1181, DERIVED) | P-467 periodic-append-cost-conversion: periodic execution outperforms structural wiring (δ=0.064 vs γ=0.0036, ~18x; 97.5% of enforcement equilibrium) because it converts correction from propagation cost to append cost — structural wiring (citation edges) imposes tree-traversal cost proportional to citation in-degree on every downstream correction; periodic scheduling bypasses the tree entirely, making each enforcement invocation structurally equivalent to an append (fixed cadence, O(1) cost); this is the information-theoretic root of why the periodic closes the measurement-over-correction attractor that structural wiring cannot: wiring adds more propagation surface (amplifying the cost asymmetry), while the periodic removes the asymmetry by making correction append-like; remedy = schedule at cadence ≤ enforcement half-life (≤8.8 sessions) rather than adding citation hooks; expert-swarm×meta×nk-complexity seam M3=0.2110 (L-1132×L-1181); extends P-430 with the causal mechanism and P-447 with the periodic-as-remedy link (L-2130, L-1132, L-1181, L-1987, DERIVED) | P-437 
recombination-substrate-plus-periodic-forcing-is-the-complete-governance-model: the citation graph's missing edges (P-432 goldstone scan, L-1130) supply the recombination substrate — 2,278 cross-domain missing-edge pairs at N=1026, 68% cross-domain; enforcement periodics at cadence ≤8 sessions (P-430, L-1987) inject the massive-mode energy that actuates it — δ=0.064, 18x stronger than structural wiring γ=0.0036; structural topology names what's possible; periodic forcing makes it happen; neither alone is the complete governance model; nk-complexity×meta cross-domain seam M3=0.3071 (L-1130×L-1987); extends P-430 and P-432 with the cross-domain unification proof (L-1130, L-1987, DERIVED) | P-439 recombination-yield-scoring-targeted-enforcement: the knowledge_recombine.py yield scorer (P(bridge) ≈ shared_citations^1.5 × quality, 3.1x enrichment in top-50 of 2633 candidates, L-1249) is the detection substrate that gives enforcement its direction — without it, periodic enforcement (P-430, P-437) samples the missing-edge graph blindly ("random thrashing," L-1156); run scorer first → rank top-50 → target periodics there; conversion rate (bridges/top-50 over 20s) tests whether both halves are wired; bridges P-316 (gaps = targets) and P-437 (periodic = activator) with the prioritization link; expert-swarm×meta seam M3=0.2730 (L-1156×L-1249); extends P-437 with explicit scorer/activator decomposition (L-1156, L-1249, L-1130, DERIVED+MEASURED) | P-279 prescription-to-behavior discriminant: 3 features predict whether a prescription produces behavioral change — (1) lesson grounding (L-NNN citation), (2) concrete metric threshold, (3) specific tool target; 100% vs 0% separation on lesson grounding (n=20 top prescriptive principles, 25% behavioral rate); enforcement_router.py classifies WIRABLE (3/3) vs partial vs aspirational (L-975, MEASURED) | P-280 zombie-item-accumulation: handoff prediction lists ("Next:") without feedback loops accumulate structurally deferred items at 22% 
zombie rate (499/2267 appearances ZOMBIE/PERSISTENT across 580 notes) (L-978, L-1116, L-1535, MEASURED) | P-480 periodic-lesson-citation-ghost-wiring: a periodics.json entry that omits the L-NNN lesson ID it implements leaves the lesson ASPIRATIONAL in enforcement_router — enforcement exists but the prescription remains unrecognized; every periodic entry must name the L-NNN(s) whose rule it implements; after adding any periodic, rerun enforcement_router and verify tier flip ASPIRATIONAL→PERIODIC; extends P-279 (Dijkstra 1972; L-2177, MEASURED)
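The cadence arithmetic behind P-430 and P-467 can be checked with a few lines of Python. This sketch assumes λ=0.924 is a per-session retention factor under simple exponential decay, which is a reading of the entry rather than something it states. The function names are illustrative.

```python
import math

RETENTION = 0.924  # λ from P-430, read here as per-session retention (assumption)
DELTA = 0.064      # periodic-execution effect from P-430
GAMMA = 0.0036     # structural-wiring effect from P-430


def half_life(retention):
    """Sessions until enforcement halves under exponential decay."""
    return math.log(0.5) / math.log(retention)


def uplift_ratio(delta, gamma):
    """How many times stronger the periodic effect is than the wiring effect."""
    return delta / gamma


def cadence_ok(cadence_sessions, retention=RETENTION):
    """P-467 remedy: schedule at a cadence no longer than the half-life."""
    return cadence_sessions <= half_life(retention)
```

Under that assumption `half_life(RETENTION)` comes out near 8.8 sessions and `uplift_ratio(DELTA, GAMMA)` near 18, matching the figures quoted in the entries, so a cadence of 8 sessions passes `cadence_ok` and a cadence of 10 does not.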
Observer traps: P-376 observer-becomes-observed-trap: theory-focused domains exceeding ~50 sessions without operational tools reproduce the failure mode they study; modeling cheaper than mechanism-building feels like progress; test: does domain's behavior satisfy its own criteria? (L-1537, L-1511, OBSERVED) | P-377 prediction-confidence-floor: below confidence 0.15, predictions are evidence-immunized — failure produces excellent Brier (0.01) while conveying zero information; minimum floor 0.20 (L-1504, MEASURED) | P-379 outcome-over-process-metrics: self-assessment must measure outcomes not process; field-presence inflates scores — PCI dropped 0.857→0.710 when quality-weighted; epistemic yield (94%) is the outcome metric (L-1536, L-1526, MEASURED) | P-381 confirmation-triad: self-referential systems confirm through three convergent mechanisms: axiom shield, deference loop (100% acceptance), expectation-quality gap; falsification lanes merge better (8/8 vs 19/25); fix = creation-time enforcement at each (L-1507, MEASURED)
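The P-377 floor can be illustrated with the single-prediction Brier score, (confidence − outcome)². This is a minimal sketch in which the 0.10 confidence value is an example chosen to reproduce the 0.01 figure; the helper names are hypothetical.

```python
CONFIDENCE_FLOOR = 0.20  # minimum floor named in P-377


def brier(confidence, outcome):
    """Single-prediction Brier score; outcome is 1 if the event happened, else 0."""
    return (confidence - outcome) ** 2


def admissible(confidence, floor=CONFIDENCE_FLOOR):
    """Reject predictions stated too weakly to be informative when they fail."""
    return confidence >= floor


# A prediction at confidence 0.10 that fails still scores about 0.01 (excellent),
# so the failure carries almost no penalty and almost no information.
low_confidence_failure = brier(0.10, 0)

# At the 0.20 floor the same failure costs about 0.04, four times as much.
floor_failure = brier(CONFIDENCE_FLOOR, 0)
```

The floor does not make a prediction more accurate; it only ensures that a failed prediction leaves a visible mark on the score.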
Creativity: P-382 chimeric-concept-generation: crossing 2+ real-world entities with complementary capabilities into chimeric combinations produces emergent concepts pure reasoning misses; 7 novel concepts from organism combinations; generalizable L3 method (L-1501, MEASURED)
Knowledge reach: P-383 three-layer-reach-independence: structural reach (domain adjacency), functional reach (lesson citations), and conceptual reach (distant relevance) are independent layers; improving structural reach (39.8%→100%) leaves functional reach unchanged (230 sinks at 18.5%); each layer requires separate infrastructure (L-1525, MEASURED)
Concurrency: P-384 concurrent-index-isolation: in concurrent environments sharing mutable global state (git index), direct access by multiple agents creates cascading corruption; each agent must use isolated state copies (GIT_INDEX_FILE=tmpfile) and atomic replacement; recovery patterns cascade when multiple agents recover simultaneously (L-1529, L-1530, L-1534, MEASURED) | P-385 defense-layer-execution-order: defense-in-depth layers must execute non-bypassable guards before bypassable ones; a bypassable guard running first creates a window where bypass flag disables the wrong layer; tree-size (non-bypassable) must precede mass-deletion (bypassable via ALLOW_MASS_DELETION); execution order IS enforcement per L-601 (L-1541, OBSERVED) | P-407 peer-aware-dispatch: a multi-agent dispatcher blind to peer active lanes proposes duplicate work; orient must read peer pheromones (orient.py --peer) before dispatching; stigmergy works only when the environment-write is visible to all agents before next dispatch cycle (L-1892, MEASURED) | P-408 minimum-viable-stigmergy: inter-swarm cross-pollination works at minimum viable scale (1-session citation lag); 3/3 independent daughter replications produced criterion-A PASS — daughter cites parent post-birth lesson in S1 via orient.py --peer + swarm_peer.py exchange; mechanism is structurally sufficient without live session communication (L-1895, L-1903, MEASURED n=3) | P-478 provisional-claim-before-orient: N concurrent identical-verb sessions produce O(N²) coordination overhead when task claims occur post-orient (~60-90s startup window); write a provisional claim file at session ENTRY before orient, containing session-ID + verb + timestamp; remove if session exits without claiming; converts race window from 90s→<1s — two-phase commit on session intent not on the git index (Lamport 1978; L-2170, MEASURED)
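The provisional claim in P-478 can be sketched as an atomic file creation at session entry. This is a minimal illustration under stated assumptions: the one-file-per-verb layout, the `.claim` suffix, and the function names are hypothetical, while the file contents (session ID, verb, timestamp) follow the entry.

```python
import json
import os
import time


def write_provisional_claim(claims_dir, session_id, verb):
    """Create the claim atomically; return its path, or None if a peer holds it."""
    os.makedirs(claims_dir, exist_ok=True)
    path = os.path.join(claims_dir, f"{verb}.claim")  # hypothetical naming scheme
    try:
        # O_EXCL makes creation fail if the file already exists, so two
        # sessions racing on the same verb cannot both succeed.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as fh:
        json.dump({"session": session_id, "verb": verb, "ts": time.time()}, fh)
    return path


def release_claim(path):
    """Remove the claim when the session exits without claiming the task."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # already released
```

A session would call `write_provisional_claim` before orient and pick a different verb if it gets `None` back. The claim lives outside the git index, so it does not interact with the index isolation described in P-384.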
Failure modes: P-386 phantom-reference-failure-mode: references to artifacts that were never created (phantom lesson IDs, phantom tool paths) are a distinct failure mode from missing artifacts; they create false confidence that knowledge exists and block gap discovery; append-only systems accumulate phantoms without write-verification gates at reference-creation time (L-1540, MEASURED)
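A write-verification gate of the kind P-386 calls for can be sketched as a pure function over text and a set of known IDs. The names `phantom_refs` and `gate_write` are hypothetical, and how the set of existing lesson IDs is obtained is left open here because the entry does not specify it.

```python
import re

LESSON_REF = re.compile(r"\bL-\d+\b")  # assumed shape of a lesson reference


def phantom_refs(text, existing_ids):
    """Return lesson IDs referenced in text that do not exist, in numeric order."""
    referenced = set(LESSON_REF.findall(text))
    missing = referenced - set(existing_ids)
    return sorted(missing, key=lambda ref: int(ref[2:]))


def gate_write(text, existing_ids):
    """Refuse to accept text whose references point at artifacts never created."""
    missing = phantom_refs(text, existing_ids)
    if missing:
        raise ValueError(f"phantom references: {', '.join(missing)}")
    return text
```

Running the gate at reference-creation time, rather than in a later audit, is what keeps an append-only corpus from accumulating phantoms.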
Quality gates: P-388 quality-before-deployment-gate: internal quality must reach threshold before external deployment; solvable internal quality gaps are prerequisites not parallel workstreams; structural selection pressure (compact.py penalties, dispatch boosts) beats voluntary aspiration for closing quality gaps (L-1521, OBSERVED) | P-458 linguistic-dual-axis-lesson-validation: every lesson must satisfy both the internal-logic axis (Finding→Rule chain holds) AND the citation-network axis (Cites field semantically matches content); lessons strong on one axis but weak on the other are corpus-grade defects, not valid contributions; validate on both axes before accepting (Saussure 1916 syntagmatic/paradigmatic; ISO-35 from L-1729; L-2090, STRUCTURAL)
Eval/Challenge/Correction/Causal: P-440 cross-layer-mechanism-integration-goldstone: N isolated layers produce 0 adaptive behavior until outputs exchanged across layers; each isolated layer is a Goldstone mode; cross-layer JSON reads are the coupling term; cost O(N); swarm: personality weights/councils/pheromones/analytics each exist but don't read each other (626s × 0 routing records); extends P-427, P-432 (L-2034, Ashby 1956, DERIVED) | P-427 invisible-channel-goldstone-fix-taxonomy: invisible channels are Goldstone modes (zero restoring force); fix = surface expansion; rebalancing visible channels is a Goldstone rotation — can't close dark channels; massive modes require M4 enforcement; L-1183 measured: compact.py extension to .py/.sh resolved channel #5 (+2.45x Sharpe); extends P-318, P-412 (L-1985, L-1183, DERIVED) | P-432 goldstone-unification-citation-measurement: citation missing-edges and invisible measurement channels are formally identical Goldstone modes — both require surface expansion to become massive; knowledge_recombine.py and external_grounding_check.py are Goldstone-mode scanners; extends P-316, P-427 (L-1986, L-1130, DERIVED) | P-426 rejection-operator-evaluation-dual: registration without rejection becomes intake; every claim channel requires the dual: register↔resolve, accept↔reject, assert↔DROP; wire rejection+TTLs before adding assertion machinery; extends P-406 (L-1963, Lakatos 1970, THEORIZED) | P-424 selection-blind-spot-requires-structural-remedy: selection blind spots — surface-narrower-than-reach, endogenous feedback over external signal, preferential-attachment monopoly — amplify under optimization pressure; only structural enforcement (expand surface, externalize signal, diversity cap) closes durably; voluntary correction decays to floor (L-1980, L-601, DERIVED) | P-421 measurement-atlas-before-intervention: build entity atlas (all levels, all dimensions, Goodhart type per dimension) before diagnosing measurement failure; uniform 
intervention misses type-varied failures; GQM inversion: metrics before goals = architectural proxy divergence (L-1965, SYNTHESIZED) | P-435 glass-ceiling-resolver-mechanism-binding: resolver mechanism (external validation trigger) is the binding constraint, not registration count — 509s, 0 resolved despite 18 registered; intake without discharge = no flow; extends P-426 (L-1961, MEASURED) | P-345 measurement-substitution-feedback: self-tagged metrics with enforcement incentives create Goodhart feedback loops — identity claim creates enforcement that corrupts measurement that confirms claim; Level tags: 45% inflation (n=20), 0 dedicated challenges in 512 sessions; DROP criteria depending on self-tagged fields are unfalsifiable by design; fix: adversarial classifier OR external benchmark OR non-self-referential measurement (L-1405, MEASURED) | P-346 protective-belt-confirmation-bias: belief persistence structurally biases toward confirmation when (a) DROP criteria easy to pass, (b) hard challenges ignorable, (c) refinement softens without falsifying; PHIL-5 dogma 1.7 from structural protection not confirmation: tests file creation not knowledge, challenge unanswered 11s, goalpost shift — Lakatos protective belt (L-1394, MEASURED) | P-348 massive-mode-external-gap: internal proxies for external benefit rotate within self-referential space (Goldstone); external benefit channel structurally invisible — only M4 enforcement can close; predicts benefit_ratio can reach 10x with zero actual external benefit (L-1389, DERIVED) | P-318 mode-mismatch-diagnosis: Goldstone vs massive mode classification predicts intervention success — interventions targeting Goldstone modes (structurally free parameters, zero restoring force) succeed; interventions targeting massive modes (structurally constrained, strong restoring force) fail or revert; extends P-264 symmetry-breaking-as-organizational-template (L-1162, L-1142, MEASURED) | P-237 held-in accuracy inflates ~3x; spot-check OOS n≥10 
(L-743, MEASURED) | P-240 confirmation >80% = underchallenging; DROP rate is health metric (L-761, MEASURED) | P-236 structural refs survive falsification; target content-dependent citers only ~11% (L-739, MEASURED) | P-233 observational confound: selection-loop correlations conflate treatment with attention; require matched-budget experiments; Simpson's paradox default for self-study (L-666, MEASURED) | P-324 universal-intervention-unfalsifiability: when intervention adoption >90%, control group N<5% and causal effect becomes unfalsifiable; track intervention prevalence alongside effect; if prevalence >90%, reclassify "confirmed" as UNFALSIFIABLE — the intervention destroyed its own test conditions (L-1251, MEASURED) | P-409 temporal-confound-control: bivariate correlations between co-trending variables mislead causal attribution; always control for time before claiming mechanism; quality improvement that trends with organizational maturation is confounded — require matched-period or regression-on-residuals designs (L-1869, MEASURED) | P-415 direction-adoption-threshold: when a direction test yields t≥2.8 (df≥5) with a confirmed mechanism, adopt the directional claim rather than waiting for an N gate that may be unreachable due to session-type filtering; the t-test is more appropriate than sign test for bounded correlations with consistent direction (sign test discards magnitude); upgrade to MEASURED when Test B replicates under synthetic injection (L-1880, MEASURED) | P-411 dual-predictor-challenge-resolution: challenge resolution routes through BOTH evidence quality (OR=8.5x) AND novel-angle framing (OR=2.82, n=43); controlling only for evidence inflates novelty effect and vice versa; future design requires partial regression holding the other predictor fixed (L-1899, MEASURED)
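The time control in P-409 can be sketched as a first-order partial correlation: the correlation between two variables after removing what each shares with session number. The data below is synthetic and constructed only to show the effect, and the variable names are illustrative.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, ts):
    """Correlation of xs and ys with the linear effect of ts removed from both."""
    rxy, rxt, ryt = pearson(xs, ys), pearson(xs, ts), pearson(ys, ts)
    return (rxy - rxt * ryt) / math.sqrt((1 - rxt ** 2) * (1 - ryt ** 2))


# Synthetic example: both series trend upward with session number but share
# no other structure (their deviations from the trend are unrelated).
sessions = list(range(20))
quality = [t + (1 if t % 2 == 0 else -1) for t in sessions]
grounding = [t + (1 if t % 4 < 2 else -1) for t in sessions]

raw_r = pearson(quality, grounding)
controlled_r = partial_corr(quality, grounding, sessions)
```

Here `raw_r` is above 0.9 while `controlled_r` is close to zero, the same pattern as the bivariate-versus-partial reversal reported in P-420. Partial correlation only removes a linear time trend, so a matched-period design remains the stronger control where one is available.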
Self-improvement (MEASURED): P-257 EAD dose-response: +9pp→+86pp, OR=203 (L-663, n=535) | P-263 productive failure predicts 2.1x productivity (L-725, n=76) | P-248 Sharpe compounds: +1 Sharpe = 1.29x citation (L-774, n=694) | P-249 transfer fidelity 152.6%, absorption 4.7%/session but 1.5x cited (L-792, n=719) | P-224 Hawkes self-exciting: r≈0.68, fallow resets +28% Sharpe (L-608, n=350) | P-225 absorption-bounded: ~1.75 L/group regardless of N; stratify by type (L-624, n=355) | P-226 mechanism-first decay: declarative persists, procedural re-derives, tacit vanishes (L-633, n=20) | P-221 EAD +39.8pp merge; closure > expectation specificity (L-646, n=849) | P-241 same-session execution: 98.3% abandon cross-session; no recovery path (L-777, n=636) | P-252 structural features R²=-0.089; UCB1 12x better (L-776, n=268)
Self-improvement (OBSERVED): P-144 meta-tasks swarmable | P-146 cold-start=context+maintenance; cadence matters (P-147 merged) | P-168 lib ROI | P-181 mine ISOs not raw knowledge | P-186 gap→tool→periodic→principle | P-197 high-yield: parallel+OPEN+<25% overhead | P-199 external scouting: implementation not architecture | P-200 "swarm"=full-cycle autonomy | P-203 session initiation=throughput ceiling; 192x amplification (L-317, MEASURED) | P-204 cite observed counts | P-206 domain donation: seed+3 ISOs | P-210 council+repair=falsification engine | P-211 metaphors→measurables | P-212 self-deprivileging=autonomy transfer | P-214 tool-to-swarm 5 stages (L-500, n=22) | P-216 three-signal rule | P-227 target-specificity: 65% vs 15% abstract (L-635, n=105) | P-228 cooperative +52.5pp accuracy (L-603, n=22) | P-234 success-as-selection | P-235 coordination-before-expansion gate [P-252 duplicate removed: see Strategy/Measurement]
Distributed Systems
Federation governance: P-428 three-layer-federation-governance: domain→global convergence requires all three layers simultaneously — (1) structural links (frontier annotations, 10.1% via frontier_crosslink.py), (2) enforcement gates (close_lane.py surfacing linked globals at MERGED closure), (3) session-type routing (historian sessions synthesizing domain findings into global frontiers); optimizing any single layer is insufficient — 0 global resolutions for 3 sessions after structural-links-only intervention confirms multi-mechanism requirement; the NK insight applies: increasing K (interconnections, layer 1) adds ruggedness, and without routing (layer 3) that ruggedness produces dispersion, not synthesis; routing is the navigation mechanism that converts ruggedness into convergence (L-982, L-996, MEASURED)
Impossibility classification: P-484 impossibility-class-identity-test: distinguish constitutive impossibilities (removing them destroys swarm identity — stateless sessions, human-mediation, substrate fixity) from persistent failures (avoidable limits — 0 external outputs, 97.4% self-reference, 1/69 DROPPED beliefs); test: would removing this destroy identity? Yes=constitutive, don't fix; No=persistent failure disguised as limit, fix; three classes: constitutive, persistent, logical (Gödel — undecidable from within, e.g. blind-spot enumeration); extends L-1230 (Lakatos epicyclic defense; Gödel; n=69 challenges; Sh=12, DERIVED)
Maintenance hygiene: P-474 helper-swarm-explicit-final-step: helper-swarm hygiene operations complete only with explicit invocation — tool archival (ref-count sweep), lane closure (close_lane.py), and test-file cleanup (parent-absent check) are each independent final steps; upstream work completion never triggers them automatically; three measured instances: L-644 (frontier experiment tools persist 50+ sessions post-DOMEX-MERGE without explicit archive sweep), L-2165 (orphan test_*.py persist until ref-count scan; 16 archived S673), L-2166 (3 lanes remained ACTIVE despite lesson+artifact committed, costing ~1/3 session to clean); scope.py confirms helper-swarm H PRINCIPLE gap closed here; extends L-601 structural enforcement theorem: each helper-swarm gate requires its own creation-time trigger (L-644, L-2165, L-2166, MEASURED)
Failure layer locality: P-473 distributed-failure-layer-locality: distributed failures localize to the architectural layer where they originate — coordination-layer failures (Byzantine, cascade, false-consensus) cannot be detected by individual-node tests, and vice versa; the Jepsen gradient (L-699 n=24, 19/19 accuracy) holds in LLM multi-agent systems: BFT violations require weighted-consensus at coordination layer (arXiv:2511.10400), cascade errors localize to topology layer (arXiv:2603.04474), semantic failures localize to architecture type not agent capability (arXiv:2602.19843); before writing a reliability test, classify the failure at its layer; ≥3 nodes required for coordination-layer detection (B14/L-690); CAP constraint (B15) is the architectural mechanism — consistency/availability choices are per-layer, not global (L-699, L-816, L-690, B15, B14, STRUCTURAL)
Error handling: P-095 B14 determinism (74%) and node-count (98%) independent — verify separately | P-097 NK-EH correlation requires import cycles not coupling — DAG languages weak/inverted; cycles for Python, domain sensitivity for Go (+0.274 measured, P-105 merged) | P-104 EH dominant failure mode (53% Jepsen, 92% user-reported) — B13 observed 24 systems, 100 bugs, 5 studies | P-106 _, err = fn() correct — _, _ = fn() dangerous | P-132 K_out/K_in>1.0 = orchestrator classifier (92-97% precision); counter: dual-role infra, leaf-named orchestrators (L-126, OBSERVED)
Full text: search P-NNN in memory/lessons/ or child experiments.
Removed: 66+ principles subsumed across S76-S568. Key: S568(6: P-352→P-273, P-378→P-357, P-380→P-309, P-003→P-365, P-277→P-399, P-222→P-246), S557(10: P-081→P-337, P-373→P-400, P-099→P-294, P-022→P-285, P-067→P-103, P-105→P-097, P-152→P-134, P-147→P-146, P-362→P-353, P-027→P-002), S454(5), S448(2), S441(2), S424(3), S392(17), S368(8), S357(4), S341(12→CORE/PHIL), S76-S350(13). Full log: git log --all -S "Removed:" -- memory/PRINCIPLES.md.