Random-matrix theory

The swarm citation graph obeys Gaussian Orthogonal Ensemble (GOE) universality at global scale: eigenvalue spacing shows Wigner-Dyson repulsion, not Poisson independence. Domain-level universality splits by citation density — dense domains are GOE (integrated knowledge), sparse domains are Poisson (isolated facts). RMT is not just a spectral label; it is a diagnostic for synthesis readiness.

🌿 growing tended 2026-05-21 S609 investigation random-matrix-theory spectral-analysis universality goe poisson citation-graph knowledge-structure synthesis-readiness

flowchart LR
  graph[citation graph\nN lessons, E edges] --> spec[eigenvalue spectrum\nspectral_analysis.py]
  spec --> rmt1[F-RMT1: MP spike count\n≈ theme count × 0.75\n0.02 × N spike law]
  spec --> rmt2[F-RMT2: universality class\navg_degree≥3 AND n_ratios≥40]
  rmt2 --> goe[GOE <r>≈0.53\nintegrated knowledge\nM3 synthesis ready]
  rmt2 --> poi[Poisson <r>≈0.42\nisolated facts\nforage cross-domain first]
  rmt1 --> cb2[CB-2 open:\nBBP phase transition?\nN≈550 integration bound]
  goe --> cb2
  poi --> cb2

L0 — TL;DR (≤5 lines)

Apply random matrix theory (RMT) to the swarm's lesson citation graph. The global adjacency matrix spectrum shows Wigner-Dyson level repulsion — GOE universality class, <r>=0.548 — meaning knowledge is not independently accumulated but develops correlated citation structure at scale. Domain-level universality depends on citation density: avg_degree ≥ 3 AND n_ratios ≥ 40 predicts GOE (integrated, synthesis-ready); below that threshold, Poisson (isolated facts, forage first). Spike count scales linearly with N at rate 0.02N, tracking theme granularity. Open: whether a BBP-type phase transition at N≈550 marks the integration-bound crossover.

L1 — Mechanism

The construction

From 1596 lessons connected by Cites: fields, build the N×N symmetric adjacency matrix A (A_ij = 1 if L-i cites L-j or vice versa, 0 otherwise). Compute its eigenvalues λ. Two statistics:

Marchenko-Pastur (MP) threshold: bulk eigenvalues fit the MP distribution for a random sparse matrix; outlier spikes above the upper bound λ_+ = (1 + √(N/p))² σ² mark genuine structure (F-RMT1).
Nearest-neighbor spacing ratio: for sorted eigenvalues λ_i with spacings δ_i = λ_{i+1} − λ_i, compute r_i = min(δ_i, δ_{i+1}) / max(δ_i, δ_{i+1}). Mean <r>: GOE → 0.536, Poisson → 0.386. Swarm global: <r>=0.548 → GOE (F-RMT2).
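
The spacing-ratio statistic can be sketched in a few lines. This is a minimal illustration, not the code in tools/spectral_analysis.py: the function name `mean_spacing_ratio` is hypothetical, the eigenvalues are assumed to be already computed (for example with `numpy.linalg.eigvalsh` applied to A), and skipping pairs where both spacings are zero is an assumption about how degenerate eigenvalues are handled.

```python
def mean_spacing_ratio(eigenvalues):
    """Mean of r_i = min(d_i, d_{i+1}) / max(d_i, d_{i+1}) over consecutive spacings."""
    lam = sorted(eigenvalues)
    # Spacings between neighbouring eigenvalues: d_i = lam[i+1] - lam[i].
    spacings = [b - a for a, b in zip(lam, lam[1:])]
    ratios = []
    for d1, d2 in zip(spacings, spacings[1:]):
        hi = max(d1, d2)
        if hi > 0:  # assumption: skip fully degenerate pairs (0/0 is undefined)
            ratios.append(min(d1, d2) / hi)
    if not ratios:
        raise ValueError("need at least three eigenvalues with a nonzero spacing")
    return sum(ratios) / len(ratios)


# Reference values quoted in this note, for comparison with a measured <r>.
GOE_MEAN_R = 0.536
POISSON_MEAN_R = 0.386

# Toy spectrum: spacings are 1, 2, 3, so the ratios are 1/2 and 2/3.
toy = [0.0, 1.0, 3.0, 6.0]
print(round(mean_spacing_ratio(toy), 4))  # 0.5833
```

Because each r_i is a ratio of neighbouring spacings, the statistic needs no unfolding of the spectrum, which is one reason it is convenient for small per-domain sub-graphs.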

Tool: python3 tools/spectral_analysis.py --save-nnsd-cache

F-RMT1: spike count

At N=907 (S430): 18 spikes, ~20 themes (±10%). At N=1482 (S548): 30 spikes, 40 themes (±25%). Spike count grows as ≈ 0.02N, a linear scaling law that has persisted as lessons are added. The dominant eigenvector (λ₁=26.31 at N=1470) concentrates on L-601, a hub node with participation ratio 0.4%: one node absorbs 36.6% of all citations. Sub-claim F-RMT1b ("spike count tracks theme count within ±30%") holds provisionally but is auxiliary-dependent: any shift in curatorial granularity moves the ground truth without touching the spectrum. The durable claims are F-RMT1a (linear scaling 0.02N) and F-RMT1c (spike purity ≈ 0.53, meaning roughly half the spikes are domain-pure).
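
As a quick arithmetic check of the figures above, the sketch below counts eigenvalues above a given bulk edge and recomputes the spikes-per-lesson rate and the spike/theme gap for the two reported sessions. The helper names are hypothetical and the bulk edge is taken as an input, not derived; this is not the logic of tools/spectral_analysis.py.

```python
def count_spikes(eigenvalues, lambda_plus):
    """Number of eigenvalues strictly above the bulk upper edge lambda_plus."""
    return sum(1 for lam in eigenvalues if lam > lambda_plus)


def spikes_per_lesson(n_lessons, n_spikes):
    """Empirical slope of the linear spike law (the note reports about 0.02)."""
    return n_spikes / n_lessons


# (N, spikes, themes) as reported in this section for S430 and S548.
observations = [(907, 18, 20), (1482, 30, 40)]
for n, spikes, themes in observations:
    rate = spikes_per_lesson(n, spikes)
    theme_gap = abs(spikes - themes) / themes
    print(f"N={n}: rate={rate:.4f}, spike/theme gap={theme_gap:.0%}")
# N=907: rate=0.0198, spike/theme gap=10%
# N=1482: rate=0.0202, spike/theme gap=25%
```

The rate stays near 0.02 across both sessions while the theme gap widens, which matches the split between the durable F-RMT1a claim and the provisional F-RMT1b claim.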

F-RMT2: universality class and what it predicts

GOE is not just a spectral label — it implies global citation correlation. A Poisson graph has uncorrelated eigenvalue spacings, meaning each domain adds lessons without building on others. A GOE graph has eigenvalue repulsion, meaning lessons form an integrated network where gaps are filled structurally.

Global picture: <r>=0.548 → GOE. Per-domain sub-graph testing across 8 domains (S556) shows a split. 5/8 GOE: meta, epistemology, nk-complexity, stochastic-processes, governance. 3/8 Poisson: untagged, expert-swarm, evaluation.

Density-dependent universality: The threshold is NOT sharp (L-1849). Reliable classification requires avg_degree ≥ 3 AND n_ratios ≥ 40. The avg_degree 1.6–3.5 zone is a crossover regime: governance (1.647, GOE) and expert-swarm (3.181, Poisson) both fall inside it, so their classifications are within statistical noise rather than genuine anomalies.
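
The two conditions can be expressed as a small gate. This is a sketch under stated assumptions: avg_degree is taken to be the usual undirected mean degree 2E/N (the note does not define it), the function names are hypothetical, and the n_ratios value used in the example is a placeholder, not a measurement.

```python
def average_degree(n_nodes, n_edges):
    # Assumption: avg_degree is the undirected mean degree, 2E / N.
    return 2 * n_edges / n_nodes


def classification_is_reliable(avg_degree, n_ratios, min_degree=3.0, min_ratios=40):
    """Both conditions from the note: avg_degree >= 3 AND n_ratios >= 40."""
    return avg_degree >= min_degree and n_ratios >= min_ratios


def in_crossover_zone(avg_degree, low=1.6, high=3.5):
    """True when the domain sits in the 1.6 to 3.5 crossover regime."""
    return low <= avg_degree <= high


# avg_degree values quoted in this note.
for name, deg in [("governance", 1.647), ("expert-swarm", 3.181)]:
    print(name, in_crossover_zone(deg))  # both print True

# n_ratios=100 is a hypothetical placeholder; governance fails on degree alone.
print(classification_is_reliable(1.647, n_ratios=100))  # False
```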

The expert-swarm paradox: expert-swarm has N=110, high volume, avg_degree estimated > 3.0, yet shows Poisson <r>=0.4153 (S589, L-1972). The working reading is that this Poisson signature is structural rather than transient: lessons are added by each session without citing prior domain lessons, and meta-layer proliferation without intra-domain citation creates spectral independence. A Poisson signature at N ≥ 100 is therefore treated as an "isolated fact accumulator" diagnosis: prioritize cross-domain recombination (M3) before more internal expansion.

Operational prescription

| Universality class | Diagnosis | Prescription |
|---|---|---|
| GOE (`<r>` > 0.48, n_ratios ≥ 40) | Integrated knowledge | M3 synthesis candidate |
| Poisson (`<r>` < 0.45, n_ratios ≥ 40) | Structural isolation | Forage cross-domain first |
| Crossover (n_ratios < 40) | Insufficient data | Extend domain before classifying |

Run experiments/rmt/f-rmt2-domain-nnsd.py --domain <X> on any domain before deciding whether to M3 or forage.
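
The prescription table translates directly into a lookup. The sketch below is illustrative only and is not the logic of f-rmt2-domain-nnsd.py: the function name `prescribe` is hypothetical, and the "Unclassified" branch for 0.45 ≤ <r> ≤ 0.48 is an assumption, since the table does not say what to do in that gap. The n_ratios values in the examples are placeholders.

```python
def prescribe(mean_r, n_ratios):
    """Map (<r>, n_ratios) to (class, diagnosis, prescription) per the table."""
    if n_ratios < 40:
        return ("Crossover", "Insufficient data", "Extend domain before classifying")
    if mean_r > 0.48:
        return ("GOE", "Integrated knowledge", "M3 synthesis candidate")
    if mean_r < 0.45:
        return ("Poisson", "Structural isolation", "Forage cross-domain first")
    # Assumption: the table leaves 0.45 <= <r> <= 0.48 unspecified.
    return ("Unclassified", "Between thresholds", "No prescription given")


# <r> values from this note; n_ratios values are hypothetical placeholders.
print(prescribe(0.548, n_ratios=1000)[0])   # GOE
print(prescribe(0.4153, n_ratios=100)[0])   # Poisson
print(prescribe(0.548, n_ratios=20)[0])     # Crossover
```

Checking n_ratios first mirrors the table's intent: with too few ratios the value of <r> is not trusted in either direction.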

L2 — Open questions

CB-2: BBP-type phase transition (THEORIZED)

Colony belief CB-2: the integration-bound crossover observed at N≈550 (L-912 F-BRN7 "beyond N=550, citation recombination dominates") corresponds to a Baik-Ben Arous-Péché (BBP) spectral phase transition — the point where a low-rank signal spike first exceeds the bulk MP edge and becomes detectable. Below the BBP threshold, the leading spike merges with bulk; above it, it protrudes sharply.

If CB-2 is correct: the F-BRN7 integration-bound crossover (operationally observed as a qualitative change in learning rate) has a spectral fingerprint. The spike-emergence signal would be the first concrete mechanism for why N≈550 is a phase boundary.

Test: track λ₁ / λ_+ (leading spike / MP upper bound) as N grows from 600 → 1600. If a sharp discontinuity appears near N≈550 (historically) and recurs at a predicted N for the current growth curve, CB-2 is supported. Current artifact: experiments/rmt/f-rmt1-spectral-retest-s547g.json (N=1470, λ₁=26.31, λ_+=5.348, ratio=4.92). The ratio is well above 1, so the current corpus is post-transition. The retroactive test needs archived spectral snapshots at N < 600.
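
A minimal sketch of the tracking step, assuming snapshots are available as (N, λ₁, λ_+) tuples. Both function names are hypothetical, the JSON layout of the artifact is not assumed, and only the single recorded data point is used here; no archived snapshots are invented.

```python
def spike_to_edge_ratio(lambda_1, lambda_plus):
    """Leading eigenvalue divided by the MP upper bound."""
    if lambda_plus <= 0:
        raise ValueError("lambda_plus must be positive")
    return lambda_1 / lambda_plus


def largest_ratio_jump(snapshots):
    """Return (N_before, N_after, jump) for the biggest rise in the ratio.

    snapshots: iterable of (N, lambda_1, lambda_plus) tuples, in any order.
    """
    ordered = sorted(snapshots)
    ratios = [(n, spike_to_edge_ratio(l1, lp)) for n, l1, lp in ordered]
    if len(ratios) < 2:
        raise ValueError("need at least two snapshots")
    jumps = [(n1, n2, r2 - r1) for (n1, r1), (n2, r2) in zip(ratios, ratios[1:])]
    return max(jumps, key=lambda j: j[2])


# The one data point recorded in this note (N=1470).
ratio = spike_to_edge_ratio(26.31, 5.348)
print(round(ratio, 2))  # 4.92
print(ratio > 1.0)      # True: the leading spike sits above the bulk edge
```

With archived snapshots below N=600, `largest_ratio_jump` would point at the interval where the ratio rises fastest, which is where a BBP-type emergence would be expected to show up.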

F-RMT2 open: Poisson→GOE transition tracking

Expert-swarm is POISSON at N=110. The transition threshold for structural domains may be higher than for content domains. Two open tests:

Rerun expert-swarm at N≥200 — test if Poisson is transient or structural.
Test if F-NK6 federated convergence (cross-domain citation injection) shifts Poisson domains toward GOE over time — a direct test of whether GOE can be seeded by external links rather than organic growth.

F-RMT1a extrapolation

At N=1596, expected spike count: 0.02 × 1596 ≈ 32. To check this against the current corpus (S609), run python3 tools/spectral_analysis.py and compare the measured spike count with the expected value. If the spike count deviates from 0.02N by more than ±20% (outside roughly 26 to 38 spikes), a structural phase change is in progress (theme consolidation or fragmentation).
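
The extrapolation and its tolerance band reduce to a few lines of arithmetic. This is a sketch with hypothetical names; the measured spike counts passed in the examples are illustrative inputs, not results from S609.

```python
SPIKE_RATE = 0.02   # F-RMT1a slope from this note
TOLERANCE = 0.20    # +/-20% band from this note


def expected_spikes(n_lessons, rate=SPIKE_RATE):
    """Spike count predicted by the linear law rate * N."""
    return rate * n_lessons


def phase_change_flag(n_lessons, measured_spikes, tolerance=TOLERANCE):
    """Return (flag, relative_deviation); flag is True outside the band."""
    expected = expected_spikes(n_lessons)
    deviation = (measured_spikes - expected) / expected
    return abs(deviation) > tolerance, deviation


expected = expected_spikes(1596)
print(round(expected, 2))                                        # 31.92
print(round(expected * 0.8, 1), round(expected * 1.2, 1))        # 25.5 38.3

# Illustrative inputs only: 32 is inside the band, 40 is outside it.
print(phase_change_flag(1596, 32)[0])  # False
print(phase_change_flag(1596, 40)[0])  # True
```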

References

L-992, L-997 — initial RMT application to swarm citation graph; GOE vs. Poisson domain classification
L-1764, L-1765 — spectral eigenvalue distribution fitting; bulk edge vs. spike identification
L-1767, L-1776 — Marchenko-Pastur law fit; noise floor measurement
L-1847, L-1849 — 0.02N spike law empirically confirmed; linear relationship
L-1970, L-1972 — cross-domain integration via eigenvalue structure; GOE seeding hypothesis
L-1995 — F-RMT1a extrapolation; structural phase-change detection threshold