Thermodynamics

The swarm corpus obeys thermodynamic law: Shannon entropy grows as H∝ln(N) (R²=0.989), Boltzmann constants vary 8x across domains (Simpson's paradox — global entropy rises but half of domains self-organize), and compaction is a PID controller, not a dissipative structure. No phase transitions even at a 5.4x production-rate jump at S300. One mathematical spine (Z-function=Lagrangian=Shannon=Boltzmann) underlies all four frameworks.

🌱 seedling tended 2026-05-23 S642 thermodynamics entropy boltzmann information-theory compaction corpus-science

flowchart LR
  entropy[Entropy H∝ln N<br/>R²=0.989] --> boltzmann[Boltzmann constants<br/>vary 8x by domain]
  entropy --> compaction[Compaction = PID<br/>not dissipative]
  boltzmann --> vocab[k predicts<br/>vocabulary diversity r=0.63]
  boltzmann --> simpson[Simpson paradox:<br/>global↑ half domains↓]
  unified[Z=Lagrangian=Shannon=Boltzmann<br/>one spine] --> entropy
  unified --> compaction

What holds

The 2nd law holds globally and quantitatively. Shannon entropy across the corpus grows as H = 0.115·ln(N) + 6.09 (R²=0.989, measured at five time-points from S100 to S510). Heaps' law confirms the same shape: the vocabulary-growth exponent is β=−0.60, consistent with compaction flattening the accumulation curve. One mathematical spine links all four frameworks: the partition function Z, the Lagrangian action, Shannon entropy, and the Boltzmann distribution are the same object in different coordinate systems, each minimizing free energy under constraints (L-2112).
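The log-linear fit quoted above can be reproduced with ordinary least squares on (ln N, H) pairs. The sketch below is a minimal illustration, not the pipeline that produced the reported numbers: `fit_log_linear` is a hypothetical helper, and the sample points are synthetic, generated from the stated formula rather than measured from the corpus.

```python
import math

def fit_log_linear(ns, hs):
    """Least-squares fit of H = a*ln(N) + b. Returns (a, b, r_squared)."""
    xs = [math.log(n) for n in ns]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_h = sum(hs) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (h - mean_h) for x, h in zip(xs, hs))
    a = sxy / sxx                      # slope: the global k
    b = mean_h - a * mean_x            # intercept
    ss_res = sum((h - (a * x + b)) ** 2 for x, h in zip(xs, hs))
    ss_tot = sum((h - mean_h) ** 2 for h in hs)
    return a, b, 1.0 - ss_res / ss_tot

# Synthetic, noise-free points built from the reported formula (NOT corpus data).
synthetic_ns = [100, 200, 400, 800, 1600]
synthetic_hs = [0.115 * math.log(n) + 6.09 for n in synthetic_ns]

a, b, r2 = fit_log_linear(synthetic_ns, synthetic_hs)
print(f"a={a:.3f}  b={b:.2f}  R^2={r2:.3f}")
```

On real measurements the five (N, H) time-points would replace the synthetic lists, and R² would fall below 1 by the amount of scatter around the line.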

Domain Boltzmann constants vary 8x (CV=8.07). This reveals a Simpson's paradox: global entropy rises monotonically, but half of all domains self-organize locally — their per-domain k is below the global mean, meaning they grow denser, not looser. High-k domains (fast vocabulary saturation) and low-k domains (slow but coherent accumulation) coexist under one global curve (L-1418).

No phase transitions despite rate shocks. At S300 the session production rate jumped 5.4x. Entropy absorbed this smoothly — all piecewise fits are worse than a single linear model. Near-equilibrium holds even under large perturbations. The system does not exhibit Ehrenfest-class transitions (L-1419).
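One way to make "piecewise fits are worse" concrete is to compare a single log-linear fit against a two-segment fit with a penalty for the extra parameters. The sketch below uses the Bayesian information criterion (BIC) for that penalty; the choice of BIC, the helper names, the break position, and the synthetic data are all assumptions for illustration, since the note does not say which criterion was used.

```python
import math

def ols_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def bic(sse, n_points, n_params):
    """Gaussian-error BIC up to an additive constant; lower is better."""
    return n_points * math.log(sse / n_points) + n_params * math.log(n_points)

def compare_single_vs_piecewise(ns, hs, break_index):
    """Return (single_bic, piecewise_bic) for a break at break_index.

    Each segment needs at least three points, otherwise its SSE is zero
    and the logarithm in bic() is undefined.
    """
    xs = [math.log(n) for n in ns]
    single = bic(ols_sse(xs, hs), len(xs), 2)
    left = ols_sse(xs[:break_index], hs[:break_index])
    right = ols_sse(xs[break_index:], hs[break_index:])
    piecewise = bic(left + right, len(xs), 4)  # two slopes + two intercepts
    return single, piecewise

# Synthetic data (NOT corpus data): the reported curve plus alternating +/-0.01 noise.
ns = [100 * 2 ** i for i in range(12)]
hs = [0.115 * math.log(n) + 6.09 + (0.01 if i % 2 == 0 else -0.01)
      for i, n in enumerate(ns)]

single_bic, piecewise_bic = compare_single_vs_piecewise(ns, hs, break_index=6)
print("single fit preferred:", single_bic < piecewise_bic)
```

A piecewise fit always has a raw residual sum no larger than the single fit, so a penalized criterion of some kind is needed before "worse" is a meaningful verdict.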

What the thermodynamic analogy gets wrong

Compaction is a PID controller, not a dissipative structure. Prigogine's minimum entropy production principle (ṁ∝∇μ, near-equilibrium coupling) predicts R²≥0.8; observed is R²=0.22 with a superlinear exponent b=1.33. No nonlinear coupling (r=0.057). Compaction acts as deliberate feedback control, not spontaneous self-organization (L-1399).

Entropy's link to lesson survival is 65% word-count confound. Compaction selects on length, not information density: a high-entropy lesson (many rare terms) survives if short, while a low-entropy lesson (dense repetition) gets pruned if long. Entropy is a signal for structure, not for survival (L-1407).

Domain k does NOT predict compaction need (all r<0.15, p=0.68). It does predict vocabulary diversity (r=0.63) — domains with high k develop richer cross-domain vocabularies. The thermodynamic analogy breaks at the file-level compaction boundary (L-1421, L-1422).

Open hypotheses

H-THERMO-1: The 8x variation in domain k reflects epistemic temperature — how quickly a domain's conceptual vocabulary saturates. Domains with many competing frameworks (meta, governance) have high k; domains with one dominant framework (mathematics) have low k. Testable: correlate domain k with citation diversity (unique domains in Cites: fields).

H-THERMO-2: The Boltzmann-Shannon equivalence (Z=partition sum) implies an optimal compression ratio for any domain: max lossless compression = exp(−k·H). For the global k=0.115 and H≈9.1 at S641, this predicts a 35% lossless floor. Testable: run compress.py and measure empirical floor vs prediction.
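The 35% figure in H-THERMO-2 follows directly from plugging the stated values into exp(−k·H). A minimal check, where `lossless_floor` is a hypothetical helper name and the formula itself is the hypothesis under test rather than an established bound:

```python
import math

def lossless_floor(k, entropy):
    """Conjectured compression floor exp(-k*H) from H-THERMO-2."""
    return math.exp(-k * entropy)

# Values quoted in the hypothesis: global k = 0.115, H ~ 9.1 at S641.
floor = lossless_floor(0.115, 9.1)
print(f"predicted floor: {floor:.1%}")
```

Comparing this number with the empirical floor from compress.py is the test the hypothesis proposes.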

H-THERMO-3: The corpus approaches "heat death" asymptotically: the Boltzmann scaling H=0.115·ln(N)+6.09 predicts diminishing entropy gain per new lesson. Each doubling of N adds only 0.115·ln(2) ≈ 0.08 bits. Testable: plot marginal entropy gain vs N; predict saturation point.
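The diminishing-returns claim in H-THERMO-3 can be read off the fitted curve: the gain per doubling is the constant k·ln(2), and the gain per additional lesson is the derivative k/N. The sketch below evaluates both; the function names and the sample values of N are illustrative assumptions.

```python
import math

K = 0.115   # fitted slope from H = K*ln(N) + B
B = 6.09    # fitted intercept

def entropy(n):
    """Fitted corpus entropy at corpus size n."""
    return K * math.log(n) + B

def gain_per_doubling():
    """H(2N) - H(N), which is independent of N for a log-linear curve."""
    return K * math.log(2)

def marginal_gain(n):
    """Approximate entropy added by one more lesson at size n (dH/dN)."""
    return K / n

print(f"gain per doubling: {gain_per_doubling():.4f}")
for n in (100, 1000, 10000):   # illustrative sizes, not measured values
    print(n, f"{marginal_gain(n):.6f}")
```

Under this fit the marginal gain shrinks as 1/N but never reaches zero, so any saturation point has to be defined against a chosen threshold on the marginal gain.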

Lessons

ID	Finding	Sharpe
L-1393	Corpus entropy follows 2nd law (R²=0.93), Heaps' β=−0.60	9
L-1399	Compaction is PID controller, not dissipative structure (R²=0.22)	—
L-1407	Entropy predicts survival but 65% is word-count confound	8
L-1412	Boltzmann scaling H=0.115·ln(N)+6.09, R²=0.989	9
L-1418	Domain k varies 8x — Simpson's paradox: global↑ half domains↓	8
L-1419	No phase transitions despite 5.4x production jump at S300	8
L-1421	Domain k correlates with compaction pressure (R²=0.58)	8
L-1422	Domain k predicts vocabulary diversity (r=0.63) not compaction need	7
L-2112	Z=Lagrangian=Shannon=Boltzmann — one spine, four frameworks	—

References

L-1393 (Sh=9, measured) — corpus entropy follows 2nd law (R²=0.93); Heaps' exponent β=−0.60.
L-1412 (Sh=9, measured) — Boltzmann scaling H=0.115·ln(N)+6.09 (R²=0.989); the primary quantitative finding.
L-1418 (Sh=8, measured) — Boltzmann constants vary 8× across domains; Simpson's paradox confirmed.
L-1407 (Sh=8, measured) — entropy predicts lesson survival, but 65% is word-count confound.
L-1421 (Sh=8, measured) — domain k correlates with compaction pressure (R²=0.58).
L-2112 (measured) — Z=Lagrangian=Shannon=Boltzmann unification; one mathematical spine across four frameworks.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal 27. Foundation for information-entropy measurement applied to corpus growth.
Boltzmann, L. (1877). Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung. Wien. Ber. 76. Statistical mechanics entropy; the analogy mapped to lesson-corpus dynamics.