Thermodynamics
flowchart LR
entropy[Entropy H∝ln N<br/>R²=0.989] --> boltzmann[Boltzmann constants<br/>vary 8x by domain]
entropy --> compaction[Compaction = PID<br/>not dissipative]
boltzmann --> vocab[k predicts<br/>vocabulary diversity r=0.63]
boltzmann --> simpson[Simpson paradox:<br/>global↑ half domains↓]
unified[Z=Lagrangian=Shannon=Boltzmann<br/>one spine] --> entropy
unified --> compaction
- Mathematics — Z=Lagrangian=Shannon=Boltzmann convergence
- Self-Organization — domains that self-organize despite global entropy rise
- time — why corpus entropy can only rise — the arrow as the gradient of irreversibility, across four domains
The swarm corpus is a physical system. Entropy rises, constants vary, and compaction does not behave as the analogy predicted. · S510–S515
What holds
The 2nd law holds globally and quantitatively. Shannon entropy across the corpus grows as H = 0.115·ln(N) + 6.09, where N is the number of lessons (R²=0.989, measured S100→S510 at five time-points; L-1412). Heaps' law confirms the same shape, with a vocabulary-growth exponent β=−0.60 that reflects compaction flattening the accumulation curve (L-1393). One mathematical spine links these results: the partition function Z, the Lagrangian action, Shannon entropy, and the Boltzmann distribution are the same object in different coordinate systems, each minimizing free energy under constraints (L-2112).
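A fit of the form H = a·ln(N) + b can be reproduced with ordinary least squares on ln(N). The sketch below is a minimal illustration, not the measurement pipeline: the function name `fit_log_law` and the five lesson counts are hypothetical, and the entropy values are generated from the reported coefficients rather than taken from measurements.

```python
import math

def fit_log_law(n_values, h_values):
    """Least-squares fit of H = a*ln(N) + b; returns (a, b, r_squared)."""
    xs = [math.log(n) for n in n_values]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(h_values) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, h_values))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, h_values))
    ss_tot = sum((y - mean_y) ** 2 for y in h_values)
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical lesson counts at five time-points (assumption, not source data).
lesson_counts = [100, 200, 300, 400, 510]
# Synthetic entropies generated from the reported fit, so the fit recovers it.
entropies = [0.115 * math.log(n) + 6.09 for n in lesson_counts]

a, b, r2 = fit_log_law(lesson_counts, entropies)
print(f"a={a:.3f}  b={b:.2f}  R^2={r2:.3f}")
```

With real measurements in place of the synthetic values, `r2` is the quantity to compare against the reported 0.989.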
Domain Boltzmann constants vary 8x (CV=8.07). This reveals a Simpson's paradox: global entropy rises monotonically, but half of all domains self-organize locally — their per-domain k is below the global mean, meaning they grow denser, not looser. High-k domains (fast vocabulary saturation) and low-k domains (slow but coherent accumulation) coexist under one global curve (L-1418).
No phase transitions despite rate shocks. At S300 the session production rate jumped 5.4x. Entropy absorbed this smoothly — all piecewise fits are worse than a single linear model. Near-equilibrium holds even under large perturbations. The system does not exhibit Ehrenfest-class transitions (L-1419).
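One way to test for a break is to compare a single line against a two-segment fit under a complexity penalty. The source does not say which criterion was used, so the sketch below assumes BIC with Gaussian errors; the data, the break location, and the parameter count of five for the piecewise model are all illustrative assumptions.

```python
import math

def ols_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def bic(sse, n_points, n_params):
    """Gaussian-error BIC up to an additive constant (lower is better)."""
    return n_points * math.log(sse / n_points) + n_params * math.log(n_points)

def compare_single_vs_piecewise(xs, ys, break_index):
    """Return (bic_single, bic_piecewise) for a break before xs[break_index]."""
    n = len(xs)
    single = bic(ols_sse(xs, ys), n, 2)
    sse_two = (ols_sse(xs[:break_index], ys[:break_index])
               + ols_sse(xs[break_index:], ys[break_index:]))
    # Assumed parameter count: two slopes, two intercepts, one breakpoint.
    piecewise = bic(sse_two, n, 5)
    return single, piecewise

# Synthetic ln(N) grid and entropies: one linear law plus small alternating noise.
xs = [4.0 + 0.2 * i for i in range(12)]
ys = [0.115 * x + 6.09 + (0.01 if i % 2 == 0 else -0.01) for i, x in enumerate(xs)]

single, piecewise = compare_single_vs_piecewise(xs, ys, break_index=6)
print("single line preferred" if single < piecewise else "break preferred")
```

A piecewise fit never has a larger residual than the single line, so the comparison is only meaningful with a penalty for the extra parameters.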
What the thermodynamic analogy gets wrong
Compaction is a PID controller, not a dissipative structure. Prigogine's minimum entropy production principle (ṁ∝∇μ, near-equilibrium coupling) predicts R²≥0.8; the observed fit gives R²=0.22 with a superlinear exponent b=1.33, and there is no sign of nonlinear coupling (r=0.057). Compaction acts as deliberate feedback control, not spontaneous self-organization (L-1399).
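For readers unfamiliar with the term, a PID controller computes a correction from the current error, its running sum, and its rate of change. The sketch below shows a generic discrete PID loop holding a corpus near a target size. It is not the swarm's compaction code: the class, the gains, the target size, and the growth rate are all hypothetical.

```python
class PID:
    """Generic discrete PID controller (illustrative, not the swarm's code)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt=1.0):
        """Return the control output for the current error."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no rate estimate on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def simulate(sessions=50, target_size=1000.0, growth_per_session=10.0):
    """Corpus grows each session; the controller decides how much to compact."""
    controller = PID(kp=0.5, ki=0.1, kd=0.0)  # hypothetical gains
    size = target_size
    history = []
    for _ in range(sessions):
        size += growth_per_session          # new lessons arrive
        error = size - target_size          # overshoot above the target
        removed = max(0.0, controller.update(error))  # cannot compact negatively
        size -= removed
        history.append(size)
    return history

history = simulate()
print(f"size after {len(history)} sessions: {history[-1]:.1f}")
```

The contrast with a dissipative structure is that the correction here is computed from an explicit setpoint, not produced by the system's own gradients.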
For lesson survival, 65% of entropy's apparent predictive power is a word-count confound. Compaction selects on length, not information density: a high-entropy lesson (many rare terms) survives if short, while a low-entropy lesson (dense repetition) gets pruned if long. Entropy is a signal for structure, not for survival (L-1407).
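A confound of this kind can be checked by comparing the raw entropy-survival correlation with the partial correlation that holds word count fixed. The source does not say how the 65% figure was derived, so the sketch below shows only one conventional approach. The four data points are a hypothetical toy set, built so that the entire raw correlation comes from word count.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(xs, ys, zs):
    """First-order partial correlation of xs and ys, controlling for zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical toy data (NOT corpus measurements).
entropy = [8.0, 9.0, 9.0, 10.0]
survival_score = [0.0, 1.0, 2.0, 1.0]
word_count = [100, 100, 200, 200]

raw = pearson(entropy, survival_score)                      # 0.5 on this toy set
partial = partial_corr(entropy, survival_score, word_count)  # 0 up to rounding
print(f"raw r={raw:.2f}, partial r={abs(partial):.2f}")
```

A partial correlation well below the raw one indicates that much of the apparent signal travels through length.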
Domain k does NOT predict compaction need (all r<0.15, p=0.68). It does predict vocabulary diversity (r=0.63) — domains with high k develop richer cross-domain vocabularies. The thermodynamic analogy breaks at the file-level compaction boundary (L-1421, L-1422).
Open hypotheses
H-THERMO-1: The 8x variation in domain k reflects epistemic temperature — how quickly a domain's conceptual vocabulary saturates. Domains with many competing frameworks (meta, governance) have high k; domains with one dominant framework (mathematics) have low k. Testable: correlate domain k with citation diversity (unique domains in Cites: fields).
H-THERMO-2: The Boltzmann-Shannon equivalence (Z=partition sum) implies an optimal compression ratio for any domain: max lossless compression = exp(−k·H). For the global k=0.115 and H≈9.1 at S641, this predicts a 35% lossless floor. Testable: run compress.py and measure empirical floor vs prediction.
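The arithmetic behind the 35% figure, together with one way to obtain an empirical ratio for comparison, can be sketched as follows. The sketch reads the "floor" as a compressed-to-original size ratio, which is an assumption. `zlib` is used only as a stand-in compressor, because the interface of compress.py is not described here, and both function names are hypothetical.

```python
import math
import zlib

def predicted_floor(k, entropy):
    """H-THERMO-2 prediction: exp(-k * H), read here as a size ratio."""
    return math.exp(-k * entropy)

def empirical_ratio(text):
    """Compressed/original byte ratio using zlib (stand-in, NOT compress.py)."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

floor = predicted_floor(0.115, 9.1)
print(f"predicted floor: {floor:.3f}")  # exp(-1.0465) ≈ 0.351

# Any sample text works; a real test would feed in a domain's lessons.
sample = "entropy rises and compaction prunes long lessons. " * 40
print(f"zlib ratio on sample text: {empirical_ratio(sample):.3f}")
```

A general-purpose compressor gives only an upper bound on the achievable ratio, so a measured value above the prediction does not by itself refute the hypothesis.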
H-THERMO-3: The corpus approaches "heat death" asymptotically: the Boltzmann scaling H=0.115·ln(N)+6.09 predicts diminishing entropy gain per new lesson. Each doubling of N adds only 0.115·ln 2 ≈ 0.08 bits. Testable: plot marginal entropy gain vs N; predict saturation point.
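The marginal gain implied by the fitted law can be computed directly. A logarithm has no finite ceiling, so any "saturation point" has to be defined by a threshold on the per-lesson gain. The threshold below is an arbitrary illustration and the function names are hypothetical.

```python
import math

A = 0.115  # slope of the reported fit H = A*ln(N) + 6.09

def gain_per_doubling(a=A):
    """Entropy added when N doubles: a*ln(2N) - a*ln(N) = a*ln(2)."""
    return a * math.log(2)

def marginal_gain(n, a=A):
    """Entropy added by lesson n+1: a*ln(n+1) - a*ln(n) = a*ln(1 + 1/n)."""
    return a * math.log1p(1.0 / n)

def lessons_until_gain_below(threshold, a=A):
    """Smallest N whose marginal gain is strictly below the threshold."""
    # a*ln(1 + 1/N) < t  <=>  N > 1 / (exp(t/a) - 1)
    bound = 1.0 / math.expm1(threshold / a)
    return math.floor(bound) + 1

print(f"gain per doubling: {gain_per_doubling():.4f}")  # 0.0797
print(f"N where gain < 1e-4: {lessons_until_gain_below(1e-4)}")
```

Plotting `marginal_gain(n)` against n for the measured corpus would show whether the empirical gains fall off as 0.115/N or faster.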
Lessons
| ID | Finding | Sharpe |
|---|---|---|
| L-1393 | Corpus entropy follows 2nd law (R²=0.93), Heaps' β=−0.60 | 9 |
| L-1399 | Compaction is PID controller, not dissipative structure (R²=0.22) | — |
| L-1407 | Entropy predicts survival but 65% is word-count confound | 8 |
| L-1412 | Boltzmann scaling H=0.115·ln(N)+6.09, R²=0.989 | 9 |
| L-1418 | Domain k varies 8x — Simpson's paradox: global↑ half domains↓ | 8 |
| L-1419 | No phase transitions despite 5.4x production jump at S300 | 8 |
| L-1421 | Domain k correlates with compaction pressure (R²=0.58) | 8 |
| L-1422 | Domain k predicts vocabulary diversity (r=0.63) not compaction need | 7 |
| L-2112 | Z=Lagrangian=Shannon=Boltzmann — one spine, four frameworks | — |
References
- L-1393 (Sh=9, measured) — corpus entropy follows 2nd law (R²=0.93); Heaps' exponent β=−0.60.
- L-1412 (Sh=9, measured) — Boltzmann scaling H=0.115·ln(N)+6.09 (R²=0.989); the primary quantitative finding.
- L-1418 (Sh=8, measured) — Boltzmann constants vary 8× across domains; Simpson's paradox confirmed.
- L-1407 (Sh=8, measured) — entropy predicts lesson survival, but 65% is word-count confound.
- L-1421 (Sh=8, measured) — domain k correlates with compaction pressure (R²=0.58).
- L-2112 (measured) — Z=Lagrangian=Shannon=Boltzmann unification; one mathematical spine across four frameworks.
- Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal 27. Foundation for information-entropy measurement applied to corpus growth.
- Boltzmann, L. (1877). Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung. Wien. Ber. 76. Statistical mechanics entropy; the analogy mapped to lesson-corpus dynamics.