Mixing as Kernel — the seam¶
flowchart LR
parts[parts · pᵢ ∈ X] --> kernel["K(w, p)"]
weights[weights · w ∈ Δ] --> kernel
carrier[carrier · medium · prior · geometry] --> kernel
kernel --> inside["Ω > 0 · redundant · inside hull"]
kernel --> outside["Ω < 0 · synergistic · outside hull"]
outside --> interesting[interesting: umami×umami · chord · MoE · alloy]
inside --> averaging[averaging: paint mix · BMA · ensemble mean]
- Mixtures — taste & smell — the concrete gastronomic instance of this kernel
- Mixing — generalized — the full kernel zoo across 10 domains
- Isomorphism atlas — mixing kernel as a cross-domain structure
Seam page from swarmgodcomboforage S553: MIXTURES × MIXING-GENERALIZED → the shared kernel structure. Forage record at references/math/forage-mixing-kernel-s553.md confirms three predicates; O-information was the missing formalism. combo.py score: 113 shared salient terms.
Status: budding | 2026-05-19 | swarmgodcomboforage S553
Compress levels: L0 ↓ L1 ↓ L2
The two sides of this seam:
MIXTURES is taste, smell, food: one row in a table. MIXING-GENERALIZED is all ten rows of that table. This page is the table itself: the abstract claim that both were already making, and what external research says about it.
L0 — TL;DR (≤5 lines)¶
Every combination phenomenon has the same three-knob skeleton:
parts p in a space X, weights w on the simplex, and a carrier c that
holds them. A kernel K(w, p) then decides what the mixture is. The kernel
either stays inside the convex hull (additive, linear, boring) or escapes
it (synergistic, interesting). O-information Ω = TC − DTC is the signed
scalar that measures which regime you are in: Ω < 0 = synergy (outside
hull); Ω > 0 = redundancy (inside hull). Non-Euclidean kernels
(Wasserstein, Fisher-Rao, orthogonal) beat Euclidean averaging whenever
parts live in a curved space, a result reported independently for three
regimes (distributions, model weights, cluster-structured inputs).
L1 — Overview¶
The three-knob model¶
| Knob | What it is | Why it matters |
|---|---|---|
| Parts p₁…pₙ ∈ X | the things being combined | the space X determines what distances and means are valid |
| Weights w ∈ Δⁿ⁻¹ | how much of each | only decisive for linear kernels; non-linear kernels make weight effects input-dependent |
| Carrier c | the medium holding the parts | "silent" but changes the effective kernel (fat in food; prior in Bayes; geometry in model space) |
The kernel zoo (full version in MIXING-GENERALIZED): linear · log-linear · multiplicative-synergistic · masking-saturating · gated-routed · stochastic · reactive · emergent · time-resolved · spatial.
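To make the inside/outside distinction concrete, here is a minimal sketch of three kernels from the zoo acting on two scalar parts. The function names, the interaction form of `synergistic_kernel`, and its `gain` value are illustrative assumptions, not definitions taken from MIXING-GENERALIZED; in one dimension the convex hull of the parts is simply the interval between the smallest and largest part.

```python
import math

def linear_kernel(weights, parts):
    """Weighted arithmetic mean: always lands inside the hull of the parts."""
    return sum(w * p for w, p in zip(weights, parts))

def log_linear_kernel(weights, parts):
    """Weighted geometric mean (parts must be positive): also inside the hull."""
    return math.exp(sum(w * math.log(p) for w, p in zip(weights, parts)))

def synergistic_kernel(weights, parts, gain=8.0):
    """Hypothetical two-part kernel: linear term plus a pairwise interaction term."""
    (w1, w2), (p1, p2) = weights, parts
    return w1 * p1 + w2 * p2 + gain * (w1 * p1) * (w2 * p2)

def inside_hull(value, parts):
    """In one dimension the convex hull is the interval [min(parts), max(parts)]."""
    return min(parts) <= value <= max(parts)

weights = (0.5, 0.5)
parts = (1.0, 4.0)

linear = linear_kernel(weights, parts)          # 2.5
log_linear = log_linear_kernel(weights, parts)  # about 2.0
synergy = synergistic_kernel(weights, parts)    # 2.5 + 8 * 0.5 * 2.0 = 10.5

for name, value in [("linear", linear), ("log-linear", log_linear), ("synergistic", synergy)]:
    print(name, round(value, 3), "inside hull" if inside_hull(value, parts) else "outside hull")
```

With the same parts and the same weights, only the interaction term pushes the result past the largest part, which is the sense in which the kernel, not the weights, decides whether the mixture escapes.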
The seam claim¶
The seam between MIXTURES and MIXING-GENERALIZED is not analogy — it is the same math playing in different keys. Each domain is one instantiation of K(w, p) with a domain-specific carrier:
| Domain | Carrier | Hull-escaping kernel |
|---|---|---|
| Taste | fat | umami × umami (8× synergy) |
| Smell | air / receptor space | accord that "smells like neither" |
| Chemistry | solvent | emulsion; reactive (H₂+O₂→H₂O) |
| Color (light) | CIE space | RGB → white |
| Color (paint) | reflectance | many pigments → mud (log-linear, hull inward) |
| ML | weight / distribution space | MoE with sparse gating; mixup |
| Social | influence graph | DeGroot consensus (inside); polarization (outside?) |
The design task is the same in every row: choose K and carrier to engineer Ω toward negative for the desired output, and away from the failure modes (mud, mode collapse, muddled middle).
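The Social row's "inside" case can be sketched directly. The snippet below assumes a hypothetical three-agent influence graph encoded as a row-stochastic trust matrix; each DeGroot update replaces an agent's opinion with a trust-weighted average of all opinions, so the consensus can never leave the range of the initial opinions. The matrix values and the iteration count are illustrative assumptions.

```python
def degroot_step(trust, opinions):
    """One DeGroot update: each agent adopts a trust-weighted average of all opinions."""
    return [sum(t * o for t, o in zip(row, opinions)) for row in trust]

# Hypothetical 3-agent influence graph; each row is non-negative and sums to 1.
trust = [
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]
initial = [0.0, 0.5, 1.0]

opinions = initial
for _ in range(200):
    opinions = degroot_step(trust, opinions)

# Spread shrinks toward zero (consensus), and every opinion stays in the initial range.
spread = max(opinions) - min(opinions)
stays_inside = all(min(initial) <= o <= max(initial) for o in opinions)
print(spread, stays_inside)
```

Because every update is a convex combination, this kernel is the social analogue of the linear mean; an "outside" outcome such as polarization would need a different update rule.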
What O-information adds¶
MIXING-GENERALIZED §7 asks "when does the mixture produce something outside the space of its parts?" without a computable answer. O-information answers it:
Ω = TC − DTC
where TC = total correlation (redundancy pressure) and DTC = dual total correlation (synergy pressure). Bounoua et al. (2024) give a practical estimator that works on non-Gaussian systems. Sign of Ω determines the regime; magnitude tells you how far from linear you are.
Design rule from the forage (S553): to produce a synergistic mixture (Ω < 0), you need parts whose joint information exceeds the sum of pairwise informations — this is exactly what umami × umami, harmonic chords, and gated MoE routing achieve. Parts with high mutual overlap (high shared information) give Ω > 0 by default; Ω < 0 requires structurally complementary parts.
Non-Euclidean kernels¶
MIXING-GENERALIZED §1 notes the Wasserstein barycenter as an alternative "geometric mean" of distributions without showing that it is better. The forage supplies three results that point the same way:
- Fisher-Rao Karcher mean (Wang et al. 2026): avoids representation collapse and activation variance shrinkage that afflict linear weight averaging. The manifold geometry encodes information-theoretic distance.
- Orthogonal manifold merging (Yang et al. 2026): prevents catastrophic forgetting; linear arithmetic merging fails.
- Wasserstein geodesic (Zhu et al. 2023): improves certifiable robustness over linear Mixup. Geometric interpolation > arithmetic interpolation in distribution space.
The pattern: whenever the parts live in a curved space (distributions, probability simplices, model weight manifolds), the flat Euclidean average is wrong by construction. The carrier geometry is load-bearing, not cosmetic.
L2 — Deep dive¶
1. O-information as a mixing instrument¶
Full formalism: let X₁, …, Xₙ be the parts.
TC = ΣᵢH(Xᵢ) − H(X₁,…,Xₙ) [total correlation, ≥0]
DTC = H(X₁,…,Xₙ) − ΣᵢH(Xᵢ|X₋ᵢ) [dual total correlation, ≥0]
Ω = TC − DTC
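- Ω > 0: TC dominates. The parts share information, so each is largely predictable from the others. Redundancy. Mixture inside hull.
- Ω < 0: DTC dominates. The system carries joint information that is not visible in any single part. Synergy. Mixture outside hull.
- Ω = 0: redundancy and synergy balance, as with mutually independent parts (TC = DTC = 0). For n = 2, TC and DTC both equal the mutual information I(X₁; X₂), so Ω is identically zero and its sign is informative only for three or more parts.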
SΩI (Bounoua et al. 2024) estimates this without Gaussianity using score functions — applicable to taste data, audio, social networks, model activations.
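For small discrete systems the definitions above can be evaluated exactly, which gives a quick sanity check on the sign convention. The sketch below is plain enumeration over equally likely joint states, not the score-based SΩI estimator; the helper names are assumptions made for illustration. It uses the identity H(Xᵢ|X₋ᵢ) = H(X₁,…,Xₙ) − H(X₋ᵢ).

```python
import math
from collections import Counter
from itertools import product

def entropy(samples):
    """Shannon entropy in bits of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def o_information(states):
    """Omega = TC - DTC for a list of equally likely joint states (tuples)."""
    n = len(states[0])
    joint = entropy(states)
    marginals = [entropy([s[i] for s in states]) for i in range(n)]
    # H(X_i | X_-i) = H(X) - H(X_-i), where X_-i drops coordinate i.
    conditionals = [joint - entropy([s[:i] + s[i + 1:] for s in states]) for i in range(n)]
    tc = sum(marginals) - joint
    dtc = joint - sum(conditionals)
    return tc - dtc

bits = list(product((0, 1), repeat=2))
xor_states = [(a, b, a ^ b) for a, b in bits]         # third part is the XOR of the others
copy_states = [(a, a, a) for a in (0, 1)]             # three copies of one bit
independent_states = list(product((0, 1), repeat=3))  # three independent bits

print(o_information(xor_states))          # -1.0: synergy
print(o_information(copy_states))         #  1.0: redundancy
print(o_information(independent_states))  #  0.0: balanced
```

XOR is the textbook synergistic system: no single part says anything about another, yet any two determine the third, so DTC exceeds TC and Ω goes negative.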
2. When Euclidean averaging is wrong¶
The Euclidean mean assumes the space is flat. Three classes of failure:
Distributions: the arithmetic mean of two Gaussians N(0,1) and N(10,1) is a bimodal mixture with almost no mass near 5, which makes it meaningless as a "typical distribution." The equal-weight Wasserstein (W₂) barycenter is N(5,1), the geometric midpoint under the metric of distribution space. For any application where "midpoint" should reflect a smooth interpolant (data augmentation, domain adaptation), use the Wasserstein kernel.
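The N(0,1) and N(10,1) example can be checked numerically with the standard library alone. The sketch relies on one known fact about one-dimensional distributions: the W₂ barycenter is obtained by averaging quantile functions. The function names and the chosen quantile grid are illustrative assumptions.

```python
from statistics import NormalDist

left, right = NormalDist(0, 1), NormalDist(10, 1)

def mixture_pdf(x):
    """Density of the arithmetic (equal-weight) mixture of the two Gaussians."""
    return 0.5 * left.pdf(x) + 0.5 * right.pdf(x)

def barycenter_quantile(q):
    """In one dimension the W2 barycenter averages the quantile functions."""
    return 0.5 * left.inv_cdf(q) + 0.5 * right.inv_cdf(q)

# Closed form for two Gaussians with equal variance and equal weights.
barycenter = NormalDist(5, 1)

# The quantile average should match N(5, 1) quantile by quantile.
max_gap = max(
    abs(barycenter_quantile(q) - barycenter.inv_cdf(q))
    for q in (0.05, 0.25, 0.5, 0.75, 0.95)
)

# At the midpoint x = 5 the mixture has almost no density; the barycenter peaks there.
print(max_gap, mixture_pdf(5.0), barycenter.pdf(5.0))
```

The mixture keeps both original modes and leaves a trough between them, while the barycenter moves mass along the line, which is the behavior an interpolant needs.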
Model weights: modern LLMs live on or near low-dimensional manifolds in weight space. Arithmetic weight averaging projects off-manifold → representation collapse (activation variance shrinks, effective rank degrades). Fisher-Rao Karcher mean stays on-manifold by using the information-geometric metric. The cost is an iterative fixed-point solve (vs one-shot averaging) but the quality gain is consistent.
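The "iterative fixed-point solve" can be illustrated on the simplest curved space, the unit sphere. This is a generic Karcher (Fréchet) mean sketch under that assumption, not the Fisher–Rao procedure of Wang et al.; the helper names, step count, and tolerance are illustrative. Each iteration maps the points into the tangent space at the current estimate, averages there, and maps back.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_map(base, point):
    """Tangent vector at `base` along the geodesic to `point` on the unit sphere."""
    cos_theta = max(-1.0, min(1.0, dot(base, point)))
    theta = math.acos(cos_theta)
    residual = [q - cos_theta * b for b, q in zip(base, point)]
    norm = math.sqrt(dot(residual, residual))
    if norm < 1e-12:  # same point (antipodal points are not handled in this sketch)
        return [0.0] * len(base)
    return [theta * r / norm for r in residual]

def exp_map(base, tangent):
    """Walk from `base` along `tangent` and stay on the unit sphere."""
    norm = math.sqrt(dot(tangent, tangent))
    if norm < 1e-12:
        return list(base)
    return [math.cos(norm) * b + math.sin(norm) * t / norm for b, t in zip(base, tangent)]

def karcher_mean(points, steps=100, tol=1e-12):
    """Fixed-point iteration: average in the tangent space, then map back."""
    mean = list(points[0])
    for _ in range(steps):
        logs = [log_map(mean, p) for p in points]
        step = [sum(col) / len(points) for col in zip(*logs)]
        if math.sqrt(dot(step, step)) < tol:
            break
        mean = exp_map(mean, step)
    return mean

points = [(1.0, 0.0), (0.0, 1.0)]
flat = [sum(col) / len(points) for col in zip(*points)]  # arithmetic mean
curved = karcher_mean(points)

flat_norm = math.sqrt(dot(flat, flat))        # about 0.707: fell off the sphere
curved_norm = math.sqrt(dot(curved, curved))  # about 1.0: stayed on the sphere
print(flat, flat_norm, curved, curved_norm)
```

The arithmetic mean shrinks toward the origin, a toy version of leaving the manifold, while the Karcher mean lands on the geodesic midpoint at the price of an iteration.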
Cluster-structured inputs: MoE provably learns cluster-structured
regression (Kawata et al. 2025) where dense networks fail. The gated
kernel m = Σᵢ gᵢ(x) fᵢ(x) is piecewise — each input routes to its
cluster's specialist. Dense networks are forced to average across clusters;
this is the hull-inside failure in ML. MoE escapes it structurally.
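A toy version of the gated kernel shows why routing matters for cluster-structured targets. Everything here is a hypothetical illustration: the two linear experts, the hard gate at x = 0, and the fixed 50/50 blend that stands in for a model that cannot route. It says nothing about how such a gate is learned, which is the subject of the Kawata et al. result.

```python
def expert_low(x):
    """Hypothetical specialist for the x < 0 cluster."""
    return 2.0 * x + 1.0

def expert_high(x):
    """Hypothetical specialist for the x >= 0 cluster."""
    return -3.0 * x + 5.0

def target(x):
    """Cluster-structured ground truth: a different linear law per cluster."""
    return expert_low(x) if x < 0 else expert_high(x)

def gate(x):
    """Hard routing weights g(x); a trained MoE would learn a soft version."""
    return (1.0, 0.0) if x < 0 else (0.0, 1.0)

def moe(x):
    """Gated kernel m(x) = sum_i g_i(x) * f_i(x)."""
    g_low, g_high = gate(x)
    return g_low * expert_low(x) + g_high * expert_high(x)

def fixed_blend(x, w=0.5):
    """Input-independent average of the same experts (no routing)."""
    return w * expert_low(x) + (1.0 - w) * expert_high(x)

inputs = [-2.0, -1.0, 1.0, 2.0]
moe_error = max(abs(moe(x) - target(x)) for x in inputs)
blend_error = max(abs(fixed_blend(x) - target(x)) for x in inputs)
print(moe_error, blend_error)
```

The experts are identical in both models; only the input-dependent weights differ, which is the point made in the three-knob table about weights under non-linear kernels.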
The unified pattern: flat kernels fail when the data manifold is curved; curved kernels succeed; the carrier geometry determines which kernel is appropriate.
3. The carrier as a hidden design variable¶
Both source pages note the carrier changes the effective kernel. The forage adds a formal consequence: the carrier geometry determines what "mixing" even means. You cannot choose K independently of the space X that the parts live in.
| Carrier geometry | Correct kernel | Wrong kernel |
|---|---|---|
| Euclidean (vector space) | arithmetic mean | — |
| Riemannian (smooth manifold) | Riemannian/Karcher mean | arithmetic mean |
| Probability simplex | Wasserstein barycenter or log-linear pool | arithmetic mix |
| Discrete (grammar, graph) | constrained mixture (code-switching, DeGroot) | unconstrained average |
The "carrier mismatch" failure mode in MIXING-GENERALIZED §8 is now precisely: using a flat kernel in a curved carrier.
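The probability-simplex row can be made concrete by pooling two categorical distributions with the same weights under two kernels. This is a minimal sketch; the two input distributions are made-up values, and the log-linear pool shown here is the normalised weighted geometric mean, which requires strictly positive probabilities.

```python
import math

def arithmetic_mix(p, q, w=0.5):
    """Linear opinion pool: weighted average of the probabilities."""
    return [w * a + (1.0 - w) * b for a, b in zip(p, q)]

def log_linear_pool(p, q, w=0.5):
    """Normalised weighted geometric mean; needs strictly positive probabilities."""
    unnormalised = [math.exp(w * math.log(a) + (1.0 - w) * math.log(b)) for a, b in zip(p, q)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Two hypothetical sources that agree only on the middle outcome.
p = [0.49, 0.50, 0.01]
q = [0.01, 0.50, 0.49]

mixed = arithmetic_mix(p, q)    # about [0.25, 0.5, 0.25]
pooled = log_linear_pool(p, q)  # about [0.109, 0.781, 0.109]
print(mixed, pooled)
```

Both results are valid distributions, but the log-linear pool concentrates mass where the sources agree and lets either source nearly veto an outcome, so the choice of kernel changes the mixture even when parts and weights are fixed.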
4. What remains open¶
Why ~3 dominant components? MIXING-GENERALIZED §10 observes that "good" mixtures across domains tend to have ~3 dominant components. No formal grounding found in the forage. Candidate: Miller/Cowan working memory bounds × readout channel capacity. Not grounded here — one circuit to close in a future session.
Reactive mixing in social systems. MIXING-GENERALIZED §10 asks: what is the social analogue of H₂+O₂→H₂O? The forage found no paper on this. Schelling tipping / Granovetter threshold models are candidates.
Wasserstein mean of perceptual spaces. Does the Wasserstein mean of two smells produce a more natural intermediate odor than the arithmetic mean of their receptor activation vectors? Open POM allows this experiment (Lee et al. 2023, Science). Not done.
References (forage additions)¶
Full taste/smell/chemistry references are in MIXTURES;
full ML/math references are in MIXING-GENERALIZED.
- Bounoua, M., Franzese, G., & Michiardi, P. (2024). SΩI: Score-based O-information estimation. arXiv:2402.05667. — O-information as the synergy/redundancy scalar for mixing.
- Wang, J., Ye, Z., & Yin, W. (2026). Functionality-oriented LLM merging on the Fisher–Rao manifold. arXiv:2603.04972. — Non-Euclidean model mixing beats Euclidean.
- Yang, S., Shi, K., & Liu, W. (2026). Orthogonal model merging. arXiv:2602.05943. — Riemannian orthogonal merging prevents forgetting.
- Zhu, J., et al. (2023). Interpolation for robust learning: data augmentation on geodesics. arXiv:2302.02092. — Wasserstein geodesic > linear Mixup for robustness.
- Kawata, R., et al. (2025). Mixture of experts provably detect and learn the latent cluster structure. arXiv:2506.01656. — MoE hull-escape is structurally necessary for cluster data.
- Liu, H., et al. (2023). Dataset distillation via the Wasserstein metric. arXiv:2311.18531. — Wasserstein barycenter as distribution-space mean.
See also¶
- MIXTURES: the gastronomic and olfactory side of this seam (the concrete K(w,p) instances in taste, smell, food).
- MIXING-GENERALIZED: the full kernel zoo across 10 domains; this page is its formal backbone.
- ../ISOMORPHISM-ATLAS.md: mixing kernel as a cross-domain isomorphism candidate.
- UNIVERSE-EVOLUTION-AS-COMPRESSION: mixing entropy as one face of universal compression.
Inspiration sources¶
- MIXTURES.md and MIXING-GENERALIZED.md — the two source pages whose 113 shared salient terms (combo.py S553) surfaced this seam.
- O-information literature (Timme et al. 2014; Williams & Beer 2010 PID; Bounoua et al. 2024) — the information-theoretic backbone.
- The model-merging literature (2021–2026) — independent confirmation that the kernel choice is non-optional in curved spaces.