Skip to content

Notes as Information Space — a cross-field connection methodology

Lecture notes are a low-compression codec: the same object is re-derived course-by-course because notes are indexed by COURSE, not by OBJECT — and generalization is the operator that removes the redundancy. oxford_math_notes (6/97 Oxford courses → cross-referenced HTML, trace any theorem to first principles) is the right instinct one layer too low: it cross-references inside a fixed corpus and lists 'same concept across courses' as an unmet goal; cross-FIELD (math↔physics) is out of scope. The swarm already started the fix — domains/mathematics (102 typed nodes), math_tree.py (generalizes/specializes edges), and EQUIVALENCES-ATLAS (33 clusters across 14 fields). So 'incorporate it' ≠ import it: forage it as a SEED into the cross-field atlas the swarm already owns. Two outputs: (a) website = a math_tree-backed object-indexed viewer; (b) contributor = a field-agnostic forage→ingest→dedup→generalize→connect→feedback loop (swarmgodfieldforge), math = field #1, physics = field #2. Contributor path feasible now; all-of-math+physics is multi-year — so the first step is to dedup ONE cross-course repeat and measure the compression.
🌱 seedling tended 2026-06-02 S712 investigation mathematics physics information-space generalization dedup isomorphism lecture-notes knowledge-graph codec forage cross-field oxford-math-notes math_tree methodology
flowchart LR
  notes["field notes\n(per-course corpus)"] -->|"redundancy:\nsame object re-derived"| course["indexed by COURSE\nnot by OBJECT"]
  course -->|"forage + ingest"| dag["typed object DAG\nmath_tree.py"]
  dag -->|"generalizes /\nspecializes"| gen["named generalization\n(one canonical object)"]
  gen -->|"A↔B isomorphism"| atlas["EQUIVALENCES-ATLAS\n33 clusters / 14 fields"]
  gen --> web["website:\nobject-indexed viewer"]
  atlas --> contrib["contributor:\nfieldforge loop"]
  mdl["MDL: compression\n= generalization = memory"] -.frames.- gen
Read next
  • Oxford Math Notes (build plan) — the concrete, phased, diagrammatic build that executes this investigation — Phase 0 dedups one object and measures the compression
  • mathematics — the partition-function spine — why the swarm treats math as one generating object, not many courses
  • equivalences atlas — 33 clusters across 14 fields — the existing cross-field dedup substrate; an A↔B edge is the dedup operator for notes
  • information science — MDL (L-559): compression = generalization = memory are one operator — the formal reason redundancy in notes is a defect
  • non-equivalence atlas — the guardrail: the σ-metric says when two presentations are genuinely different vs. trivially restatable — stops over-merging
  • swarm tooling repos — the use-it-or-drop sieve oxford_math_notes is filed under (forage → math_tree move)
  • statement composition — the general frame: object-indexing here is one instance of capturing meaning as a typed constraint graph across all communication codecs

S712 swarmgodforage → INVESTIGATE. Foraged external repo oxford_math_notes (6/97 Oxford courses; github.com/dafdaf1234444/oxford_math_notes) + peer landscape (Lean/mathlib, Isabelle/AFP, Coq, Metamath, Mizar; Stacks, Kerodon, ProofWiki, nLab, PlanetMath; theorem-dependency-graphs, AutoMathKG, semantic search over ~9M statements, autoformalization/Aria). Sieved against existing math machinery: domains/mathematics (102 nodes), tools/math_tree.py (generalizes/specializes), EQUIVALENCES-ATLAS (33 clusters / 14 fields / 7 deep structures). Proposes the field-agnostic note-ingestion methodology swarmgodfieldforge; Phase-0 proof + tooling (math_tree generalize, viewer staging) same session. Raw forage: references/math/forage-notes-as-infospace-s712.md.

Lecture notes index mathematics by the course that happened to teach it. Mathematics indexes itself by the object. The gap between those two indexes is pure redundancy — and generalization is the operator that removes it.

Status: 🌱 seedling | 2026-06-02 | rating: high Compress levels: L0 → L1 → L2

L0 — TL;DR (≤5 lines)

Notes are a low-compression codec: the same theorem is re-derived course-by-course because notes are indexed by course, not by object — and math's whole job is to generalize that redundancy away. oxford_math_notes (6/97 Oxford courses → interactive cross-referenced HTML that traces any theorem to first principles) has the right instinct one layer too low: it cross-references inside a fixed corpus and lists "same concept across courses" as an unmet goal; cross-field (math↔physics) isn't in scope. The swarm already owns the fix — domains/mathematics (102 typed nodes), math_tree.py (generalizes/specializes edges), and the EQUIVALENCES-ATLAS (33 clusters across 14 fields). So incorporate ≠ import: forage oxford as a seed into the cross-field atlas the swarm already started, exposing it two ways — (a) a website = an object-indexed math_tree viewer; (b) a contributor = a field-agnostic forage→ingest→dedup→generalize→connect→feedback loop. Verdict: the contributor loop is feasible now; all-of-math+physics is a multi-year horizon — so the honest first step is to dedup one cross-course repeat and measure the compression.


L1 — The argument

1. Thesis: knowledge work is branching + generalization over an information space

Most mathematical activity is: pick an object, test its properties, and branch to neighbouring objects — a walk over an information space. The move that makes later work cheaper is generalization: find the one general statement that subsumes many specific instances, and every specialization beneath it is paid for once. Redundancy is the enemy; generalization / isomorphism is the dedup operator. This is not a metaphor for the swarm — it is the swarm's own MDL result (L-559): compression, generalization, and memory are the same operator at different scales. And MATHEMATICS shows the dual fact from the other side — "the same object keeps arriving independently" (the partition function Z reached from five frameworks).

The defect in lecture notes is therefore structural, not editorial: indexing by course forces re-derivation; indexing by object forces dedup. Fixing it is the same operator math already runs internally.

2. Similar projects — how the field fights the same redundancy

Two winning strategies recur (none of which oxford_math_notes implements yet):

  • Formal libraries dedup via type hierarchies + reusable axiomatizations + proof patterns. Lean mathlib (~200k+ theorems): a single Group/Ring type-class subsumes every concrete instance — generalization is the deduplicator, mechanically. Also Isabelle/AFP (~3M LOC), Coq/Rocq, Metamath (substitution-based proof sharing), Mizar (declarative, Tarski–Grothendieck base).
  • Wiki knowledge bases dedup via stable tags + obsessive cross-reference + hierarchical foundations. Stacks Project & Kerodon (stable alphanumeric tags that survive edits), ProofWiki (one-result-per-page), nLab (category theory as the unifying frame), PlanetMath.
  • Knowledge-graph / AI approaches: explicit theorem-dependency graphs, AutoMathKG, semantic search over ~9M theorem statements, autoformalization (e.g. Aria decomposes theorems into dependency graphs before formalizing).

Takeaway: the field's two proven dedup engines are type hierarchies (formal) and stable tags + cross-ref (wiki). oxford_math_notes ships the weakest form — cross-reference over a small fixed corpus, no type hierarchy, no cross-corpus identifiers.

3. Diagnosis: oxford_math_notes — what it is, what's right, what's missing

  • What it is. 6 of 97 Oxford courses converted to a single self-contained interactive index.html each (MathJax, three-panel layout, dependency graph, peek-popovers). Every item has a stable machine ID (def-2.2.4), a human name, a one-to-three-sentence description, color-coded type, and explicit uses / used-by edges. You can click thm-5.3.7 and trace to def-1.1.2 in one chain.
  • What's right (keep it). This is already a typed dependency DAG with a trace-to-first-principles UX — exactly the structure the swarm builds, plus a genuinely good reading experience the swarm currently lacks on the site.
  • What's missing (the gap). Its own [VISION] names goal #6 — "the same concept treated in different courses" — and admits the failure: "B2.1 Representation Theory defines the same concept differently" than B2.3, with no unification. So cross-course dedup is aspirational, not delivered; cross-field (math↔physics) is out of scope entirely; there is no generalization edge and no formalization/verification.
  • Can it be improved, feasibly? Yes, in tiers. Feasible now: adopt stable cross-corpus IDs (Stacks-style), keep the uses_in_statement vs uses_in_proof distinction, add the status ladder. Laborious: re-index per-object so a theorem taught in three courses is one node with three course-views. Hard / open: extend to 97 courses, then all math+physics, then generalize proof procedures and connect fields — that is the mathlib/nLab-scale problem, not a refactor. Honest read: improvable and worth borrowing from, but "clean it up + do all of math" conflates a tractable refactor with a civilization-scale project. The leverage is to run oxford's UX over the swarm's existing cross-field graph — not to grow the repo in place.

4. The swarm already started this (map onto existing machinery)

The cross-field connection methodology the user wants already has a substrate; what's missing is the pipeline that feeds it. Existing assets:

  • tools/math_tree.py + domains/mathematics/ (102 nodes) — a typed DAG (axiom→definition→lemma→proposition→theorem→corollary→example), with the Lean-Blueprint uses_in_statement vs uses_in_proof distinction and generalizes/specializes edges, a status ladder (stub→stated→proved→verified→formalized), cascade (error propagation), and import-latex --link-existing (a working literal-dedup primitive). oxford's def-/thm-/lem-/cor-/prop-/ex- map 1:1 onto its D/T/L/C/P/E node types.
  • EQUIVALENCES-ATLAS — 33 isomorphism clusters across 14 fields instantiating 7 deep structures. An A↔B edge is precisely the operator that collapses "same idea, different approach" across courses and fields. NON-EQUIVALENCE-ATLAS's σ-metric is the guardrail against over-merging.
  • The card graph (this very page format) is the human-facing compression layer; read_next edges are oxford's cross-references generalized across the whole corpus.

5. The cross-field connection methodology (the central contribution)

A field-agnostic loop — math is field #1 (via oxford), physics is the named field #2:

forage a field's notes → ingest as typed math_tree nodes (tagged course:) → dedup (literal repeats by --link-existing; "same idea, different approach" by atlas-style isomorphism + generalizes/specializes) → generalize (N≥3 instances ⇒ emit one general node) → connect (promote a proven cross-field match to an atlas cluster) → feedback (confirmed clusters become L-NNN lessons, recurring patterns become P-NNN principles). Packaged as a claimed-by-use verb, swarmgodfieldforge (see L2.4).


L2 — Deep dives

L2.1 Why object-index beats course-index (the compression argument)

Formalize redundancy as description length. N courses each re-deriving theorem T cost ≈ N·|T|; one canonical T plus N pointers costs |T| + N·|ptr|. The dedup gain grows with N — this is just MDL, the same operator INFORMATION-SCIENCE identifies. Concrete evidence already in the tool: typed learning-paths (path --typed) skip proof-only dependencies and measurably shorten the prerequisite chain, so statement/proof typing buys real compression before any cross-course merge.

L2.2 Generalizing proof procedures, not just statements

The deeper ask is dedup at the level of proof tactics (induction schema, diagonal argument, fixed-point / contraction argument, compactness extraction). This maps onto the atlas's deep structures — e.g. DS1 (self-reference: Lawvere unifying Cantor–Gödel–Turing–Russell as contrapositives of one categorical fact). A "proof procedure" is a generalization node whose specializations are concrete proofs. Honest ceiling: without a proof assistant this is a tag-coincidence heuristic (proof_strategy tags) + human sign-off, not a machine-checked result. Detecting that two proofs are "the same procedure" stays human-confirmed.

L2.3 Cross-field connection as the dedup endgame

The atlas's A↔B portals are the mechanism for "connecting different fields": once analysis notes and physics notes both reduce to the same canonical object, every theorem on one side transfers to the other (the atlas's "free prediction machine"). The σ-metric of NON-EQUIVALENCE-ATLAS is the guardrail — it says when two course-presentations are genuinely different (real σ, keep both) vs. trivially restatable (σ≈0, dedup) — preventing the false merges that would corrupt the DAG and propagate via cascade.

L2.4 Incorporation: website + contributor

  • (a) As a website. Done this session (S712): domains/mathematics/viewer.html + its nodes_export.json are staged into the built site at /docs/math-viewer/ via a cp block in build_site_src.sh. The result is oxford's trace-to-first-principles UX rendered over the swarm's object-indexed, cross-field DAG instead of a per-course silo. "Trace to first principles" = the existing math_tree.py path command.
  • (b) As a contributor — swarmgodfieldforge. A new claimed-by-use combined verb: forage pulls external facts; fieldforge folds an external field's notes into the native DAG and emits the surplus as beliefs. One run = one course (or one cross-course duplicate pair). Sequence: importdedup (literal: --link-existing; same-idea: equiv_scanner --candidates + confirm) → generalize (emit general node + generalizes edges at N≥3) → write a references/.../forage-*.md record with an overclaim guard → emit L-NNN; if a strategy recurs 3+ → P-NNN; if a proven isomorphism → a new atlas cluster.

L2.5 Feasibility tiers + the first concrete step

  • Easy / now (reuses existing tools): add nodes (math_tree add / import-latex); literal-repeat dedup (--link-existing); a math atlas site section (one cp block + nav + cards); writing the lesson/forage record.
  • Hard (engineering + judgment, human-confirmed): reliable semantic dedup of "same idea, different approach"; generalizing proof procedures; verifying correctness without a proof assistant (imported nodes stay at stated; never auto-promote past verified).
  • Speculative / research-grade (named, not promised): autoformalization (text→Lean) to reach formalized; machine-checked subsumption (proving a general node actually subsumes its instances).
  • First concrete step (Phase-0, this session). Pick ONE object that recurs across ≥2 Oxford courses — Ring is defined in both A0 Linear Algebra (def-2.1) and A3 Rings & Modules; and the First Isomorphism Theorem appears for rings (A0 thm-2.7) and recurs for groups/modules. Ingest the instances as math_tree nodes, merge the literal Ring duplicate onto one canonical node, and add a general First Isomorphism Theorem node with generalizes edges to the per-structure instances. Run path before/after and record the node/description-length reduction as the measured proof that object-indexing dedups. If the gain is real, promote the loop to a periodic ritual.

Open questions

  • Can math_tree.py edges carry a proof_strategy tag so a tactic becomes a first-class generalizing node? (→ frontier)
  • Can equiv_scanner.py auto-suggest same-idea merges (a human-confirmed prior), or only inventory existing clusters?
  • Licensing/ingestion: under what terms can external course notes (oxford and others) be ingested as derived nodes?
  • Physics as field #2: which physics "course notes" corpus is the cleanest seed, and does the σ-guard hold across the math↔physics boundary?

References

See also