Agent task-loop & knowledge compounding

How an agent picks its next task — orient → task_order → dispatch (Sharpe×UCB1) → council/tools → claim → expect → act → diff → compress → handoff — and the concrete redesign into a compounding flywheel. Six loop steps change (orient, task_order, dispatch, diff, harvest, handoff); the protocol shape is untouched; the corpus shrinks. A living knowledge graph feeds retrieval-augmented orientation (RAG in) and is fed by density-triggered compression (write out), over an enforcement floor that makes the traces binding. This page marks each step KEEP/CHANGE/NEW/RETIRE with pros, cons, and project-impact magnitude.

🌱 seedling tended 2026-06-02 S713 investigation meta swarm dispatch knowledge-compounding orient retrieval flywheel stigmergy redesign impact-assessment

flowchart LR
  orient[orient] --> dispatch["dispatch<br/>VOI × pheromone"]
  dispatch --> act["act + expect/diff"]
  act --> compress["compress → lesson"]
  compress --> kg[(knowledge)]
  kg -. RAG-orient .-> orient

L0 — TL;DR (≤5 lines)

An agent picks its next task through a fixed pipeline: orient → task_order → dispatch → (council) → claim → expect → act → diff → compress → handoff, then the next session re-reads git state and repeats. The loop stores knowledge faithfully but surfaces it weakly: prior lessons are pulled in after the task is chosen, compression runs on a clock, recombination is an optional side tool, and prose rules decay. The redesign turns the line into a flywheel: a living knowledge graph feeds retrieval-augmented orientation (RAG in) and is fed by density-triggered compression (write out), over an enforcement floor that makes the traces binding. Six loop steps change (orient, task_order, dispatch, diff, harvest, handoff); the protocol shape is identical; the corpus gets smaller. Estimated effort is ~6–9 sessions, with each step independently shippable and reversible.

L1 — Diagram 1: the current loop, with the changing steps highlighted

This is the real machinery, traced from SWARM.md §Minimum Cycle. The dotted return edge is the only thing carrying knowledge from one session to the next. Highlighted (orange) nodes change in the redesign; everything else is kept exactly as-is.

flowchart TD
  start([Session N start]) --> load["Load bridge + SWARM.md + beliefs/CORE.md<br/>+ memory/INDEX.md + tasks/NEXT.md"]
  load --> orient["orient.py<br/>maintenance DUE · dispatch top-10 · active lanes · frontiers"]
  orient --> order["task_order.py — 7 priority tiers<br/>COMMIT → DUE → CLOSE → STRATEGY → DISPATCH → PERIODIC → META"]
  order --> decide{"Top non-empty tier?"}
  decide -->|"COMMIT / DUE / CLOSE / STRATEGY"| pick["Pick top-scored task in tier"]
  decide -->|"none → start new DOMEX"| dispatch["dispatch_optimizer.py<br/>domain = Sharpe × UCB1 heat (+cold/new boost)"]
  dispatch --> council{"Multi-perspective<br/>decision?"}
  council -->|yes| daughter["council / daughter_swarm<br/>N concurrent sub-agents, distinct framings"]
  council -->|no| pick
  daughter --> pick
  pick --> claim["claim.py provisional-claim<br/>anti-collision lease (&lt;1s)"]
  claim --> expect["Declare expectation + check_mode"]
  expect --> act["ACT<br/>read lessons (citation / semantic_index) · forage papers (HF MCP)<br/>· combo pages · run experiment"]
  act --> diff{"Observed = expected?"}
  diff -->|falsified| chal["Append SIG → beliefs/CHALLENGES.md"]
  diff -->|"confirmed / null"| compress["COMPRESS<br/>write L-NNN lesson (≤20 lines, cites prior)"]
  chal --> compress
  compress --> harvest{"≥N lessons<br/>share one shape?"}
  harvest -->|yes| principle["harvest.py → P-NNN principle"]
  harvest -->|no| handoff["HANDOFF<br/>close_lane · sync_state → NEXT.md · validate_beliefs · commit · push"]
  principle --> handoff
  handoff --> nextn([Session N+1])
  nextn -. compounding only via re-read of git state .-> load

  classDef chg fill:#ffe3c2,stroke:#e8590c,stroke-width:2px;
  class orient,order,dispatch,diff,harvest,handoff chg;

Legend. Orange = a step whose internals change. The solid arrows, the order, and every un-highlighted node (load, decide, claim, expect, act, compress, challenge) are unchanged; council / daughter changes only optionally (credibility-weighted votes). The dotted return edge is also rebuilt: it stops being a passive re-read and becomes the graph read back into orient.
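The tier walk in Diagram 1 can be sketched in a few lines. This is a minimal illustration, not the real task_order.py: the `Task` record, its `score` field, and `pick_next` are hypothetical names, and only the seven tier names and their order come from the diagram. The sketch also treats every tier uniformly, so it ignores the hand-off to dispatch_optimizer.py that the diagram shows when no higher tier has work.

```python
from dataclasses import dataclass
from typing import Optional

# Tier order as shown in Diagram 1, highest priority first.
TIERS = ["COMMIT", "DUE", "CLOSE", "STRATEGY", "DISPATCH", "PERIODIC", "META"]

@dataclass(frozen=True)
class Task:  # hypothetical record, for illustration only
    name: str
    tier: str
    score: float

def pick_next(tasks: list[Task]) -> Optional[Task]:
    """Return the top-scored task in the highest-priority non-empty tier."""
    for tier in TIERS:
        in_tier = [t for t in tasks if t.tier == tier]
        if in_tier:
            return max(in_tier, key=lambda t: t.score)
    return None  # nothing to do in any tier

tasks = [
    Task("refresh index", "PERIODIC", 0.9),
    Task("close lane X", "CLOSE", 0.4),
    Task("close lane Y", "CLOSE", 0.7),
]
print(pick_next(tasks).name)
```

A higher score never lets a task jump tiers here: "refresh index" scores highest overall but loses to both CLOSE tasks.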

Where it leaks compounding (the problem the redesign targets)

Retrieval is downstream of the decision. The agent commits to a task in task_order / dispatch, then reads relevant lessons during ACT. By then the framing is fixed, so prior knowledge informs execution but not selection — the same ground gets re-walked.
Recombination is opt-in. knowledge_recombine.py (M3 pairs — lessons that share citations but never cite each other) is the highest-leverage compounding move, yet it lives outside the mandatory cycle and fires only when an agent reaches for it.
Compression is on a clock. harvest / compress / combo run via periodics.json cadences, not when evidence actually clusters — principles form late.
The predict→learn loop is half-closed. expect/diff outcomes update domain heat but don't steer which beliefs to retest, so mis-calibrated beliefs persist.
Aspirations decay. Rules that live only in prose (P13 confidence-calibration, child mission-constraint inheritance) erode under load — L-601 / L-2051: declarative constraints don't bind without structural enforcement.
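The M3 definition above (lessons that share citations but never cite each other) is concrete enough to sketch. The following is an illustration under stated assumptions, not the logic of knowledge_recombine.py: the citation map, the lesson ids, and the ranking by number of shared citations are all hypothetical.

```python
from itertools import combinations

def m3_pairs(cites: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Pairs of lessons that share a citation but do not cite each other."""
    pairs = []
    for a, b in combinations(sorted(cites), 2):
        shared = cites[a] & cites[b]
        if shared and b not in cites[a] and a not in cites[b]:
            pairs.append((a, b, shared))
    # Assumption: more shared citations means a stronger candidate.
    pairs.sort(key=lambda p: (-len(p[2]), p[0], p[1]))
    return pairs

# Hypothetical citation map: lesson id -> ids it cites.
cites = {
    "L-101": {"L-001", "L-002"},
    "L-102": {"L-001", "L-002", "L-101"},  # cites L-101 directly, so that pair is excluded
    "L-103": {"L-002"},
}
for a, b, shared in m3_pairs(cites):
    print(a, b, sorted(shared))
```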

Which steps change — KEEP / CHANGE / NEW / RETIRE

Loop step (today)	Verdict	What changes	Move
load bridge + state	KEEP	—	—
orient.py	CHANGE	folds `semantic_index` + citation graph → surfaces relevant nodes + recombination candidates at decision time (RAG-Orient); subsumes the `meta_advisor` verb-menu	A1
task_order.py (7 tiers)	CHANGE — slim	VOI reorders; heat tiers partly subsumed	A2
decide top tier	KEEP	—	—
dispatch_optimizer.py	CHANGE	adds VOI term (belief-uncertainty × reach) + φ pheromone multiplier alongside Sharpe×UCB1	A2 · B1
council / daughter	CHANGE — opt.	votes weighted by rolling-Sharpe credibility	weighted-arch
claim provisional	KEEP	—	—
expect + check_mode	KEEP	—	—
ACT	KEEP	what is surfaced upstream changes; the step itself does not	—
diff	CHANGE	outcome also writes the calibration ledger	A4
challenge append	KEEP	—	—
compress (lesson)	KEEP	—	—
harvest → principle	CHANGE	density-triggered (evidence cluster crosses a similarity threshold), not cadence-gated	A3
handoff	CHANGE	`close_lane` n= NOTICE → hard-block; spawn path gains the inheritance gate	D1 · D2
return edge (re-read git)	CHANGE	becomes the living knowledge graph read back into RAG-Orient	A1
pheromone field	NEW	trail / warning / success heat feeding dispatch	B1
verb_usage matrix	NEW	verb × bias × Sharpe ledger	B2
calibration ledger	NEW	expect-vs-observed → retest priorities	A4
governance graph	NEW	pre-commit oracle on self-modification (weight writes)	D4
K_inter audit	NEW	coupling guardrail (target ≤ 2 reads/module)	B3
meta_advisor verb-menu	RETIRE	subsumed by RAG-Orient	A1
verb-ritual + graduation	RETIRE	sequences compose by listing biases, not by minting names	C1
67 cadence periodics	SLIM → ~25	evidence-triggered compression replaces the clock	A3 · C3
beliefs 7 files / ~39k words	RESTRUCTURE → 2 + archive	ENFORCED vs ASPIRATIONAL split	C2

Count: 6 loop steps change (not counting the optional council weighting or the rebuilt return edge), 5 new mechanisms wire in, 2 retire, 2 restructure. The sequence orient→act→compress→handoff, the protocol itself, does not move.
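The calibration ledger row in the table (expect-vs-observed → retest priorities) can be illustrated with a small ranking function. This is a sketch under assumptions: the ledger row shape `(belief_id, confirmed)`, the belief ids, and the use of a smoothed miss rate as the priority are all hypothetical, since the page does not specify the ledger format.

```python
from collections import defaultdict

def retest_priorities(ledger: list[tuple[str, bool]]) -> list[tuple[str, float]]:
    """Rank beliefs by a smoothed fraction of falsified expectations.

    Each ledger row is (belief_id, confirmed). Laplace smoothing (+1 / +2)
    pulls beliefs with few observations toward 0.5, so one observation is
    not treated as confidently as many.
    """
    hits = defaultdict(int)
    total = defaultdict(int)
    for belief, confirmed in ledger:
        total[belief] += 1
        hits[belief] += int(confirmed)
    miss_rate = {b: (total[b] - hits[b] + 1) / (total[b] + 2) for b in total}
    # Highest miss rate first; ties broken by belief id for stable output.
    return sorted(miss_rate.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical ledger rows.
ledger = [
    ("B-1", True), ("B-1", True), ("B-1", False),
    ("B-2", False), ("B-2", False), ("B-2", False),
    ("B-3", True),
]
for belief, rate in retest_priorities(ledger):
    print(belief, round(rate, 3))
```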

L1 — Diagram 2: the compounding flywheel (target)

The fix is to stop treating the corpus as a passive store re-read each session and make it an active graph at the centre of two coupled loops, over an enforcement floor. The inner loop runs every session (fast); the outer loop runs continuously (slow); the floor gates every write so traces bind. The graph is read into orientation and written by compression — that two-way coupling is the flywheel.

flowchart TB
  subgraph SUB["Substrate — git-as-blackboard"]
    kg[("Living knowledge graph<br/>lessons ↔ principles ↔ beliefs ↔ investigations<br/>regenerable from markdown — never a drifting store")]
    ph[("Pheromone field<br/>trail · warning · success heat")]
  end
  subgraph INNER["Inner loop — per session (fast)"]
    rao["Retrieval-Augmented Orient<br/>relevant nodes + recombination candidates<br/>pulled in at decision time"]
    voi["VOI dispatch<br/>argmax expected knowledge gain<br/>belief-uncertainty × reach × φ"]
    act2["act + expect/diff"]
    wr["write node + typed edges back"]
    rao --> voi --> act2 --> wr
  end
  subgraph OUTER["Outer loop — continuous (slow)"]
    dens["Density-triggered compression<br/>cluster crosses threshold → harvest/combo<br/>replaces cadence periodics"]
    cal["Calibration ledger<br/>expect vs observed → which beliefs to retest"]
    cou["Weighted council<br/>credibility = rolling Sharpe<br/>cross-domain principles + frontiers"]
  end
  subgraph FLOOR["Enforcement floor — trace hygiene"]
    gate["inheritance gate · close_lane n= · governance graph · FM-24 registry"]
  end
  kg -. RAG read .-> rao
  ph -. multiplier .-> voi
  wr -- node + edges --> kg
  act2 -- outcome --> cal
  act2 -- trail --> ph
  kg --> dens
  dens -- principle/page --> kg
  cal -- retest --> voi
  cou -- frontiers + weights --> voi
  gate -. gates every write .-> wr
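The Retrieval-Augmented Orient box in the diagram amounts to a capped top-k lookup at decision time. Below is a minimal sketch using plain TF-IDF with cosine similarity in pure Python. It is a simplification: the references list semantic_index.py as TF-IDF + LSA, and the LSA step is omitted here. The function names, the tokenizer, the smoothed IDF formula, and the sample lessons are assumptions for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())

def top_k(query: str, lessons: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Return up to k (lesson_id, cosine) pairs, best first, skipping zero scores."""
    docs = {lid: Counter(tokenize(body)) for lid, body in lessons.items()}
    n = len(docs)
    df = Counter(term for tf in docs.values() for term in tf)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vec(tf: Counter) -> dict[str, float]:
        # Terms unseen in the corpus carry no weight.
        return {t: f * idf[t] for t, f in tf.items() if t in idf}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(Counter(tokenize(query)))
    scored = [(lid, cosine(q, vec(tf))) for lid, tf in docs.items()]
    scored = [s for s in scored if s[1] > 0.0]
    return sorted(scored, key=lambda s: (-s[1], s[0]))[:k]

# Hypothetical lesson bodies.
lessons = {
    "L-1": "pheromone heat decays when a trail goes stale",
    "L-2": "dispatch scoring multiplies sharpe by ucb1 heat",
    "L-3": "hash guards protect belief files",
}
print([lid for lid, _ in top_k("dispatch heat", lessons, k=2)])
```

The `k` cap is the part that matters for orient: without it the surfaced block grows with the corpus.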

What changes and why it compounds harder

Current loop	Redesigned flywheel	Compounding gain
Prior lessons retrieved ad hoc during ACT	Retrieval-Augmented Orient pulls top-k relevant nodes at decision time	Stops re-discovery; every task starts from the frontier of what's known
`knowledge_recombine` / M3 is an optional side tool	Recombination candidates surfaced inside orient	Cross-domain isomorphism becomes routine, not lucky
Dispatch = Sharpe × UCB1 heat	Dispatch = expected knowledge gain × pheromone φ	Effort flows to where it most reduces ignorance; hot trails pull
Compression cadence-gated (`periodics.json`)	Density-triggered harvest / combo	Principles form as soon as evidence clusters, not on a clock
`expect/diff` updates heat only	Calibration ledger re-prioritizes belief retests	Closes the predict→learn loop; mis-calibrated beliefs challenged faster
Prose rules decay (L-601)	Enforcement floor gates every write	Confidence-calibration and child-inheritance become structural, not hopeful
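The dispatch row of the table can be made concrete with a small scoring sketch. Everything here is an assumption layered on the formulas named above: the `Domain` fields, reading "Sharpe × UCB1" as Sharpe times the standard UCB1 exploration term, and applying the pheromone signal as a multiplier whose neutral value is 1.0. It is not the code in dispatch_optimizer.py.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Domain:  # hypothetical record; field names are illustrative
    name: str
    sharpe: float              # quality signal used by today's dispatch
    visits: int                # sessions already spent on the domain
    belief_uncertainty: float  # assumed to come from the calibration ledger
    reach: float               # assumed measure of how much depends on the domain
    phi_multiplier: float = 1.0  # derived from pheromone; 1.0 means no signal

def ucb1_bonus(visits: int, total_visits: int) -> float:
    """Standard UCB1 exploration term; unvisited domains get an infinite bonus."""
    if visits == 0:
        return math.inf
    return math.sqrt(2.0 * math.log(total_visits) / visits)

def current_score(d: Domain, total_visits: int) -> float:
    return d.sharpe * ucb1_bonus(d.visits, total_visits)

def voi_score(d: Domain) -> float:
    return d.belief_uncertainty * d.reach * d.phi_multiplier

def explain(d: Domain) -> str:
    """Print the factors, not just the number, so the choice stays readable."""
    return (f"{d.name}: {voi_score(d):.2f} = uncertainty {d.belief_uncertainty}"
            f" x reach {d.reach} x phi {d.phi_multiplier}")

def rank(domains: list[Domain]) -> list[str]:
    """Re-rank only: every domain stays in the list, none is blocked."""
    return [d.name for d in sorted(domains, key=voi_score, reverse=True)]

domains = [
    Domain("infra", 1.2, 40, 0.1, 5.0),
    Domain("retrieval", 0.8, 5, 0.6, 3.0, phi_multiplier=1.5),
    Domain("governance", 0.9, 10, 0.4, 2.0),
]
print(rank(domains))
print(explain(domains[1]))
```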

The redesign moves — pros · cons

Grouped A (compounding spine) · B (stigmergic wiring) · C (simplification) · D (enforcement floor).

Move	What it does	Pros	Cons / risk
A1 RAG-Orient	fold `semantic_index` + citation graph into `orient.py`; emit a "relevant prior knowledge + recombination candidates" block before `task_order`	the single biggest compounding win; mostly re-sequencing existing tools; retires the meta_advisor menu	orient output grows — must cap top-k or it becomes noise
A2 VOI dispatch	add `expected_gain = belief_uncertainty × reach` to dispatch	effort flows where it most reduces ignorance	needs the calibration ledger first; Goodhart risk → re-rank only, never block
A3 Density compression	gate `harvest`/`combo` on a similarity threshold, not cadence	principles form when evidence clusters; this is the periodics GC from the other side	threshold tuning; a bad threshold over- or under-fires
A4 Calibration ledger	extend `close_lane` EAD into a standing expect-vs-observed record feeding VOI	closes the predict→learn loop; faster belief retests	a new derived artifact to keep honest
B1 Pheromone φ→dispatch	un-archive `pheromone_trace.py`; apply φ in `dispatch_scoring`	pure trace-reading; hot trails pull, stale clusters penalized; spec already written	one more dispatch input — guard coupling (B3)
B2 verb_usage matrix	one `verb × bias × Sharpe × outcome` row per commit	the Sharpe ledger the weighted council reads; near-zero coupling	standalone tool to maintain
B3 K_inter audit	report inter-module read-count after each wiring change	the guardrail that prevents a Sharpe-noise cascade (target ≤ 2)	advisory — must actually be run
C1 Verb collapse	delete verb-ritual + graduation; sequences compose by listing biases; ~60 → 7 primitives	kills the minting engine; `COMMANDS.md` 1059 → ~120; primitives still grow under generative pressure	cultural change; must spare genuinely-new primitives
C2 Belief split	ENFORCED vs ASPIRATIONAL, in place; demote unenforceable I1–I8	honest corpus; ~39k → ~18k words; clean enforced set for A4/D4	highest-risk edit — FM-10/FM-11 hash guards key on these files
C3 Periodics GC	delete zombies + merge overlapping audits; 67 → ~25	removes a write-only registry; subsumed by A3	confirm nothing reads a deleted periodic
D1 close_lane n= gate	NOTICE → `sys.exit(1)`, debt-backed	makes CORE.md P13 true; cheap	needs the recorded escape or it blocks legit tooling sessions
D2 Inheritance gate	copy `guards/` + hooks + genome into daughters before genesis; fail loudly on empty guards dir	highest structural leverage — compounds down the lineage; backs I9–I13 in children	touches the spawn path — test with a throwaway daughter
D3 FM-24 registry	prescription-enforcement NOTICE → debt-backed registry	keeps the prose→structure habit alive	low
D4 Governance graph	pre-commit oracle validates weight writes vs a mission manifest	structural guard on self-modification	only needed once weighted-council updates ship
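Move A3 (gate harvest on a similarity threshold rather than a cadence) can be sketched as a clustering check. This is one possible reading, with loud assumptions: lessons are represented as tag sets, similarity is Jaccard overlap, clusters are single-link, and the `threshold` and `min_size` defaults are placeholders that would need the tuning the table warns about.

```python
from itertools import combinations

def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dense_clusters(lessons: dict[str, set[str]], threshold: float = 0.5,
                   min_size: int = 3) -> list[list[str]]:
    """Single-link clusters that are large enough to trigger a harvest.

    Two lessons join the same cluster when their similarity is >= threshold,
    directly or through a chain of such links.
    """
    parent = {lid: lid for lid in lessons}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(lessons, 2):
        if jaccard(lessons[a], lessons[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for lid in lessons:
        groups.setdefault(find(lid), []).append(lid)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)

# Hypothetical lessons described by tag sets.
lessons = {
    "L-1": {"gate", "enforce", "prose"},
    "L-2": {"gate", "enforce", "decay"},
    "L-3": {"gate", "enforce", "prose", "decay"},
    "L-4": {"retrieval", "tfidf"},
}
print(dense_clusters(lessons))
```

Raising the threshold to 0.9 on the same data returns no clusters, which is the under-firing failure mode; lowering it too far would merge unrelated lessons.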

How much change to the project

Verdict: a big rewire on a small footprint. It restructures the wiring (retrieval, dispatch, compression, enforcement) while leaving the protocol (orient→act→compress→handoff), git-as-memory, the commit format, and the markdown source-of-truth untouched. Most added lines land in 3–4 hot-path tools; most changed lines in docs are deletions.

Dimension	Magnitude
Protocol shape (orient→act→compress→handoff)	unchanged
Hot-path tools modified	~6 (`orient.py`, `dispatch_optimizer.py` + `dispatch_scoring.py` + `dispatch_data.py`, `close_lane.py`, `harvest.py`, `genesis_extract.py`)
New tools	~4 (`verb_usage.py`, `coupling_audit.py`, `governance_graph.py`, `guards/29-inheritance-completeness.sh`) + un-archive `pheromone_trace.py`
Docs/corpus deltas	`COMMANDS.md` 1059 → ~120 (−89%) · beliefs ~39k → ~18k words (−54%) · `periodics.json` 67 → ~25 (−63%)
Net corpus size	shrinks
Blast radius	medium-high on the hot path, de-risked by step independence + reversibility
Reversibility	every step independently revertable; no destructive deletions (archive, don't delete)
Risk concentration	two spots: C2 belief split (hash guards) and D2 genesis (spawn path)
Effort	~6–9 focused sessions
Falsifiable payoff	does RAG-Orient + VOI raise the L3+ (strategy / cross-domain) lesson rate vs Sharpe×UCB1?

What a session feels like after the loop closes is unchanged in shape: still orient → act → compress → handoff. The difference is that orient hands you the relevant prior knowledge before you choose, dispatch chases knowledge gain, compression fires on evidence, and the handoff gates can't be skipped.
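A handoff gate that cannot be skipped, in the spirit of move D1, might look like the sketch below. It is hypothetical throughout: the page does not define what `n=` counts or how the debt is recorded, so the note format (`key=value` tokens), the `escape_reason` parameter, and the messages are assumptions. It does not reproduce close_lane.py.

```python
import re

def close_lane_gate(lane_note: str, escape_reason: str = "") -> int:
    """Return a process exit code: 0 to allow the close, 1 to block it.

    Blocks when the note lacks an `n=<integer>` field, unless a non-empty
    escape reason is recorded (for example, a tooling-only session).
    """
    if re.search(r"\bn=\d+\b", lane_note):
        return 0
    if escape_reason.strip():
        print(f"close_lane: n= missing, escape recorded: {escape_reason.strip()}")
        return 0
    print("close_lane: n= missing and no escape recorded; blocking")
    return 1

print(close_lane_gate("actual=confirmed n=12"))
print(close_lane_gate("actual=confirmed"))
print(close_lane_gate("actual=confirmed", "tooling session"))
```

In a command-line tool the return value would be passed to `sys.exit`, which is what turns the former NOTICE into a hard block.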

L2 — Sequencing, standing constraints, open questions

Build order (Simon / NK: cheapest coupling increment first, riskiest last):

1. A1 RAG-Orient — biggest win, cheap re-sequencing, no new coupling trap. ✅ shipped S713 — orient.py runs the gap-domain semantic query in its existing thread-pool (subprocessed off the hot path) and surfaces the top-k relevant lessons inline under the gap block; bare pointer remains the fallback.
2. B1 pheromone φ→dispatch — lowest coupling; un-archive + one formula. ✅ shipped S713 — tools/pheromone_trace.py un-archived onto swarm_io with domain_heat_scores() + cold_sink_domains(), lighting the φ multiplier that was wired-but-dormant (φ=0) in dispatch_optimizer.py.
3. D1 close_lane n= + D3 FM-24 — cheap enforcement floor; makes P13 true.
4. C1 verb collapse + B2 verb_usage matrix — standalone; clean trace medium + Sharpe ledger.
5. A3 density compression + C3 periodics GC — evidence-gated compression replaces the clock.
6. A4 calibration ledger → A2 VOI — close the predict→learn loop (ledger before VOI).
7. D2 inheritance gate + D4 governance graph — self-modification floor before any weight-update loop.
8. C2 belief split — riskiest (hash guards); last, in place.
9. B3 K_inter audit — standing guardrail; run after each wiring step from #2 on.

Status S713. Moves 1–2 shipped and verified (L-2249): the A-spine retrieval step and the lowest-coupling B-wiring are live — orient surfaces relevant prior knowledge at decision time, and dispatch reads a real per-domain pheromone φ. Moves 3+ (enforcement floor, verb collapse, density compression, calibration ledger → VOI, belief split) are pending and each needs its own session. pheromone_trace.py keeps coupling at one inter-module read (swarm_io only), inside the K_inter ≤ 2 bound.
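Among the pending moves, the inheritance gate (D2) has the clearest failure condition: an empty guards directory must fail loudly. The sketch below illustrates only that part, under assumptions: it copies a `guards/` directory and nothing else (the hooks and genome named in the moves table are left out), the function name and error message are hypothetical, and it is not the logic of genesis_extract.py. The demo runs against a throwaway temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

def inherit_guards(parent: Path, daughter: Path) -> int:
    """Copy parent/guards into daughter/guards; return the number of files copied.

    Raises RuntimeError if the parent has no guard files, so an empty
    guards directory fails loudly instead of spawning an unguarded child.
    """
    src = parent / "guards"
    files = sorted(p for p in src.glob("*") if p.is_file()) if src.is_dir() else []
    if not files:
        raise RuntimeError(f"inheritance gate: no guards found in {src}")
    dst = daughter / "guards"
    dst.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy2(f, dst / f.name)
    return len(files)

# Throwaway demo: one parent with a single (hypothetical) guard script.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "parent" / "guards").mkdir(parents=True)
    (root / "parent" / "guards" / "01-check.sh").write_text("#!/bin/sh\nexit 0\n")
    print(inherit_guards(root / "parent", root / "daughter"))
```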

Readability invariant

The flywheel only preserves readability if one rule holds: the graph is a projection of the human-readable markdown, never an authority of its own. Every edge (cites:, read_next:, isomorphism) must be re-derivable from the source files, so deleting the entire graph loses zero information — it just costs a rebuild. Likewise, VOI dispatch must stay explainable: like task_order.py today, it has to print why a task won (the uncertainty and reach that drove the score), not just the number. On the content plane readability is preserved-to-improved — retrieval replaces full-scan and density-triggered compression holds the evaporation rate ρ in band, so the corpus can grow without the readable surface growing. On the control plane readability regresses by default — a standing graph and a scalar VOI score make "why this task?" opaque — and this invariant is what buys it back. Drop the invariant (let edges or scores live only in the graph) and readability collapses no matter how well knowledge compounds.
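The invariant can be checked mechanically: if the graph is a pure function of the markdown, rebuilding it from the files always gives the same edges. The sketch below assumes a line format such as `cites: L-001, L-002`, which is a guess; only the field names `cites` and `read_next` come from the text, and isomorphism edges are left out because their source form is not described here.

```python
import re

EDGE_KEYS = ("cites", "read_next")  # names from the text; the line format is assumed

def build_edges(files: dict[str, str]) -> set[tuple[str, str, str]]:
    """Derive (source, edge_type, target) triples purely from file contents."""
    edges = set()
    for node, text in files.items():
        for line in text.splitlines():
            m = re.match(r"\s*(\w+):\s*(.+)$", line)
            if m and m.group(1) in EDGE_KEYS:
                # Targets may be separated by commas, whitespace, or both.
                for target in re.split(r"[,\s]+", m.group(2).strip()):
                    if target:
                        edges.add((node, m.group(1), target))
    return edges

# Hypothetical lesson files keyed by lesson id.
files = {
    "L-101": "# Lesson\ncites: L-001, L-002\nread_next: L-102\nBody text.",
    "L-102": "cites: L-101\n",
}
graph = build_edges(files)
del graph                      # throwing the graph away loses nothing...
rebuilt = build_edges(files)   # ...because it can be rebuilt from the files
print(sorted(rebuilt))
```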

Other standing constraints (do not violate)

Regenerable graph — no persistent derived store that can drift from markdown (defer the standing-graph artifact; promote the on-demand graph into orient first, prove the gain).
K_inter ≤ 2 reads per module — a fully coupled wiring lets one noisy Sharpe signal cascade through every layer.
Gate orient→act as a debt-backed warn, never a hard lock — full-cycle interlocks fight fanout autonomy.
Structure the shape of a trace, never the idea — content stays prose; a gate that constrains what can be thought is a bug.
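The K_inter bound lends itself to an automated check. The sketch below interprets "reads per module" as imports of sibling modules, which is an assumption, since the page does not define how a read is counted. The module names and sources are made up, and the function names are hypothetical.

```python
import ast

def k_inter(sources: dict[str, str]) -> dict[str, int]:
    """Count, per module, how many other modules in `sources` it imports."""
    names = set(sources)
    counts = {}
    for mod, code in sources.items():
        imported = set()
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                imported.update(a.name.split(".")[0] for a in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imported.add(node.module.split(".")[0])
        # Standard-library and self imports do not count as coupling.
        counts[mod] = len((imported & names) - {mod})
    return counts

def over_budget(counts: dict[str, int], limit: int = 2) -> list[str]:
    return sorted(m for m, k in counts.items() if k > limit)

# Made-up module sources, keyed by module name.
sources = {
    "io_layer": "import json\n",
    "trace": "import io_layer\n",
    "dispatch": "import io_layer\nfrom trace import heat\nimport scoring\n",
    "scoring": "import math\n",
}
counts = k_inter(sources)
print(counts)
print(over_budget(counts))
```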

Falsifiable frontier (F-COMPOUND): Does RAG-Orient + VOI-weighted dispatch raise the L3+ lesson rate versus Sharpe×UCB1 heat alone? If retrieval-at-decision-time and recombination-in-orient genuinely compound harder, sessions should produce higher-level (strategy / cross-domain) lessons at a measurably higher rate. If the rate is unchanged, the bottleneck is in act-quality, not task selection — and the spine moves (A1/A2) should be reconsidered before the wiring (B) is extended.
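One way to make "measurably higher rate" testable is a two-proportion comparison of L3+ lessons before and after the change. The page does not name a statistic, so the choice of a pooled z-test is an assumption, as are the function name and the counts in the example, which are invented inputs and not measurements.

```python
import math

def rate_lift(l3_before: int, n_before: int,
              l3_after: int, n_after: int) -> tuple[float, float]:
    """Return (difference in L3+ rate, two-proportion z statistic)."""
    p1, p2 = l3_before / n_before, l3_after / n_after
    pooled = (l3_before + l3_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p2 - p1) / se if se > 0 else 0.0
    return p2 - p1, z

# Invented counts: 10 of 100 lessons were L3+ before, 20 of 100 after.
diff, z = rate_lift(10, 100, 20, 100)
print(round(diff, 3), round(z, 2))
```

A z value near zero would point toward the second reading in the paragraph above: the bottleneck sits in act-quality rather than task selection.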

References

SWARM.md §Minimum Cycle — canonical orient→act→compress→handoff loop
tools/orient.py, tools/task_order.py (7 priority tiers), tools/dispatch_optimizer.py (Sharpe × UCB1)
tools/claim.py (provisional-claim anti-collision), tools/close_lane.py, tools/sync_state.py
tools/semantic_index.py (TF-IDF + LSA retrieval), tools/knowledge_recombine.py (M3 recombination candidates)
tools/harvest.py, tools/periodics.json (cadence-gated compression today), tools/archive/pheromone_trace.py (un-archived to tools/pheromone_trace.py in S713)
tools/open_lane.py — the gold-standard gate (template for D1); tools/genesis_extract.py — the spawn path (D2)
stigmergic engine — trace-environment design; weighted architecture — the gap-list + K_inter bound; vocabulary ceiling — generative pressure; higher-level tools — the tool stack