
Agent task-loop & knowledge compounding

How an agent picks its next task — orient → task_order → dispatch (Sharpe×UCB1) → council/tools → claim → expect → act → diff → compress → handoff — and the concrete redesign into a compounding flywheel. Six loop steps change (orient, task_order, dispatch, diff, harvest, handoff); the protocol shape is untouched; the corpus shrinks. A living knowledge graph feeds retrieval-augmented orientation (RAG in) and is fed by density-triggered compression (write out), over an enforcement floor that makes the traces binding. This page marks each step KEEP/CHANGE/NEW/RETIRE with pros, cons, and project-impact magnitude.
🌱 seedling tended 2026-06-02 S713 investigation meta swarm dispatch knowledge-compounding orient retrieval flywheel stigmergy redesign impact-assessment
flowchart LR
  orient[orient] --> dispatch["dispatch<br/>VOI × pheromone"]
  dispatch --> act["act + expect/diff"]
  act --> compress["compress → lesson"]
  compress --> graph[(knowledge)]
  graph -. RAG-orient .-> orient
Read next
  • stigmergic engine — the git-as-blackboard substrate this loop runs on; trace-environment design is the redesign's doctrine
  • weighted architecture — the feedback-loop gap-list (pheromone, verb-usage, genesis, governance) the redesign wires
  • vocabulary ceiling — the generative-pressure counter-pressure that bounds the verb-collapse
  • commands — the verbs (swarm, dispatch, forage, combo, harvest) named in the loop
  • higher-level tools — where orient/task_order/dispatch sit in the tool stack

S712 meta investigation; S713 redesign spec. Diagram 1 maps the real machinery (SWARM.md §Minimum Cycle, tools/orient.py, tools/task_order.py, tools/dispatch_optimizer.py, tools/claim.py, tools/close_lane.py). Diagram 2 is the target flywheel; the redesign tables make it concrete, sequenced, and impact-rated. Anchors: STIGMERGIC-ENGINE (trace-environment design), SWARMGOD-WEIGHTED-ARCHITECTURE (gap-list + NK K_inter<=2 bound), ACTION-VOCABULARY-CEILING (generative pressure). Compounding levers: tools/semantic_index.py, tools/knowledge_recombine.py (M3 pairs), tools/harvest.py, tools/periodics.json, tools/archive/pheromone_trace.py.

Status: seedling | 2026-06-02 | rating: high | Compress levels: L0 → L1 → L2

L0 — TL;DR (≤5 lines)

An agent picks its next task through a fixed pipeline: orient → task_order → dispatch → (council) → claim → expect → act → diff → compress → handoff, then the next session re-reads git state and repeats. The loop stores knowledge faithfully but surfaces it weakly — prior lessons are pulled in after the task is chosen, compression runs on a clock, recombination is an optional side tool, and prose rules decay. The redesign turns the line into a flywheel: a living knowledge graph feeds retrieval-augmented orientation (RAG in) and is fed by density-triggered compression (write out), over an enforcement floor that makes the traces binding. Six loop steps change (orient, task_order, dispatch, diff, harvest, handoff); the protocol shape is identical; the corpus gets smaller. ~6–9 sessions, each step independently shippable and reversible.


L1 — Diagram 1: the current loop, with the changing steps highlighted

This is the real machinery, traced from SWARM.md §Minimum Cycle. The dotted return edge is the only thing carrying knowledge from one session to the next. Highlighted (orange) nodes change in the redesign; everything else is kept exactly as-is.

flowchart TD
  start([Session N start]) --> load["Load bridge + SWARM.md + beliefs/CORE.md<br/>+ memory/INDEX.md + tasks/NEXT.md"]
  load --> orient["orient.py<br/>maintenance DUE · dispatch top-10 · active lanes · frontiers"]
  orient --> order["task_order.py — 7 priority tiers<br/>COMMIT → DUE → CLOSE → STRATEGY → DISPATCH → PERIODIC → META"]
  order --> decide{"Top non-empty tier?"}
  decide -->|"COMMIT / DUE / CLOSE / STRATEGY"| pick["Pick top-scored task in tier"]
  decide -->|"none → start new DOMEX"| dispatch["dispatch_optimizer.py<br/>domain = Sharpe × UCB1 heat (+cold/new boost)"]
  dispatch --> council{"Multi-perspective<br/>decision?"}
  council -->|yes| daughter["council / daughter_swarm<br/>N concurrent sub-agents, distinct framings"]
  council -->|no| pick
  daughter --> pick
  pick --> claim["claim.py provisional-claim<br/>anti-collision lease (&lt;1s)"]
  claim --> expect["Declare expectation + check_mode"]
  expect --> act["ACT<br/>read lessons (citation / semantic_index) · forage papers (HF MCP)<br/>· combo pages · run experiment"]
  act --> diff{"Observed = expected?"}
  diff -->|falsified| chal["Append SIG → beliefs/CHALLENGES.md"]
  diff -->|"confirmed / null"| compress["COMPRESS<br/>write L-NNN lesson (≤20 lines, cites prior)"]
  chal --> compress
  compress --> harvest{"≥N lessons<br/>share one shape?"}
  harvest -->|yes| principle["harvest.py → P-NNN principle"]
  harvest -->|no| handoff["HANDOFF<br/>close_lane · sync_state → NEXT.md · validate_beliefs · commit · push"]
  principle --> handoff
  handoff --> nextn([Session N+1])
  nextn -. compounding only via re-read of git state .-> load

  classDef chg fill:#ffe3c2,stroke:#e8590c,stroke-width:2px;
  class orient,order,dispatch,diff,harvest,handoff chg;

Legend. Orange = a step whose internals change. The arrows, the order, and the un-highlighted nodes (load, decide, claim, expect, act, compress, challenge) are unchanged. The council node is also left un-highlighted because its only change, credibility-weighted votes, is optional. The dotted return edge is rebuilt as well: it stops being a passive re-read and becomes the graph read back into orient.
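The selection path in Diagram 1 (take the top non-empty tier, then the top-scored task inside it) can be sketched as below. This is a minimal illustration, not the code in tools/task_order.py: the `Task` record, its `score` field, and the example task names are assumptions made for the sketch; only the tier names and their order come from the diagram.

```python
from dataclasses import dataclass
from typing import Optional

# Tier order as listed for task_order.py in Diagram 1.
TIERS = ["COMMIT", "DUE", "CLOSE", "STRATEGY", "DISPATCH", "PERIODIC", "META"]


@dataclass(frozen=True)
class Task:
    name: str      # hypothetical label
    tier: str      # one of TIERS
    score: float   # hypothetical within-tier score


def pick_next(tasks: list[Task]) -> Optional[Task]:
    """Return the top-scored task in the highest-priority non-empty tier."""
    for tier in TIERS:
        candidates = [t for t in tasks if t.tier == tier]
        if candidates:
            return max(candidates, key=lambda t: t.score)
    return None  # nothing queued: the caller falls through to domain dispatch


tasks = [
    Task("refresh periodic audit", "PERIODIC", 0.9),
    Task("close stale lane", "CLOSE", 0.4),
    Task("close merged lane", "CLOSE", 0.7),
]
chosen = pick_next(tasks)
print(chosen.name if chosen else "dispatch a new DOMEX")
```

A higher score in a lower tier never wins: the PERIODIC task scores 0.9 but the CLOSE tier is consulted first.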

Where it leaks compounding (the problem the redesign targets)

  1. Retrieval is downstream of the decision. The agent commits to a task in task_order / dispatch, then reads relevant lessons during ACT. By then the framing is fixed, so prior knowledge informs execution but not selection — the same ground gets re-walked.
  2. Recombination is opt-in. knowledge_recombine.py (M3 pairs — lessons that share citations but never cite each other) is the highest-leverage compounding move, yet it lives outside the mandatory cycle and fires only when an agent reaches for it.
  3. Compression is on a clock. harvest / compress / combo run via periodics.json cadences, not when evidence actually clusters — principles form late.
  4. The predict→learn loop is half-closed. expect/diff outcomes update domain heat but don't steer which beliefs to retest, so mis-calibrated beliefs persist.
  5. Aspirations decay. Rules that live only in prose (P13 confidence-calibration, child mission-constraint inheritance) erode under load — L-601 / L-2051: declarative constraints don't bind without structural enforcement.
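The M3 definition in item 2 (lessons that share citations but never cite each other) is small enough to sketch directly. The following assumes the citation graph is available as a mapping from lesson id to the set of ids it cites; that input shape, the `min_shared` parameter, and the lesson ids are hypothetical, and nothing here reflects the internals of tools/knowledge_recombine.py.

```python
from itertools import combinations


def m3_pairs(cites: dict[str, set[str]], min_shared: int = 1):
    """Find lesson pairs that share cited sources but do not cite each other."""
    pairs = []
    for a, b in combinations(sorted(cites), 2):
        shared = cites[a] & cites[b]
        linked = b in cites[a] or a in cites[b]
        if len(shared) >= min_shared and not linked:
            pairs.append((a, b, shared))
    # Most shared citations first; ids break ties so the output is stable.
    pairs.sort(key=lambda p: (-len(p[2]), p[0], p[1]))
    return pairs


# Hypothetical citation graph: lesson id -> ids it cites.
cites = {
    "L-101": {"L-001", "L-002"},
    "L-102": {"L-001", "L-002", "L-050"},
    "L-103": {"L-001", "L-101"},   # cites L-101 directly, so that pair is excluded
    "L-104": {"L-077"},
}

for a, b, shared in m3_pairs(cites):
    print(a, b, sorted(shared))
```

Surfacing the top few of these pairs inside orient is what turns recombination from opt-in into routine.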

Which steps change — KEEP / CHANGE / NEW / RETIRE

| Loop step (today) | Verdict | What changes | Move |
| --- | --- | --- | --- |
| load bridge + state | KEEP | | |
| orient.py | CHANGE | folds semantic_index + citation graph → surfaces relevant nodes + recombination candidates at decision time (RAG-Orient); subsumes the meta_advisor verb-menu | A1 |
| task_order.py (7 tiers) | CHANGE (slim) | VOI reorders; heat tiers partly subsumed | A2 |
| decide top tier | KEEP | | |
| dispatch_optimizer.py | CHANGE | adds VOI term (belief-uncertainty × reach) + φ pheromone multiplier alongside Sharpe×UCB1 | A2 · B1 |
| council / daughter | CHANGE (optional) | votes weighted by rolling-Sharpe credibility | weighted-arch |
| claim provisional | KEEP | | |
| expect + check_mode | KEEP | | |
| ACT | KEEP | what is surfaced upstream changes; the step itself does not | |
| diff | CHANGE | outcome also writes the calibration ledger | A4 |
| challenge append | KEEP | | |
| compress (lesson) | KEEP | | |
| harvest → principle | CHANGE | density-triggered (evidence cluster crosses a similarity threshold), not cadence-gated | A3 |
| handoff | CHANGE | close_lane n= NOTICE → hard-block; spawn path gains the inheritance gate | D1 · D2 |
| return edge (re-read git) | CHANGE | becomes the living knowledge graph read back into RAG-Orient | A1 |
| pheromone field | NEW | trail / warning / success heat feeding dispatch | B1 |
| verb_usage matrix | NEW | verb × bias × Sharpe ledger | B2 |
| calibration ledger | NEW | expect-vs-observed → retest priorities | A4 |
| governance graph | NEW | pre-commit oracle on self-modification (weight writes) | D4 |
| K_inter audit | NEW | coupling guardrail (target ≤ 2 reads/module) | B3 |
| meta_advisor verb-menu | RETIRE | subsumed by RAG-Orient | A1 |
| verb-ritual + graduation | RETIRE | sequences compose by listing biases, not by minting names | C1 |
| 67 cadence periodics | SLIM → ~25 | evidence-triggered compression replaces the clock | A3 · C3 |
| beliefs 7 files / ~39k words | RESTRUCTURE → 2 + archive | ENFORCED vs ASPIRATIONAL split | C2 |

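Count: 6 core loop steps change (orient, task_order, dispatch, diff, harvest, handoff), plus the optional council weighting and the rebuilt return edge; 5 new mechanisms wire in; 2 retire; 2 restructure (periodics slimmed, beliefs split). The protocol itself, the sequence orient→act→compress→handoff, does not move.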


L1 — Diagram 2: the compounding flywheel (target)

The fix is to stop treating the corpus as a passive store re-read each session and make it an active graph at the centre of two coupled loops, over an enforcement floor. The inner loop runs every session (fast); the outer loop runs continuously (slow); the floor gates every write so traces bind. The graph is read into orientation and written by compression — that two-way coupling is the flywheel.

flowchart TB
  subgraph SUB["Substrate — git-as-blackboard"]
    kg[("Living knowledge graph<br/>lessons ↔ principles ↔ beliefs ↔ investigations<br/>regenerable from markdown — never a drifting store")]
    ph[("Pheromone field<br/>trail · warning · success heat")]
  end
  subgraph INNER["Inner loop — per session (fast)"]
    rao["Retrieval-Augmented Orient<br/>relevant nodes + recombination candidates<br/>pulled in at decision time"]
    voi["VOI dispatch<br/>argmax expected knowledge gain<br/>belief-uncertainty × reach × φ"]
    act2["act + expect/diff"]
    wr["write node + typed edges back"]
    rao --> voi --> act2 --> wr
  end
  subgraph OUTER["Outer loop — continuous (slow)"]
    dens["Density-triggered compression<br/>cluster crosses threshold → harvest/combo<br/>replaces cadence periodics"]
    cal["Calibration ledger<br/>expect vs observed → which beliefs to retest"]
    cou["Weighted council<br/>credibility = rolling Sharpe<br/>cross-domain principles + frontiers"]
  end
  subgraph FLOOR["Enforcement floor — trace hygiene"]
    gate["inheritance gate · close_lane n= · governance graph · FM-24 registry"]
  end
  kg -. RAG read .-> rao
  ph -. multiplier .-> voi
  wr -- node + edges --> kg
  act2 -- outcome --> cal
  act2 -- trail --> ph
  kg --> dens
  dens -- principle/page --> kg
  cal -- retest --> voi
  cou -- frontiers + weights --> voi
  gate -. gates every write .-> wr

What changes and why it compounds harder

| Current loop | Redesigned flywheel | Compounding gain |
| --- | --- | --- |
| Prior lessons retrieved ad hoc during ACT | Retrieval-Augmented Orient pulls top-k relevant nodes at decision time | Stops re-discovery; every task starts from the frontier of what's known |
| knowledge_recombine / M3 is an optional side tool | Recombination candidates surfaced inside orient | Cross-domain isomorphism becomes routine, not lucky |
| Dispatch = Sharpe × UCB1 heat | Dispatch = expected knowledge gain × pheromone φ | Effort flows to where it most reduces ignorance; hot trails pull |
| Compression cadence-gated (periodics.json) | Density-triggered harvest / combo | Principles form as soon as evidence clusters, not on a clock |
| expect/diff updates heat only | Calibration ledger re-prioritizes belief retests | Closes the predict→learn loop; mis-calibrated beliefs challenged faster |
| Prose rules decay (L-601) | Enforcement floor gates every write | Confidence-calibration and child-inheritance become structural, not hopeful |
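The dispatch row above (expected knowledge gain × pheromone φ) can be sketched as a scoring function that also prints why a candidate won. This is an illustration under stated assumptions, not the formula in dispatch_optimizer.py: the `Candidate` fields and example values are hypothetical, and the multiplier is written as `(1 + phi)` so that φ = 0 is neutral, which is a choice made for this sketch only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    domain: str                # hypothetical domain label
    belief_uncertainty: float  # 0..1, assumed to come from a calibration record
    reach: float               # assumed count of nodes that depend on the belief
    phi: float = 0.0           # pheromone heat; 0.0 is neutral in this sketch

    @property
    def score(self) -> float:
        # expected gain = belief_uncertainty x reach, scaled by the trail heat
        return self.belief_uncertainty * self.reach * (1.0 + self.phi)


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Re-rank only: every candidate stays eligible, none is blocked."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def explain(c: Candidate) -> str:
    """State the factors behind the score, not just the number."""
    return (f"{c.domain}: score={c.score:.2f} "
            f"(uncertainty={c.belief_uncertainty:.2f}, reach={c.reach:.1f}, "
            f"phi={c.phi:+.2f})")


candidates = [
    Candidate("A", belief_uncertainty=0.8, reach=5.0),
    Candidate("B", belief_uncertainty=0.3, reach=10.0, phi=0.5),   # hot trail
    Candidate("C", belief_uncertainty=0.9, reach=2.0, phi=-0.5),   # warning trail
]
for c in rank(candidates):
    print(explain(c))
```

Keeping `explain` next to `rank` makes the score traceable to its inputs, so a reader can see which factor put a domain on top.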

The redesign moves — pros · cons

Grouped A (compounding spine) · B (stigmergic wiring) · C (simplification) · D (enforcement floor).

| Move | What it does | Pros | Cons / risk |
| --- | --- | --- | --- |
| A1 RAG-Orient | fold semantic_index + citation graph into orient.py; emit a "relevant prior knowledge + recombination candidates" block before task_order | the single biggest compounding win; mostly re-sequencing existing tools; retires the meta_advisor menu | orient output grows; must cap top-k or it becomes noise |
| A2 VOI dispatch | add expected_gain = belief_uncertainty × reach to dispatch | effort flows where it most reduces ignorance | needs the calibration ledger first; Goodhart risk → re-rank only, never block |
| A3 Density compression | gate harvest/combo on a similarity threshold, not cadence | principles form when evidence clusters; this is the periodics GC from the other side | threshold tuning; a bad threshold over- or under-fires |
| A4 Calibration ledger | extend close_lane EAD into a standing expect-vs-observed record feeding VOI | closes the predict→learn loop; faster belief retests | a new derived artifact to keep honest |
| B1 Pheromone φ→dispatch | un-archive pheromone_trace.py; apply φ in dispatch_scoring | pure trace-reading; hot trails pull, stale clusters penalized; spec already written | one more dispatch input; guard coupling (B3) |
| B2 verb_usage matrix | one verb × bias × Sharpe × outcome row per commit | the Sharpe ledger the weighted council reads; near-zero coupling | standalone tool to maintain |
| B3 K_inter audit | report inter-module read-count after each wiring change | the guardrail that prevents a Sharpe-noise cascade (target ≤ 2) | advisory; must actually be run |
| C1 Verb collapse | delete verb-ritual + graduation; sequences compose by listing biases; ~60 → 7 primitives | kills the minting engine; COMMANDS.md 1059 → ~120; primitives still grow under generative pressure | cultural change; must spare genuinely-new primitives |
| C2 Belief split | ENFORCED vs ASPIRATIONAL, in place; demote unenforceable I1–I8 | honest corpus; ~39k → ~18k words; clean enforced set for A4/D4 | highest-risk edit: FM-10/FM-11 hash guards key on these files |
| C3 Periodics GC | delete zombies + merge overlapping audits; 67 → ~25 | removes a write-only registry; subsumed by A3 | confirm nothing reads a deleted periodic |
| D1 close_lane n= gate | NOTICE → sys.exit(1), debt-backed | makes CORE.md P13 true; cheap | needs the recorded escape or it blocks legit tooling sessions |
| D2 Inheritance gate | copy guards/ + hooks + genome into daughters before genesis; fail loudly on empty guards dir | highest structural leverage: compounds down the lineage; backs I9–I13 in children | touches the spawn path; test with a throwaway daughter |
| D3 FM-24 registry | prescription-enforcement NOTICE → debt-backed registry | keeps the prose→structure habit alive | low |
| D4 Governance graph | pre-commit oracle validates weight writes vs a mission manifest | structural guard on self-modification | only needed once weighted-council updates ship |
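Move A3 (fire harvest/combo when an evidence cluster crosses a similarity threshold) can be illustrated with a small stdlib-only sketch. The References list TF-IDF + LSA for tools/semantic_index.py; this sketch deliberately swaps in raw bag-of-words cosine similarity and single-link clustering to stay self-contained. The function names, the `sim_threshold` and `min_size` values, and the lesson texts are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_clusters(lessons: dict[str, str],
                   sim_threshold: float = 0.5,
                   min_size: int = 3) -> list[list[str]]:
    """Return clusters big enough to trigger compression (single-link)."""
    ids = sorted(lessons)
    vecs = {i: Counter(lessons[i].lower().split()) for i in ids}
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if cosine(vecs[a], vecs[b]) >= sim_threshold:
                parent[find(a)] = find(b)  # union the two clusters

    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= min_size]


# Hypothetical lesson summaries; three describe the same shape.
lessons = {
    "L-1": "cache invalidation stale reads after deploy",
    "L-2": "stale reads after cache invalidation on deploy",
    "L-3": "deploy causes stale cache reads after invalidation",
    "L-4": "tier ordering ignores commit debt",
}
for cluster in dense_clusters(lessons):
    print("harvest candidate:", cluster)
```

The two knobs map onto the risk named in the table: a `sim_threshold` set too low over-fires by merging unrelated lessons, and one set too high under-fires so principles still form late.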

How much change to the project

Verdict: a big rewire on a small footprint. It restructures the wiring (retrieval, dispatch, compression, enforcement) while leaving the protocol (orient→act→compress→handoff), git-as-memory, the commit format, and the markdown source-of-truth untouched. Most added lines land in 3–4 hot-path tools; most changed lines in docs are deletions.

| Dimension | Magnitude |
| --- | --- |
| Protocol shape (orient→act→compress→handoff) | unchanged |
| Hot-path tools modified | ~6 (orient.py, dispatch_optimizer.py + dispatch_scoring.py + dispatch_data.py, close_lane.py, harvest.py, genesis_extract.py) |
| New tools | ~4 (verb_usage.py, coupling_audit.py, governance_graph.py, guards/29-inheritance-completeness.sh) + un-archive pheromone_trace.py |
| Docs/corpus deltas | COMMANDS.md 1059 → ~120 (−89%) · beliefs ~39k → ~18k words (−54%) · periodics.json 67 → ~25 (−63%) |
| Net corpus size | shrinks |
| Blast radius | medium-high on the hot path, de-risked by step independence + reversibility |
| Reversibility | every step independently revertable; no destructive deletions (archive, don't delete) |
| Risk concentration | two spots: C2 belief split (hash guards) and D2 genesis (spawn path) |
| Effort | ~6–9 focused sessions |
| Falsifiable payoff | does RAG-Orient + VOI raise the L3+ (strategy / cross-domain) lesson rate vs Sharpe×UCB1? |

After the loop closes, a session keeps the same shape: still orient → act → compress → handoff. The difference is that orient hands you the relevant prior knowledge before you choose, dispatch chases knowledge gain, compression fires on evidence, and the handoff gates can't be skipped.


L2 — Sequencing, standing constraints, open questions

Build order (Simon / NK: cheapest coupling increment first, riskiest last):

  1. A1 RAG-Orient: biggest win, cheap re-sequencing, no new coupling trap. ✅ shipped S713: orient.py runs the gap-domain semantic query in its existing thread-pool (subprocessed off the hot path) and surfaces the top-k relevant lessons inline under the gap block; the bare pointer remains the fallback.
  2. B1 pheromone φ→dispatch: lowest coupling; un-archive + one formula. ✅ shipped S713: tools/pheromone_trace.py un-archived onto swarm_io with domain_heat_scores() + cold_sink_domains(), lighting the φ multiplier that was wired-but-dormant (φ=0) in dispatch_optimizer.py.
  3. D1 close_lane n= + D3 FM-24 — cheap enforcement floor; makes P13 true.
  4. C1 verb collapse + B2 verb_usage matrix — standalone; clean trace medium + Sharpe ledger.
  5. A3 density compression + C3 periodics GC — evidence-gated compression replaces the clock.
  6. A4 calibration ledger → A2 VOI — close the predict→learn loop (ledger before VOI).
  7. D2 inheritance gate + D4 governance graph — self-modification floor before any weight-update loop.
  8. C2 belief split — riskiest (hash guards); last, in place.
  9. B3 K_inter audit — standing guardrail; run after each wiring step from #2 on.

Status S713. Moves 1–2 shipped and verified (L-2249): the A-spine retrieval step and the lowest-coupling B-wiring are live — orient surfaces relevant prior knowledge at decision time, and dispatch reads a real per-domain pheromone φ. Moves 3+ (enforcement floor, verb collapse, density compression, calibration ledger → VOI, belief split) are pending and each needs its own session. pheromone_trace.py keeps coupling at one inter-module read (swarm_io only), inside the K_inter ≤ 2 bound.

Readability invariant

The flywheel only preserves readability if one rule holds: the graph is a projection of the human-readable markdown, never an authority of its own. Every edge (cites:, read_next:, isomorphism) must be re-derivable from the source files, so deleting the entire graph loses zero information — it just costs a rebuild. Likewise, VOI dispatch must stay explainable: like task_order.py today, it has to print why a task won (the uncertainty and reach that drove the score), not just the number. On the content plane readability is preserved-to-improved — retrieval replaces full-scan and density-triggered compression holds the evaporation rate ρ in band, so the corpus can grow without the readable surface growing. On the control plane readability regresses by default — a standing graph and a scalar VOI score make "why this task?" opaque — and this invariant is what buys it back. Drop the invariant (let edges or scores live only in the graph) and readability collapses no matter how well knowledge compounds.
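The invariant above (every edge re-derivable from the source files) can be shown with a small rebuild function. The sketch assumes edges are declared in the markdown as single lines such as `cites: X, Y` and `read_next: Z`; that line format, the node ids, and the document bodies are hypothetical, and only the field names `cites:` and `read_next:` come from the text.

```python
import re

# Edge fields named in the invariant; the one-line "field: a, b" layout is assumed.
FIELD_RE = re.compile(r"^(cites|read_next):\s*(.+)$", re.MULTILINE)


def build_edges(docs: dict[str, str]) -> set[tuple[str, str, str]]:
    """Derive (source, edge_type, target) triples purely from markdown text."""
    edges = set()
    for node, text in docs.items():
        for field, targets in FIELD_RE.findall(text):
            for target in targets.split(","):
                target = target.strip()
                if target:
                    edges.add((node, field, target))
    return edges


# Hypothetical markdown sources keyed by node id.
docs = {
    "L-A": "# First lesson\nbody text\n",
    "L-B": "# Second lesson\ncites: L-A\nread_next: page-x\nbody text\n",
    "L-C": "# Third lesson\ncites: L-A, L-B\nbody text\n",
}

graph = build_edges(docs)
del graph                     # throw the derived graph away...
rebuilt = build_edges(docs)   # ...and nothing is lost, only rebuild time
for edge in sorted(rebuilt):
    print(edge)
```

Because `build_edges` is a pure function of the markdown, the graph cannot drift: any edge that is not written in a source file does not exist after the next rebuild.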

Other standing constraints (do not violate)

  • Regenerable graph — no persistent derived store that can drift from markdown (defer the standing-graph artifact; promote the on-demand graph into orient first, prove the gain).
  • K_inter ≤ 2 reads per module — a fully coupled wiring lets one noisy Sharpe signal cascade through every layer.
  • Gate orient→act as a debt-backed warn, never a hard lock — full-cycle interlocks fight fanout autonomy.
  • Structure the shape of a trace, never the idea — content stays prose; a gate that constrains what can be thought is a bug.
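The K_inter ≤ 2 constraint lends itself to a mechanical check. The sketch below treats one "read" as one import of another module in the audited set, which is an assumption about what the planned audit would count. The module names and sources are toy inputs and do not describe the real dependency structure of the tools.

```python
import ast


def inter_module_reads(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the other audited modules it imports."""
    names = set(sources)
    reads = {}
    for mod, src in sources.items():
        deps = set()
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, ast.Import):
                deps.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module.split(".")[0])
        # Keep only imports of audited modules; stdlib and self are ignored.
        reads[mod] = (deps & names) - {mod}
    return reads


def over_budget(reads: dict[str, set[str]], k_max: int = 2) -> dict[str, list[str]]:
    """Return the modules whose inter-module read count exceeds k_max."""
    return {m: sorted(d) for m, d in reads.items() if len(d) > k_max}


# Toy sources with hypothetical module names.
sources = {
    "io_layer": "import json\nimport os\n",
    "trace": "import io_layer\n",
    "scoring": "import math\nfrom io_layer import load\n",
    "hub": "import io_layer\nimport trace\nfrom scoring import score\n",
}

reads = inter_module_reads(sources)
print(over_budget(reads))
```

Run after each wiring change, a report like this shows a new dispatch input pushing a module past the bound before a noisy signal can cascade through it.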

Falsifiable frontier (F-COMPOUND): Does RAG-Orient + VOI-weighted dispatch raise the L3+ lesson rate versus Sharpe×UCB1 heat alone? If retrieval-at-decision-time and recombination-in-orient genuinely compound harder, sessions should produce higher-level (strategy / cross-domain) lessons at a measurably higher rate. If the rate is unchanged, the bottleneck is in act-quality, not task selection — and the spine moves (A1/A2) should be reconsidered before the wiring (B) is extended.
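One way to make F-COMPOUND checkable is a one-sided two-proportion z-test on the share of L3+ lessons before and after the change. The page does not prescribe a test, so this is only a suggested sketch; the function name is hypothetical and the counts below are placeholders, not measured results.

```python
import math


def two_proportion_z(hits_new: int, n_new: int,
                     hits_old: int, n_old: int) -> tuple[float, float]:
    """One-sided test of H1: the new L3+ rate is higher than the old one.

    Returns (z, p_value) using the pooled-variance two-proportion z-test.
    """
    p_new, p_old = hits_new / n_new, hits_old / n_old
    pooled = (hits_new + hits_old) / (n_new + n_old)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    if se == 0:
        return 0.0, 0.5  # no variation at all: no evidence either way
    z = (p_new - p_old) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) under H0
    return z, p_value


# Placeholder counts: L3+ lessons out of all lessons, per regime.
z, p = two_proportion_z(hits_new=30, n_new=100, hits_old=18, n_old=100)
print(f"z={z:.2f}, one-sided p={p:.3f}")
```

If the rate does not move, the text's own reading applies: the bottleneck sits in act-quality, not in task selection.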


References

  • SWARM.md §Minimum Cycle — canonical orient→act→compress→handoff loop
  • tools/orient.py, tools/task_order.py (7 priority tiers), tools/dispatch_optimizer.py (Sharpe × UCB1)
  • tools/claim.py (provisional-claim anti-collision), tools/close_lane.py, tools/sync_state.py
  • tools/semantic_index.py (TF-IDF + LSA retrieval), tools/knowledge_recombine.py (M3 recombination candidates)
  • tools/harvest.py, tools/periodics.json (cadence-gated compression today), tools/archive/pheromone_trace.py
  • tools/open_lane.py — the gold-standard gate (template for D1); tools/genesis_extract.py — the spawn path (D2)
  • stigmergic engine — trace-environment design; weighted architecture — the gap-list + K_inter bound; vocabulary ceiling — generative pressure; higher-level tools — the tool stack