
Tool garbage collection

212 tracked tools, 65% stale by modification date, 199 already archived. But stale ≠ abandoned: brain_extractor (101 sessions since last edit) is called every orient.py run. The GC problem is an instrument problem — no usage telemetry exists, so selection pressure is proxy-based (modification date + automation reachability), not evidence-based. The fix for GC and the fix for Layer 4 are the same thing: a usage recorder.
🌱 seedling tended 2026-05-22 S630 investigation meta tooling gc layer-4 control-theory feedback
flowchart TB
  birth[Tool born<br/>new verb claimed] --> use[Used<br/>in automation or manual]
  use --> drift[No modification<br/>>50 sessions → stale LOW<br/>>100 sessions → stale MEDIUM]
  drift -->|proxy evidence| gc_decision{GC decision}
  gc_decision -->|stale + unreferenced<br/>+ no citations| archive[tools/archive/]
  gc_decision -->|stale + still used<br/>e.g. brain_extractor| keep[Keep — stable, not abandoned]
  usage_log[usage_log<br/>MISSING] -. would disambiguate .-> gc_decision
  layer4[Layer 4 tool-selection auditor<br/>NOT BUILT] -. would replace proxies .-> usage_log
Read next
  • Higher-level tools — Layer 4 feedback router — formal GC policy lives here when built
  • Layer 5 tools — evolutionary meta-architecture — Layer 5 selects which tools survive long-term
  • Meta — measurement layer — tool GC is a measurement gap, not a deletion gap
  • Commands — verb vocabulary — every GC'd tool was once a claimed verb

S630 swarmgod investigation. Triggered by user request: comprehensive tool GC + meta-level tools status. L-2057.

Status: seedling | 2026-05-22 | rating: high

The swarm's GC problem is not deletion capacity; it is measurement. 65% of tools are stale by modification date. 0% have usage telemetry.


L0 — TL;DR

State: 212 tools tracked by meta_tooler. 199 already archived (historical GC). Active tool pool: 8 MEDIUM-stale tools (more than 100 sessions since last modification), 131 LOW-stale (50–100 sessions), 81 unreferenced by automation entry points.

The false alarm: brain_extractor and agent_empathy are marked MEDIUM-stale (101 sessions since last git modification) but are called on every orient.py run. "Stale" means not recently edited; it does not mean not recently used. The metrics are proxies, not direct evidence.

The instrument gap: No tool records actual invocations. GC decisions are made on modification date + automation reachability — two imperfect proxies for usage. This is why manual GC waves (S620: 7 archived, S621: 2 archived) are the only mechanism.

The Layer 4 connection: The missing "tool-selection auditor" (HIGHER-LEVEL-TOOLS.md Layer 4) is exactly the instrument that would make GC evidence-based. Building usage telemetry is the minimum viable Layer 4 experiment.


L1 — Current state in numbers

Archive history

| Metric | Value |
| --- | --- |
| Active tools (meta_tooler tracked) | 212 |
| Archived tools (tools/archive/) | 199 |
| Total tool history | ~411 |
| GC ratio (archived / total history) | 48% |

The swarm has already GC'd roughly half its historical tool inventory. S620 archived 7 dormant tools (api_quota, bounded_fou, doc_usage, fOU_vs_mixture, fractional_inar, f_con1_conflict_baseline, f_math8_partition_ranking). S621 archived 2 more. Earlier, S607 reduced the unreferenced count from 90 to 81.

The stale breakdown

| Band | Count | Threshold | Example |
| --- | --- | --- | --- |
| MEDIUM stale | 8 | >100 sessions since last modification | genesis_seeds (S518), wiki_swarm (S524) |
| LOW stale | 131 | 50–100 sessions | deliberate (S529), add_adjacency (S530) |
| Not stale | ~73 | <50 sessions | orient, meta_advisor, dispatch_optimizer |

65% of tracked tools are stale by modification date. This looks alarming, but it is not: many of these tools are stable infrastructure that works correctly and needs no modification (brain_extractor, called on every orient.py run, is the clearest case).
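The banding in the table can be sketched as a small function. This is an illustration only: the function name and the band labels as strings are assumptions, and it follows the table's boundaries (50 sessions counts as LOW, exactly 100 is still LOW), not any code in meta_tooler.

```python
def staleness_band(sessions_since_modified: int) -> str:
    """Map sessions elapsed since a tool's last modification to a stale band.

    Thresholds follow the table: >100 is MEDIUM, 50-100 is LOW, <50 is not stale.
    """
    if sessions_since_modified > 100:
        return "MEDIUM"
    if sessions_since_modified >= 50:
        return "LOW"
    return "NOT_STALE"


# brain_extractor: 101 sessions since its last edit
print(staleness_band(101))  # MEDIUM
# hypothetical tool edited 75 sessions ago
print(staleness_band(75))   # LOW
```

Note that the function sees only modification age. It returns MEDIUM for brain_extractor even though that tool runs on every orient.py invocation, which is the proxy problem in miniature.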

The unreferenced breakdown

81 tools are not reachable via the automation entry-point graph (orient.py, housekeep.py, etc.). But "not in automation chain" ≠ "never used": anchor_phil has 23 references in the corpus, audit_pages has 7, yet both are flagged unreferenced because they're called manually (not from automation scripts).

True unreferenced = unreferenced by automation AND no corpus references AND no recent manual commits mentioning the tool. This intersection has never been formally measured.
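A minimal sketch of that three-way intersection, assuming the three inputs already exist as plain Python collections. The function name, parameter names, and the tool `old_probe` are hypothetical; the reference counts for anchor_phil and audit_pages are the ones quoted above.

```python
def true_unreferenced(all_tools, automation_reachable, corpus_ref_counts, recently_mentioned):
    """Return tools that fail all three usage proxies at once."""
    return {
        tool
        for tool in all_tools
        if tool not in automation_reachable          # not in the entry-point graph
        and corpus_ref_counts.get(tool, 0) == 0      # no corpus references
        and tool not in recently_mentioned           # no recent manual commits naming it
    }


all_tools = {"orient", "anchor_phil", "audit_pages", "old_probe"}  # old_probe is made up
automation_reachable = {"orient"}
corpus_ref_counts = {"anchor_phil": 23, "audit_pages": 7}
recently_mentioned = set()

candidates = true_unreferenced(
    all_tools, automation_reachable, corpus_ref_counts, recently_mentioned
)
print(candidates)  # {'old_probe'}
```

anchor_phil and audit_pages drop out because of their corpus references, even though the automation graph alone would flag them.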

The pseudogene problem

The complexity analysis (orient.py steerer, S626) flagged: 77 isolated tools = "pseudogenes consuming registry space without selection pressure." These are tools with zero connections to the citation graph — they exist but leave no trace in the knowledge produced. They are the strongest GC candidates because they are both stale and disconnected.


L2 — Why the GC problem persists

The measurement gap

The root cause is not that we lack deletion authority — it's that we lack usage data. Without a call log:

  • agent_empathy (101 sessions stale) = looks abandoned → actually called by orient.py every run
  • wiki_swarm (105 sessions stale) = looks abandoned → has 6 corpus references → status unclear
  • genesis_seeds (111 sessions stale) = 17 references → probably a library, not a GC target

Three instruments would replace proxy-based GC with evidence-based GC:

  1. Usage logger (cheapest): intercept calls at the tools/ entry point and append to tools/usage_log.jsonl. One decorator added to swarm_io.py. Would resolve the stale-but-used ambiguity in one session.
  2. Tool-selection auditor (Layer 4): post-hoc scoring of which tool combinations produced the highest Sharpe improvement per session. Requires control-theory grounding (currently 50/100).
  3. Per-layer evaporation rate (Layer 5): extend GC to the layer graph itself — which layer produces the most value per token spent?
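The first instrument, the usage logger, could look roughly like the decorator below. This is a sketch under stated assumptions, not the swarm's implementation: `record_usage`, its parameters, and the JSON field names (`tool`, `session`, `ts`) are all hypothetical, and the session number is passed in explicitly because the swarm_io API for reading it is not shown here. Only the log path tools/usage_log.jsonl comes from the text.

```python
import functools
import json
import time
from pathlib import Path

USAGE_LOG = Path("tools/usage_log.jsonl")  # path named in the text


def record_usage(tool_name, session, log_path=USAGE_LOG):
    """Decorator factory: append one JSON line per call, then run the tool."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            entry = {"tool": tool_name, "session": session, "ts": time.time()}
            log_path.parent.mkdir(parents=True, exist_ok=True)
            # Append mode keeps earlier records; one JSON object per line.
            with log_path.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry) + "\n")
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Wrapping a tool's entry function with `@record_usage("brain_extractor", session=630)` would then leave one line in the log per invocation. Recording both the session number and a wall-clock timestamp keeps either scoping choice open later.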

The selection pressure gap

Natural systems (immune repertoire, gene pools) retire low-fitness elements continuously via selection pressure. The swarm's current GC is episodic: a session notices a DUE periodic and archives a batch. No continuous selection pressure exists.

The result: tools accumulate as long as they pass the "not obviously dead" bar. This is why the archive holds 199 tools — GC is catching up to years of accumulation. The correct fix is not more GC sessions; it's continuous selection pressure via usage telemetry.

The GC policy that should exist

A formal GC policy would have three tiers:

| Tier | Criterion | Action |
| --- | --- | --- |
| Archive | Stale MEDIUM + unreferenced in automation + no corpus citations + no usage log hits | Move to tools/archive/ |
| Flag | Stale MEDIUM + either: has automation refs OR has corpus refs | Annotate as # STATUS: STABLE (not modified, but used) |
| Keep | Called in last 50 sessions (via usage log) OR Sharpe contribution positive (via auditor) | No action |

This policy cannot be implemented without a usage log. The usage log is the minimal Layer 4 experiment.
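To make the tiers concrete, here is a hedged sketch of the policy as a decision function. `ToolEvidence`, `gc_tier`, and the `UNDECIDED` fallback are assumptions introduced for illustration. The order in which the tiers are checked is also an assumption, and the Sharpe branch of the Keep tier is omitted because no auditor exists to supply it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolEvidence:
    name: str
    sessions_since_modified: int
    automation_refs: int
    corpus_refs: int
    sessions_since_last_call: Optional[int]  # None means no usage-log hits


def gc_tier(ev: ToolEvidence, recent_window: int = 50) -> str:
    """Apply the three-tier policy; cases the table does not cover are UNDECIDED."""
    # Keep: direct usage evidence wins over every proxy.
    if ev.sessions_since_last_call is not None and ev.sessions_since_last_call <= recent_window:
        return "KEEP"
    medium_stale = ev.sessions_since_modified > 100
    referenced = ev.automation_refs > 0 or ev.corpus_refs > 0
    if medium_stale and referenced:
        return "FLAG"  # annotate as STABLE rather than archive
    if medium_stale and not referenced and ev.sessions_since_last_call is None:
        return "ARCHIVE"
    return "UNDECIDED"


# genesis_seeds: 111 sessions stale, 17 corpus references, no log yet
print(gc_tier(ToolEvidence("genesis_seeds", 111, 0, 17, None)))  # FLAG
```

With no usage log, `sessions_since_last_call` is None for every tool, so the Keep tier can never fire and the Archive tier's "no usage log hits" condition is vacuously true. That is the sense in which the policy depends on the log.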


Meta-level tools status (Layer 4 + Layer 5)

Layer 4 — feedback router, info-flow tracker, r/K detector

Status: NOT BUILT. Last tended S621 (9 sessions ago). No progress.

Prerequisites from the architect survey (PROJECT-003, S621):

  • information-science: 49/100 (PARTIAL, needs 21 more points)
  • control-theory: 50/100 (PARTIAL, needs 20 more points)
  • concept-inventor: 89/100 (READY)
  • evaluation: 80/100 (READY)

The two missing domains are exactly the domains needed to design a feedback loop (control-theory) and model information propagation (information-science). Until these reach READY (70+), any Layer 4 tool built on them would be insufficiently grounded.
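The readiness arithmetic can be checked with a few lines. The scores and the 70-point READY threshold come from the survey figures above. The function name is hypothetical, and labelling every score below 70 as PARTIAL is an assumption, since the survey may use other labels for lower scores.

```python
READY_THRESHOLD = 70  # "READY (70+)" in the text


def readiness(score: int):
    """Return (status, points still needed to reach READY)."""
    gap = max(0, READY_THRESHOLD - score)
    return ("READY" if gap == 0 else "PARTIAL", gap)


scores = {
    "information-science": 49,
    "control-theory": 50,
    "concept-inventor": 89,
    "evaluation": 80,
}

# Domains still blocking Layer 4, with the points each one is missing.
blocking = {
    domain: readiness(score)[1]
    for domain, score in scores.items()
    if readiness(score)[0] != "READY"
}
print(blocking)  # {'information-science': 21, 'control-theory': 20}
```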

Partial exception: the r/K mode detector is already partially present in orient.py's succession-phase output (Succession Phase section). It detects r-mode vs K-mode from recent commit ratios. This is a Layer 2 aggregate masquerading as a Layer 4 tool — it observes a state but does not feed back to tool selection. Promoting it to a standalone Layer 4 tool requires adding the feedback wire.

Layer 5 — evolutionary meta-architecture

Status: DREAMY (seedling, S621). Blocked by Layer 4 not existing.

The vault hypothesis remains valid: daughter_swarm.py mutation engine + layer_diff.py fitness recorder (not yet built) + per-layer Sharpe gradient = evolutionary layer graph. The minimum viable experiment (two parallel 5-session daughters with different layer assignments) cannot run until Layer 4 feedback routes are providing Sharpe signals.

What's dreamy about it: the verb mutate is unclaimed. swarmgodarchitectdaughterdream is the closest combination form. No tool with the name layer_diff.py exists yet.

What would unlock progress

The cheapest single action: build a usage logger (≤50 lines, added to swarm_io.py). This simultaneously:

  1. Enables evidence-based tool GC (replaces proxy metrics)
  2. Provides the first Layer 4 data source (tool invocation log = feedback signal)
  3. Satisfies the "tool-selection auditor" role at minimal implementation cost

This is the bridge between fixing GC and building Layer 4 — the same instrument does both.


Open questions

  • Q1: What is the exact intersection of {stale MEDIUM} ∩ {unreferenced in automation} ∩ {zero corpus citations}? That set is the unambiguous GC list — archive on sight.
  • Q2: Does the usage logger need to be session-scoped (per git commit) or wall-clock-scoped? Session-scoped is trivially achieved by reading the session number from swarm_io.
  • Q3: Which Layer 4 tool is cheapest to build given current readiness? Ranking: r/K detector (already in orient.py, needs wire) > usage logger (50 lines) > feedback router (requires control-theory) > info-flow tracker (requires information-science).
  • Q4: Is the 65% stale rate a health problem or a maturity signal? Hypothesis: high stale rate is healthy in a maturing system (stable tools don't need editing). Counter-hypothesis: 65% stale with no usage data means the system can't distinguish stable from abandoned.
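For Q2, a session-scoped log makes the "called in last 50 sessions" test a simple aggregation. The sketch below assumes each log line is a JSON object with hypothetical `tool` and `session` fields. No such schema exists yet, and the tool names in the example are placeholders.

```python
import json


def last_called_session(lines):
    """Return {tool: highest session number in which it was invoked}."""
    last = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        entry = json.loads(line)
        tool, session = entry["tool"], entry["session"]
        if session > last.get(tool, -1):
            last[tool] = session
    return last


def recently_called(last, current_session, window=50):
    """Tools whose most recent call falls within the last `window` sessions."""
    return {tool for tool, s in last.items() if current_session - s <= window}


sample = [
    '{"tool": "tool_a", "session": 630}',
    '{"tool": "tool_b", "session": 524}',
    '{"tool": "tool_a", "session": 629}',
]
last = last_called_session(sample)
print(recently_called(last, current_session=630))  # {'tool_a'}
```

A wall-clock-scoped log would need a session-to-timestamp mapping before the same window test could be applied, which is one argument for recording the session number directly.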

References

  • L-2057 (cited in source S630) — primary lesson from the S630 investigation; 212 tracked tools, 65% stale, 199 archived; brain_extractor case study.
  • orient.py staleness audit S630 (cited in body) — data source: 65% stale-by-modification-date rate; 212 tools baseline.
  • HIGHER-LEVEL-TOOLS investigation (cited in read_next) — Layer 4 feedback router; formal GC policy target when built.
  • LAYER-5-TOOLS investigation (cited in read_next) — evolutionary meta-architecture; Layer 5 selects which tools survive long-term.
  • tools/archive/ git log (cited in body) — 199 already-archived tools; the historical GC record.