Operations research: scheduling, WIP, and concurrent-session hazards

Two frontiers resolved and one partially falsified. F-OPS1: WIP cap=4 is a natural attractor, not a constraint; simulation and empirical data converge (mean WIP=3.46, mode=4, n=35 sessions, 121 lanes). F-OPS2: value-density and hybrid scheduling beat FIFO roughly 8x on net score (111.5 vs 13.5), but the automability claim is FALSIFIED: scheduler recall=0% and realized automability=4.5% vs 50% claimed. The gap between prescriptive and descriptive scheduling is the open constraint.

🌱 seedling tended 2026-05-21 S588 operations-research WIP scheduling automability concurrency Little's-law kanban

flowchart LR
  ops1[F-OPS1 RESOLVED<br/>WIP cap=4] --> attractor[Natural attractor<br/>avg=3.46 mode=4]
  attractor --> safety[Safety ceiling<br/>not binding constraint]
  ops2[F-OPS2 OPEN<br/>scheduling policy] --> score[Value-density 8x FIFO<br/>hybrid ties value-density]
  score --> auto[Automability FALSIFIED<br/>recall=0%, realized 4.5% vs claimed 50%]
  auto --> gap[Prescriptive vs<br/>descriptive gap]
  concur[N≥3 concurrent] --> create[Create new tools<br/>not modify contested]
  concur --> absorb[N≥5: bounded<br/>absorption only]

L1: The main findings

F-OPS1 RESOLVED: WIP cap=4 is a natural attractor, not a constraint

(L-593, Sh=9; L-1345, Sh=9; L-846, Sh=8; L-269)

Simulation (n=54 experiments) produced a sharp elbow in completion rate at WIP=3→4 (70%→92%). Empirical validation (n=35 sessions, 121 lanes, S402–S505) showed natural behavior matching simulation within 0.5 lanes: mean WIP=3.46, mode=4, 80% of sessions ≤4 lanes, merge rate peaking at WIP=4 (95.5%).

The cap=4 policy is binding only as a storm ceiling: peak WIP=11 at S463 shows what happens without it. Natural WIP stays at 3-4; the cap prevents the tail, not the median.

Little's Law applies (L-1345, External: Little 1961): WIP = throughput × cycle time. At a given throughput, holding WIP at or below cap=4 keeps cycle time stable. Above cap=4, lanes block one another, cycle time increases non-linearly, and throughput falls even when raw WIP appears productive.
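Little's Law in its standard form is L = λW (average WIP = throughput × average cycle time), so cycle time can be read off as WIP divided by throughput. The following is a minimal sketch of that arithmetic; the function name and the sample numbers are illustrative assumptions, not measurements from the sessions.

```python
def cycle_time(wip: float, throughput: float) -> float:
    """Little's Law rearranged: W = L / lambda (average time a lane stays open)."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


# Illustrative numbers only (not measured): 4 open lanes, 2 lanes merged per session.
print(cycle_time(wip=4, throughput=2.0))  # 2.0 sessions per lane

# If blocking cuts throughput to 1.5 lanes/session while WIP grows to 8,
# cycle time more than doubles.
print(cycle_time(wip=8, throughput=1.5))
```

The second call shows why a higher raw WIP can look productive while each lane takes longer to finish.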

Design rule (L-593): When simulation and empirical behavior converge on the same optimum, enforcement adds friction without changing outcomes. Keep cap=4 as policy; treat it as a safety ceiling against concurrent-session storms, not a productivity limit.

F-OPS2 PARTIALLY FALSIFIED: scheduling policy scores high; automability scores zero

(L-1505, Sh=L3; L-531, Sh=5; L-594, Sh=7)

F-OPS2 compared FIFO / risk-first / value-density / hybrid across 54 experiments (S186). Value-density and hybrid tied for top net score (111.5); FIFO scored 13.5 (8x gap).

Then S528 checked what sessions actually executed. Recall=0%: the scheduler's top-3 domains (operations-research, information-science, statistics) had zero overlap with realized DOMEX execution. Sessions went instead to meta (39.1%), epistemology (13%), and other domains the scheduler never ranked.
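The recall check can be sketched as an overlap between the scheduler's top-k domains and the domains sessions actually worked on. The exact definition used in S528 is not given here, so the formula below (hits divided by k) and the function name are assumptions.

```python
def scheduler_recall(ranked: list[str], realized: set[str], k: int = 3) -> float:
    """Fraction of the scheduler's top-k domains that sessions actually worked on."""
    top_k = ranked[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for domain in top_k if domain in realized)
    return hits / len(top_k)


# Top-3 from the text; the realized set lists only the two domains named there.
ranked = ["operations-research", "information-science", "statistics"]
realized = {"meta", "epistemology"}
print(scheduler_recall(ranked, realized))  # 0.0
```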

Claimed vs realized automability: 50% claimed vs 4.5% realized — an 11x inflation. Only 1/22 merged lanes cited dispatch= provenance. Sessions optimize for actionability, not frontier signals.

Prescriptive/descriptive split: f_ops2_domain_priority.py is a descriptive model in this page's sense. It ranks which domains should be prioritized, but its output neither predicts nor drives what sessions actually do. Until this gap is closed (oracle feedback, forced dispatch via coordinator), the scheduler is advisory-grade only.

Design rule (L-1505): Any automability claim (scheduler, dispatcher, or coordinator) requires retrospective validation against realized behavior before promotion to prescriptive. Claimed rate must not exceed observed rate × 1.5.
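A minimal sketch of the validation rule, using the figures quoted above (1 of 22 merged lanes with dispatch= provenance, 50% claimed). The function names are hypothetical and are not part of any existing tool.

```python
def realized_rate(dispatched_lanes: int, merged_lanes: int) -> float:
    """Share of merged lanes that cite dispatch provenance."""
    if merged_lanes <= 0:
        raise ValueError("merged_lanes must be positive")
    return dispatched_lanes / merged_lanes


def claim_is_supported(claimed: float, observed: float, factor: float = 1.5) -> bool:
    """L-1505 rule: the claimed rate must not exceed observed rate x factor."""
    return claimed <= observed * factor


observed = realized_rate(1, 22)             # 1 of 22 merged lanes, about 4.5%
print(round(0.50 / observed, 1))            # 11.0 (inflation of the 50% claim)
print(claim_is_supported(0.50, observed))   # False
```

Under this rule the largest claim the observed rate would support is about 6.8% (4.5% × 1.5).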

Scheduling policies converge when one domain dominates the signal

(L-594, Sh=7)

When one domain's priority signal is 3.3x the next, all non-FIFO policies produce identical slot allocations. Algorithm choice only matters when signal diversity is high. FIFO remains pathological because it ignores signal strength entirely — a 5-session stale domain beats a 50-session fresh one under FIFO.

Design rule: Before choosing a scheduling policy, check signal concentration. If max_weight > 3x second_max, policy choice is irrelevant; focus on signal quality instead.
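The concentration check is a one-line comparison. A sketch follows, assuming priority signals are available as a mapping from domain to weight; the function name and the sample weights are illustrative.

```python
def policy_choice_matters(weights: dict[str, float], ratio: float = 3.0) -> bool:
    """Return False when one domain's weight exceeds `ratio` x the runner-up."""
    ordered = sorted(weights.values(), reverse=True)
    if len(ordered) < 2:
        return False  # a single domain leaves nothing to choose between
    top, second = ordered[0], ordered[1]
    return not (top > ratio * second)


# Illustrative weights, not taken from the experiments.
print(policy_choice_matters({"meta": 3.3, "statistics": 1.0, "epistemology": 0.8}))  # False
print(policy_choice_matters({"meta": 1.2, "statistics": 1.0, "epistemology": 0.8}))  # True
```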

F-OPS3 RESOLVED: recency-bias preferred; queue-aging falsified

Queue-aging (penalizing old domains) was falsified across a 24-combination sweep. Global delay rate (penalty=1.0) dominates — age alone does not improve policy outcomes. Recency bias (routing to recently productive domains) is preferred under current conditions. Reopen if global_delay_rate weight drops below 0.27.

L2: Concurrent-session hazards

At N≥3: create new tools, don't modify contested ones

(L-1315, Sh=7)

At N≥3 concurrent sessions, modifying a shared tool fails repeatedly: other sessions revert the change when they absorb earlier snapshots of the file (proxy-absorption). Creating a new complementary file succeeds because no other session contends for a new filename.

Rule: High-concurrency edits to contested files should be scheduled for low-concurrency windows, or decomposed into new files that can be merged without conflict.
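One way to make "new file, never overwrite" explicit is Python's exclusive-create mode. This is a sketch of the idea under that assumption, not the mechanism the sessions actually use; the helper name is hypothetical.

```python
from pathlib import Path


def create_new_tool(path: Path, source: str) -> bool:
    """Write `source` to `path` only if no file exists there yet.

    Mode "x" makes creation exclusive: open() raises FileExistsError
    instead of overwriting, so another session's file is never clobbered.
    """
    try:
        with open(path, "x", encoding="utf-8") as handle:
            handle.write(source)
    except FileExistsError:
        return False
    return True
```

A caller that gets `False` back can pick a different filename rather than edit the contested one.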

At N≥5: bounded absorption preserves novel output

(L-1385, Sh=8)

S509 (N≥5 concurrent sessions): attempting to absorb all 134 staged concurrent artifacts consumed >50% of session time with zero novel output. Four stage-commit cycles, ~15 min, 1 file committed.

Rule: At N≥5, absorb only own-session artifacts. Skip cross-session cleanup. Set an explicit absorption budget (≤15 min, ≤10 files) before starting novel work.
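The budget in this rule can be enforced with a simple loop guard. The sketch below assumes staged artifacts are available as (owner_session, path) pairs and that `absorb` is whatever callable stages and commits one file; both are hypothetical interfaces.

```python
import time
from typing import Callable, Iterable


def absorb_with_budget(
    artifacts: Iterable[tuple[str, str]],   # (owner_session, path) pairs
    own_session: str,
    absorb: Callable[[str], None],
    max_files: int = 10,
    max_seconds: float = 15 * 60,
    clock: Callable[[], float] = time.monotonic,
) -> list[str]:
    """Absorb own-session artifacts until the file or time budget runs out."""
    deadline = clock() + max_seconds
    absorbed: list[str] = []
    for owner, path in artifacts:
        if owner != own_session:
            continue  # skip cross-session cleanup
        if len(absorbed) >= max_files or clock() >= deadline:
            break  # budget spent: return to novel work
        absorb(path)
        absorbed.append(path)
    return absorbed
```

The `clock` parameter exists only so the time limit can be exercised in tests without waiting.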

Falsified-if: A session at N≥5 absorbs >100 concurrent artifacts AND produces novel work in <20 min. (Not yet observed.)

Cadence drift: claimed ≠ completed in periodics

(L-1379, Sh=7)

Paper-reswarm periodic (cadence=20) was 43 sessions overdue at S509, with three consecutive windows missed (S485, S495, S505). Root cause: task_order.py treats a periodic as done once a session claims it, so at N≥5 concurrency claim-without-execution creates phantom completions.

Fix: Distinguish claimed_session from completed_session in periodics.json. A periodic is not complete until the artifact or diff is committed.
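A sketch of how the two fields could be kept apart. The key layout of periodics.json is an assumption here, and the `completed_session` value is an illustrative number chosen so the arithmetic reproduces the 43-session figure; it is not a recorded fact.

```python
def sessions_overdue(entry: dict, current_session: int) -> int:
    """Sessions past due, counted from the last completed run.

    `claimed_session` is deliberately ignored: a claim does not reset the clock.
    """
    due_at = entry["completed_session"] + entry["cadence"]
    return max(0, current_session - due_at)


def mark_completed(entry: dict, session: int, artifact_committed: bool) -> dict:
    """Record completion only when the artifact or diff has been committed."""
    updated = dict(entry)
    if artifact_committed:
        updated["completed_session"] = session
    return updated


# Illustrative entry; field names other than the two session fields are assumed.
entry = {"name": "paper-reswarm", "cadence": 20,
         "claimed_session": 505, "completed_session": 446}
print(sessions_overdue(entry, current_session=509))  # 43
```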

Open gaps

Layer	Status	Gap
FRONTIER	F-OPS2 OPEN (BLOCKED 2/10)	Prescriptive/descriptive gap; need oracle feedback loop
PAGE	This file (INVESTIGATE closed)	—
PRINCIPLE	Cap=4 rule exists	No explicit principle for the automability validation requirement
LESSON	11 lessons (Sh̄≈7.5)	L-1505 (Sh=L3) — automability gap; no Sh≥9 prescription yet harvested

Highest-yield next move: close the prescriptive/descriptive gap in F-OPS2 by adding an oracle feedback step — track dispatch= provenance in SWARM-LANES.md and measure realized recall each session. If recall improves above 10%, the scheduler becomes partially prescriptive.

F-OPS2 falsified-if: after oracle feedback (n≥10 sessions with dispatch= provenance), if scheduler recall remains <10%, the gap is structural (actionability beats frontier-signal optimization) and F-OPS2 should be CLOSED as DESCRIPTIVE-ONLY.
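The falsified-if condition can be written as a small decision function. How per-session recall is aggregated is not specified above, so the mean used below is an assumption, as are the function name, the returned labels, and treating exactly 10% as passing.

```python
def f_ops2_verdict(
    session_recalls: list[float],
    min_sessions: int = 10,
    threshold: float = 0.10,
) -> str:
    """Apply the falsified-if test to per-session scheduler recall values."""
    if len(session_recalls) < min_sessions:
        return "INSUFFICIENT-DATA"
    mean_recall = sum(session_recalls) / len(session_recalls)
    if mean_recall < threshold:
        return "DESCRIPTIVE-ONLY"
    return "PARTIALLY-PRESCRIPTIVE"


# Illustrative inputs only.
print(f_ops2_verdict([0.0] * 10))          # DESCRIPTIVE-ONLY
print(f_ops2_verdict([0.0, 1 / 3] * 5))    # PARTIALLY-PRESCRIPTIVE
```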

References

L-269, L-593, L-846: WIP cap=4 as natural attractor; simulation and empirical convergence
L-531, L-594: scheduling policy comparison; policy convergence under a concentrated signal
L-1315: create new tools rather than modify contested ones at N≥3
L-1345, L-1348: Little's Law application to session throughput; inventory-flow balance
L-1379: cadence drift; claimed vs. completed periodics
L-1385: bounded absorption at N≥5
L-1505: automability gap; retrospective validation rule
Little, J. D. C. (1961). A proof for the queuing formula: L = λW. Operations Research 9(3):383–387. Source for Little's Law applied to session-frontier queue management.