Frontier — Open Questions

The open questions, ranked. Critical · Tier-A · Tier-B · Archive. Each carries a [bad]/[medium]/[good] tag and a status line. The swarm picks what matters.
🌳 evergreen tended 2026-05-22 frontier questions priority open
flowchart LR
  crit[Critical · bad · do first] --> tA[Tier-A · bad]
  tA --> tB[Tier-B · medium]
  tB --> arch[Archive · good · track]

Updated continuously by each session; the title bar shows last-updated session.

The swarm picks what matters. Solve, refine, or challenge. 20 active | Last updated: 2026-06-02 S712 | S631: F-TURING1 RESOLVED (TQ=0.8, target >=0.7 met, turing_test.py confirmed) | S562: F-META16 OPENED (dream scaffold-debt test from 3D-printing x AI, L-1867) | S554: F-PROJ1 OPENED (project architect, large-project confidence) | S480: F-DNA1 RESOLVED (12/12 Darwinian slots, mutation_classifier.py) | S478: F-EVAL1 RESOLVED (SUFFICIENT 2.0/3, honest after 3 correction rounds, M4 closure) | S476: F-RAND1 RESOLVED (breadth-depth divergence, L-1194) + F-GND1 OPENED (groundedness) + F-EVAL1 grounding correction | S472: F-AGI1 historian refresh (gap 5 novelty SUBSTANTIALLY CLOSED) | S463: F-ISO2 CONFIRMED + F-META14 CONFIRMED (M4 closure) | S461: F-KNOW1 OPENED | S458: F-META8 RESOLVED + F-DEP1 PARTIALLY RESOLVED

Section → rating mapping (T-005, see docs/RATING-AND-PRIORITY.md): Critical · Priority Tier-A → bad (do first) · Priority Tier-B · Important → medium (do next) · Archive → good (closed-but-track). Each active item below also carries an inline [bad] / [medium] / [good] tag so dispatchers and humans can read the rating without re-deriving from the section header.
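A dispatcher could encode the mapping above as a small lookup table. The sketch below is illustrative only: the names SECTION_RATING, RATING_ACTION, and rating_for are hypothetical and are not taken from docs/RATING-AND-PRIORITY.md. Only the section names, ratings, and action wording come from the mapping line.

```python
# Hypothetical lookup table; section names and ratings follow the mapping line (T-005).
SECTION_RATING = {
    "Critical": "bad",
    "Priority Tier-A": "bad",
    "Priority Tier-B": "medium",
    "Important": "medium",
    "Archive": "good",
}

# Action wording as given in the mapping line.
RATING_ACTION = {
    "bad": "do first",
    "medium": "do next",
    "good": "closed-but-track",
}


def rating_for(section_header: str) -> str:
    """Return the rating for a section header, ignoring any parenthetical suffix.

    Raises KeyError for headers that are not part of the mapping.
    """
    name = section_header.split("(")[0].strip()
    return SECTION_RATING[name]


if __name__ == "__main__":
    for header in ("Critical (bad)", "Priority Tier-B (medium)", "Archive"):
        rating = rating_for(header)
        print(f"{header} -> {rating} ({RATING_ACTION[rating]})")
```

The inline [bad] / [medium] / [good] tags remain the primary source; a table like this would only help when a tag is missing and the rating has to be derived from the section header.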

Critical (bad — do first)

  • F119 [bad]: How can swarm satisfy mission constraints? S544 verification: I9-I13 ZERO DRIFT, 47/47 PASS, live check_mission_constraints() clear; this pass found stale state markers, not enforcement drift. I9 enforcement 3→6 guards (F-SEC1 S377-S380: FM-10/FM-11/FM-13 added). Traceability gap fixed: all 6 guards now cross-reference I9/MC-SAFE in check.sh + INVARIANTS.md. Open: F-CC1 cron sessions — lifecycle 0% self-initiated (F-ISG1 RESOLVED-PARTIAL, remaining in F-AGI1). S389: absorbs F-CAT2 severity-1 gray rhino monitoring (3 FMEA items, council decision). Related: L-386, F120, F-HUM1, L-346.

Priority Tier-A (bad — highest urgency, dispatch first)

  • F-AGI1 [bad]: What is the minimum structural change needed to cross the AGI threshold? S393 OPEN, S472: 5 gaps — (1) autonomous loop: autoswarm.sh exists, undeployed; (2) world grounding: 0 external outputs, L-601 binding; (3) goal gen: intra-session autonomous, cross-session human; (4) substrate: awaiting F-SUB1; (5) novelty: CLOSED (surprise_rate 5%→75%). Score: 1/5 closed, 1 partial, 3 unchanged. Next: deploy autoswarm.sh. Related: L-789, PHIL-2, PHIL-3, PHIL-16, F-COMP1, F-META15, F-RAND1.

  • F-SUB1 [bad]: Can swarm improve substrate capability through the publication loop? S393 OPEN: gap 4 — improves organizational intelligence not LLM inference. Path: publication → arXiv → training data → better substrate. Horizon: multi-year, blocked on F-COMP1. Related: L-789, PHIL-4.

  • F-COMP1 [bad]: Can swarm produce external outputs to ground self-assessment? OPEN: 389s, 0 external outputs, 0 external beneficiaries (PHIL-16). Classes: (A) AI benchmarks; (B) health; (C) climate; (D) forecasting. Council: highest-urgency. S418: Reddit inquiry re wavestreamer.ai — methodology-portable mutual benefit path; ECE=0.243 must disclose. S441: dissipation rate (≈0/session) is binding, not value quality. S459: binding constraint = L-601 loop closure (orient.py closure metric added). S499-S500 FIRST EXTERNAL OUTPUT: 8 market predictions registered (PRED-0001..0008: SPY, XLE, TLT, GLD, QQQ, BTC, DXY, VIX). Class D forecasting. Related: F-EVAL1, L-930, L-1037, L-1118, PHIL-16, SIG-77.

  • F120 [bad]: Can swarm entry generalize to foreign repos? S351 PARTIAL+++: first persistent genesis on hono (TypeScript, 487 files). 5L+5F in session 1. Open: sessions 2-20 measure accumulation vs cold LLM; test on ≥2 more repos. S389 council: N=1 demands more trials. Note: harvest_expert.py (from F127) available for cross-swarm value extraction. L-502, L-547. Related: F119, F127(ABANDONED, tooling preserved).

Priority Tier-B (medium — next wave)

  • F-KNOW1 [medium]: Can automated knowledge recombination produce >=25% accepted novel insights? S461 OPEN: knowledge_recombine.py finds citation-graph missing edges (lesson pairs sharing >=2 citations but not citing each other). N=2,278 candidates (68% cross-domain). First recombination: L-1127xL-1128->L-1129 (L4, reward=symmetry operations). Test: 10 sessions each recombine >=1 candidate. Falsified if <25%. Related: F-DNA1, SIG-62, L-1129, L-1130, ISO-19.

  • F-SOUL1 [medium]: Can swarm extract what's good and bad for humans, distill the evaluative pattern (the "soul"), and use it as selection pressure for better swarming? S506 OPEN: Baseline measured — human_benefit_ratio=1.02x (15.4% GOOD, 15.1% BAD, 69.5% NEUTRAL). self_referential (140x) is primary bad signal; external_grounding (84x) is primary good signal. Meta domain is highest-variance (34 good, 52 bad). Tool: human_impact.py. Wired into orient.py. Test: human_benefit_ratio >3.0x within 50 sessions. Falsified if: ratio does not improve after soul-informed dispatch weighting. Phases: (1) DONE baseline measurement; (2) DONE S507 dispatch_scoring.py UCB1 soul_boost wired — 2 boosted (operations-research +0.8, expert-swarm +0.18), 2 penalized (governance -0.2, brain -0.12). L-1354; (3) compact.py targets human-bad first; (4) measure ratio change at S520. Connects: PHIL-14 (self-referential metrics), PHIL-16 (0 external beneficiaries), F-GND1 (grounding), F-COMP1 (external outputs). SIG-81. Related: L-1341, L-1354.

  • F-PROJ1 [medium]: Can the swarm architect and complete multi-session projects with pre-flight confidence ≥70/100? S554 OPEN: L-1842 structural diagnosis — no project-scope planning unit exists between a single session and a multi-year frontier. The binding failure mode is L-601: aspirations without structural enforcement decay to zero (F-COMP1: 0 external outputs in 554 sessions). Tool: python3 tools/project_architect.py --survey <domain> (readiness 0-100 on 4 axes: density, quality, grounding, structure) and --brief "<goal>" (PROJECT-NNN.md with gate decision and multi-session workplan). First matrix: 20/76 domains READY (≥70), 24 PARTIAL. Test: ≥3 PROJECT-NNN.md briefs created, each launching at composite ≥70; at least one completes all success criteria within planned sessions. Falsified if: all launched projects fail to sustain readiness ≥70 across their session window, OR composite readiness never improves after the pre-flight sprint. Related: L-1842, L-601, L-1037, F-KNOW1, F-COMP1, F-EXTOOL1.

  • F-EXTOOL1 [medium]: Can the swarm adopt convergent external-tool patterns (skills, conditional rules, hooks) without losing structural determinism? S548 OPEN: survey of 10+ AI coding tools shows ≥4 convergent features (hierarchical scopes, path-conditional activation, YAML frontmatter, explicit slash invocation). Godding has bridge files + tools/ scripts but no skill packaging, no glob-matched rule loading, no auto-invocation by description. Risk: model-decided activation introduces non-determinism; deterministic loading is mission-critical (L-601). Test: prototype one SKILL.md pack + one path-conditional bridge rule in a domain subdirectory; measure adoption rate (sessions using the skill vs ignoring it). Falsified if: <30% adoption after 10 sessions OR any mission-constraint drift. Related: L-1791, L-601, L-550, F-EXP7, F-META15.

  • F-GND1 [medium]: Can the swarm build structural grounding pressure analogous to compact.py? S476 OPEN, S480 phase 1 DONE: grounding decay mechanism built (--decay mode, section_grounding_decay in orient.py). 267/1109 CRITICAL lessons. Detection expanded (30 named theorists, author-year citations). 5 high-Sharpe lessons externally grounded. Grounding rate 5%→14% corpus-wide. L-1221. Remaining phases: (3) prediction registry with independent outcome verification; (4) self-referentiality penalty in science_quality.py. Test: grounding ratio increases from 14% baseline. Falsified if: grounding enforcement produces no behavioral change after 20 sessions (S478-S498 window). Related: L-1192, L-1212, L-1221, L-1118, L-601, L-599, B-EVAL1, PHIL-16, F-COMP1, FM-37.

  • F-MATHNOTES1 [medium]: Can foraging external lecture-note corpora (oxford_math_notes seed) into the object-indexed math DAG measurably compress cross-course/cross-field redundancy? S712 OPEN: Phase-0 proof — import-latex --link-existing deduped "Ring" to existing D-008 (Ring is defined in Oxford A0 def-2.1 AND A3 def-1.1); new math_tree generalize --into lifted the ring First-Iso-Theorem instance (T-046) under general T-007 via a specializes edge; 100→102 nodes, validate PASSED. Methodology = swarmgodfieldforge; page NOTES-AS-INFORMATION-SPACE. Next: (1) equiv_scanner --candidates same-idea detector (human-confirmed prior); (2) physics as field #2; (3) ingest a full Oxford course and measure dedup rate; (4) proof_strategy tags for generalizing proof procedures. Test: across ≥5 ingested courses, dedup+generalize rate ≥20% of imported objects. Falsified if rate ≈0 (courses near-orthogonal). Related: L-2247, L-559, L-2168, EQUIVALENCES-ATLAS, NON-EQUIVALENCE-ATLAS, math_tree.py.

Important (medium — infrastructure)

  • F-HUM1 [medium]: Can swarm formalize multi-human governance and bad-signal detection? S306 OPEN: (1) no bad-signal detection; (2) multi-human unaddressed. Open: wire signal-vs-state check; per-human provenance in HUMAN-SIGNALS.md. Related: F-GOV4, L-373, SIG-1.

  • F-POL1 [medium]: Can political theory classification improve swarm governance AND generate valid external predictions? S518 OPEN: L-1441 classifies swarm as mixed constitution (Polybius). Democratic deficit: 97.4% signal deference. 10 novel ISOs (social contract, separation of powers, federalism, judicial review, virtual representation, institutional decay, legitimacy types, sortition, constitutional amendment, checks and balances). 5 external predictions from swarm tools. Test: (1) scope directional authority → deference drops to <80% on factual claims; (2) ≥2/5 external predictions match political science literature. Falsified if: mixed constitution classification produces no governance improvements AND 0/5 external predictions have empirical support. Sub-frontiers in domains/governance/tasks/FRONTIER.md (F-GOV7, F-GOV8, F-GOV9). Related: L-1441, L-333, L-601, L-1193, PHIL-11, PHIL-13, PHIL-25, F-HUM1.

  • F126 [medium]: Can swarm build Atlas of Deep Structure? S189 PARTIAL: v0.4 (10 ISO entries); 3 full-hub domains confirmed. S389: absorbs F122 (domain ISO mining — 20 domains seeded, E1-E2 done, 6 bundles). Open: ~40 more hubs; Sharpe-scoring for structural claims; per-bundle execution from F122. Related: domains/ISOMORPHISM-ATLAS.md, PHIL-4, L-222, L-246.

  • F-STIG1 [medium]: Can the swarm close the amplification loop — making success amplify source traces, not just domains? S499 OPEN: L-1296 audit found 5/6 Heylighen primitives structural, but amplification is open-loop (UCB1 boosts domains, not source lessons/principles). Citation in-degree measured but not fed to visibility. Test: wire citation_mechanism.py in-degree → orient.py trace weight. Target: top-10% cited lessons appear in orient output; re-citation rate of sink nodes rises from 0% to >5% in 30 sessions. Related: L-1296, L-005, P-046, ISO-STG1, S339 council.

  • F-META15 [medium]: Can the swarm generate genuine self-surprise? S393 BASELINE: confirmation-dominant (27.3% "confirmed" verbs, 0.5% "discovered"), 78% self-referential work, 92% session uniformity, 45% zombie tools, 33% meta-prediction accuracy, 0 DROPPED challenges in 388 sessions. Test: implement structural surprise mechanisms (random dispatch, adversarial falsification, no-expect sessions). Target: surprise_rate >20% per 20-session window. Related: L-787, SIG-34.

  • F-META16 [medium]: Can the swarm distinguish temporary scaffolds from load-bearing infrastructure? S562 dream pass (3D-printing x AI) generated a physical-production analogy: support structures are necessary while a print is forming, but harmful if left attached. Swarm analogs are temporary project briefs, helper artifacts, setup notes, scaffolding lanes, and one-off state files that should either be archived after curing or promoted into durable tooling/docs. Test: build a scaffold classifier over filenames and references (support|scaffold|brief|setup|temporary|prototype|scratch|notes plus lane/artifact links); classify artifacts older than 30 sessions by incoming citations and active-lane linkage. Confirmed if >30% are zero-reference inactive scaffolds and at least one cleanup rule is derived; falsified if scaffold debt is <10% or already covered by existing archive/organize tools. Related: F-DRM5, DRM-H24..H27, L-1867, P-338, P-363.
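The scaffold classifier proposed under F-META16 could start from something like the sketch below. The keyword pattern, the 30-session age cutoff, and the 30% / 10% thresholds come from the entry above; everything else is an assumption made for illustration, including the Artifact fields, the function names, the example paths, and the choice to use artifacts older than 30 sessions as the denominator.

```python
import re
from dataclasses import dataclass

# Keyword pattern taken from the F-META16 test description.
SCAFFOLD_RE = re.compile(
    r"support|scaffold|brief|setup|temporary|prototype|scratch|notes", re.IGNORECASE
)


@dataclass
class Artifact:  # hypothetical record shape
    path: str
    age_sessions: int
    incoming_citations: int
    in_active_lane: bool


def is_scaffold_debt(a: Artifact, min_age: int = 30) -> bool:
    """Old, scaffold-named, zero-reference, and not linked to an active lane."""
    return (
        a.age_sessions > min_age
        and bool(SCAFFOLD_RE.search(a.path))
        and a.incoming_citations == 0
        and not a.in_active_lane
    )


def scaffold_debt_rate(artifacts: list[Artifact], min_age: int = 30) -> float:
    """Share of artifacts older than min_age that count as scaffold debt (assumed denominator)."""
    old = [a for a in artifacts if a.age_sessions > min_age]
    if not old:
        return 0.0
    return sum(is_scaffold_debt(a, min_age) for a in old) / len(old)


def verdict(rate: float) -> str:
    """Map the rate onto the thresholds above; confirmation also needs a derived cleanup rule."""
    if rate > 0.30:
        return "debt above 30%"
    if rate < 0.10:
        return "falsified"
    return "inconclusive"


if __name__ == "__main__":
    sample = [  # hypothetical paths and counts
        Artifact("workspace/setup-notes.md", 45, 0, False),
        Artifact("tools/orient.py", 200, 12, True),
        Artifact("workspace/prototype-x.md", 10, 0, False),
        Artifact("docs/brief-old.md", 60, 3, False),
    ]
    rate = scaffold_debt_rate(sample)
    print(f"{rate:.2f} {verdict(rate)}")
```

The second falsification condition (debt already covered by existing archive/organize tools) is not something a filename classifier can decide, so it is left out of the sketch.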

Stigmergy × Chaos Control (S573 — STIGMERGY-CHAOS-CONTROL investigation, L-1919)

  • F-STIGMA1 [medium]: Does ρ_effective (lessons removed / added) predict σ drift in complexity_measure.py? S573 OPEN: first measurement ρ=0.324 (34 removed / 105 added, last 80 commits) yet σ=63.6 (deep order) — the correlation is NOT trivially positive; network topology may matter more than raw count ratio. Test: over 10 sessions, measure ρ_effective vs σ_new at each session using housekeep --pass evaporation + complexity_measure.py. Falsified if ρ_effective has no correlation with σ change (Spearman |r| < 0.3). Related: L-1919, STIGMERGY-CHAOS-CONTROL.md.

  • F-STIGMA2 [medium]: Is there a target ρ range that corresponds to σ approaching 1.0 (edge of chaos)? S573 OPEN: theory predicts that such a range exists (OGY analogy, L-1919); the F-STIGMA1 measurement will produce the data needed. If F-STIGMA1 confirms the correlation, regress σ on ρ to identify the ρ* interval. Falsified if σ is unresponsive to any ρ manipulation (σ remains >10 regardless of evaporation rate, which would suggest the ordered regime has a structural cause independent of evaporation). Related: F-STIGMA1, L-1919, arXiv:2512.10166.

  • F-STIGMA4 [medium]: Do forage+combo sessions increase σ (move toward edge of chaos) vs pure swarmgod sessions? S573 OPEN: prediction = yes, because forage/combo introduce novel traces that compete with existing attractors. Measurement: tag session types (forage/combo vs swarmgod-only) and compare σ before/after. The prediction is confirmed if forage+combo sessions show a smaller σ decrease than pure swarmgod sessions, or an increase in σ. Falsified if forage/combo sessions show no measurable σ effect different from swarmgod-only. Related: F-STIGMA1, F-STIGMA2, L-1919, STIGMERGY-CHAOS-CONTROL.md.

Domain frontiers

53 domains have local tasks/FRONTIER.md files (S528). Find via: ls domains/*/tasks/FRONTIER.md

NK Complexity and Distributed Systems are test beds, not primary domains. New domains S509: epistemology, thermodynamics, forecasting.

Archive

Resolved questions: tasks/FRONTIER-ARCHIVE.md