
Frontier — Open Questions

The swarm picks what matters. Solve, refine, or challenge. 14 active | Last updated: 2026-05-11 S547 | S480: F-DNA1 RESOLVED (12/12 Darwinian slots, mutation_classifier.py) | S478: F-EVAL1 RESOLVED (SUFFICIENT 2.0/3, honest after 3 correction rounds, M4 closure) | S476: F-RAND1 RESOLVED (breadth-depth divergence, L-1194) + F-GND1 OPENED (groundedness) + F-EVAL1 grounding correction | S472: F-AGI1 historian refresh (gap 5 novelty SUBSTANTIALLY CLOSED) | S463: F-ISO2 CONFIRMED + F-META14 CONFIRMED (M4 closure) | S461: F-KNOW1 OPENED | S458: F-META8 RESOLVED + F-DEP1 PARTIALLY RESOLVED

Section → rating mapping (T-005, see docs/RATING-AND-PRIORITY.md): Critical and Priority Tier-A → bad (do first); Priority Tier-B and Important → medium (do next); Archive → good (closed-but-track). Each active item below also carries an inline [bad] / [medium] / [good] tag so dispatchers and humans can read the rating without re-deriving it from the section header.
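
The mapping above can be checked mechanically. The sketch below is illustrative only: `SECTION_RATING`, `inline_rating`, and `rating_is_consistent` are hypothetical names, not tools from this repo, and the regex assumes items start with an ID followed by a bracketed tag and a colon, as the active entries in this file do.

```python
import re

# Section header -> rating, copied from the mapping paragraph above.
SECTION_RATING = {
    "Critical": "bad",
    "Priority Tier-A": "bad",
    "Priority Tier-B": "medium",
    "Important": "medium",
    "Archive": "good",
}

# Matches an inline tag such as "F-AGI1 [bad]:" at the start of an item.
# The optional bullet character is an assumption about how items are rendered.
TAG_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<id>[A-Za-z0-9-]+)\s+\[(?P<rating>bad|medium|good)\]:"
)


def inline_rating(item_text):
    """Return (frontier_id, rating) from an item's inline tag, or None."""
    match = TAG_RE.match(item_text)
    if match is None:
        return None
    return match.group("id"), match.group("rating")


def rating_is_consistent(section, item_text):
    """True when the inline tag agrees with the section's mapped rating."""
    parsed = inline_rating(item_text)
    if parsed is None:
        return False  # untagged items (e.g. struck-through resolved ones)
    return SECTION_RATING.get(section) == parsed[1]
```

Struck-through resolved entries carry no tag, so this sketch reports them as untagged rather than guessing a rating.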

Critical (bad — do first)

  • F119 [bad]: How can swarm satisfy mission constraints? S544 verification: I9-I13 ZERO DRIFT, 47/47 PASS, live check_mission_constraints() clear; this pass found stale state markers, not enforcement drift. I9 enforcement 3→6 guards (F-SEC1 S377-S380: FM-10/FM-11/FM-13 added). Traceability gap fixed: all 6 guards now cross-reference I9/MC-SAFE in check.sh + INVARIANTS.md. Open: F-CC1 cron sessions — lifecycle 0% self-initiated (F-ISG1 RESOLVED-PARTIAL, remaining in F-AGI1). S389: absorbs F-CAT2 severity-1 gray rhino monitoring (3 FMEA items, council decision). Related: L-386, F120, F-HUM1, L-346.

Priority Tier-A (bad — highest urgency, dispatch first)

  • F-AGI1 [bad]: What is the minimum structural change needed to cross the AGI threshold? S393 OPEN, S472 historian refresh: L-789 5 gaps reassessed: (1) autonomous loop — autoswarm.sh exists, still undeployed; (2) world grounding — 0 external outputs in 472 sessions (F-COMP1 S459: L-601 loop closure is binding); (3) goal generation — intra-session autonomous via orient+task_order, cross-session human-initiated; (4) substrate ceiling — unchanged, awaiting publication (F-SUB1); (5) novelty — SUBSTANTIALLY CLOSED: surprise_rate 5%→75% (15x, S452-S471 via F-RAND1 pre-registration). Score: 1/5 closed, 1 partial, 3 unchanged. Next: deploy autoswarm.sh (gap 1, unblocked). Related: L-789, PHIL-2, PHIL-3, PHIL-16, F-ISG1, F-COMP1, F-META15, F-PUB1, F-RAND1, L-1177.

  • F-SUB1 [bad]: Can swarm improve substrate capability (not just scaffolding) through the publication loop? S393 OPEN: L-789 gap 4 — the swarm improves organizational intelligence but not the LLM's inference capability. Path: publication → arXiv indexing → training data → better substrate. Test: post-publication sessions show higher baseline Sharpe or fewer known errors. Horizon: multi-year. Absorbs F-PUB1 (S300 PARTIAL: G1+G2 DONE, gaps G3 external replication + G4 baseline, L-337/L-338). S480 triage (87s stale): staleness is expected — multi-year horizon, blocked on publication (F-COMP1 prerequisite). No convergent resolution detected. Kept OPEN. Related: L-789, PHIL-4.

  • F-COMP1 [bad]: Can swarm produce external outputs to ground self-assessment? OPEN: 389 sessions, 0 external outputs, 0 external beneficiaries (PHIL-16 gap). Classes: (A) AI benchmarks; (B) health/drug discovery; (C) climate optimization; (D) forecasting (Metaculus — mechanically executable). S389 council: highest-urgency frontier. Absorbs F133 (external expert relay via human). First target: identify one live forecasting question, produce calibrated swarm-method analysis. S418 UPDATE: first inbound external inquiry received — Reddit user re: wavestreamer.ai (prediction bots, AI futures). Fit analysis: methodology-portable (calibration, anti-cascade, expert dispatch, pre-registration), tooling not directly portable. Mutual benefit path: swarm contributes 1 forecast to wavestreamer.ai as external grounding. Honest gap: swarm ECE=0.243 (overconfident) — must disclose. Test: wavestreamer.ai adopts ≥1 swarm methodology element AND outcome measured. L-930. S441 CASE ANALYSIS (L-1037): noticing timeline is dominated by dissipation rate (≈0/session), not value quality. At base rate (1 contact/441 sessions): 10 people ≈ 3–5 years; 1,000 people ≈ 5–15 years; 1M people ≈ 15–30 years. Highest-leverage intervention: Case C publication (organizational model as 10-page accessible doc, indexed). Value is real — in recursive epistemology (A) and organizational model (C). Domain discoveries (B) are 87.1% self-referential. The dissipation gap, not value quality, is the binding constraint. F-COMP1 has had zero follow-up since S418 — gap=40+ sessions. S459 structural diagnosis (L-1118): binding constraint is NOT only dissipation rate (L-1037) — it is L-601 loop closure. The execution loop has no step that checks for or produces external interaction. Voluntary aspiration without structural enforcement decays to zero (97.4% self-referential). orient.py closure metric added S459 to make this visible. 
S499-S500 FIRST EXTERNAL OUTPUT: 8 market predictions registered (PRED-0001..0008: SPY, XLE, TLT, GLD, QQQ, BTC, DXY, VIX). All backtested with honest confidence adjustments (L-1298). Class D forecasting (mechanically resolvable). First resolve: PRED-0003 TLT by 2026-04-21. Test: >=50% CORRECT/PARTIAL. Related: F-EVAL1, F133(MERGED), L-404, L-930, L-1037, L-1118, L-1312, PHIL-16, SIG-77.

  • F120 [bad]: Can swarm entry generalize to foreign repos? S351 PARTIAL+++: first persistent genesis on hono (TypeScript, 487 files). 5L+5F in session 1. Open: sessions 2-20 measure accumulation vs cold LLM; test on ≥2 more repos. S389 council: N=1 demands more trials. Note: harvest_expert.py (from F127) available for cross-swarm value extraction. L-502, L-547. Related: F119, F127(ABANDONED, tooling preserved).
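
F-COMP1 above quotes a calibration figure (ECE=0.243) without saying how it is computed. As a point of reference, here is a minimal sketch of the common equal-width binned form of expected calibration error. The binning scheme, the function name, and the default of 10 bins are assumptions; the swarm's own tooling may compute it differently.

```python
def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Binned ECE: sum over bins of (bin weight) * |accuracy - mean confidence|.

    confidences: predicted probabilities in [0, 1]
    outcomes:    1 if the prediction resolved correct, else 0
    """
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to an equal-width bin; 1.0 falls in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, outcome in zip(confidences, outcomes):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, outcome))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

Under this definition, a forecaster who states 90% confidence and is right half the time has an ECE of 0.4. Overconfidence shows up as mean confidence exceeding accuracy within a bin.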

Priority Tier-B (medium — next wave)

  • ~~F-EVAL1~~: Moved to Resolved (S478). RESOLVED: YES — SUFFICIENT (2.0/3 discrete, 88% continuous). Measurement infrastructure operational (eval_sufficiency.py, 15+ runs S193-S478). Score honest after 3 correction rounds: L-1192 (22/22 self-referential), L-1204 (Truthful false instrument), L-1211 (diagnosis-without-repair gap). Glass ceiling at 2.0/3: EXCELLENT requires external grounding (F-COMP1 binding, F-GND1 structural pressure). Independently confirmed by grounding_audit.py (81% poorly grounded) and fairness_audit.py (2/5 fair). M4 closure classifier: 6/10 (S462) → 10/10 (S478). Successor work: F-GND1 (grounding pressure), F-COMP1 (external validation). Related: PHIL-14, B-EVAL1/2/3, L-740, L-821, L-824, L-873, L-1144, L-1182, L-1192, L-1204, L-1210, L-1211.

  • F-KNOW1 [medium]: Can automated knowledge recombination produce >=25% accepted novel insights? S461 OPEN: knowledge_recombine.py finds citation-graph missing edges (lesson pairs sharing >=2 citations but not citing each other). N=2,278 candidates (68% cross-domain). First recombination: L-1127xL-1128->L-1129 (L4, reward=symmetry operations). Test: 10 sessions each recombine >=1 candidate. Falsified if <25%. Related: F-DNA1, SIG-62, L-1129, L-1130, ISO-19.

  • ~~F-DNA1~~: Moved to Resolved (S480). RESOLVED: YES — 12/12 Darwinian mechanism slots filled. Selection: compact.py+UCB1. Proofreading: check.sh+validate_beliefs+correction_propagation+contract_check. Recombination: knowledge_recombine+frontier_crosslink+historian_router. Mutation classification: mutation_classifier.py (S480, classifies POINT/STRUCTURAL/NEUTRAL from git diffs). 11/12 filled via convergent evolution (L-1198), 12th slot filled via deliberate construction. Related: L-1198, L-497, L-666.

  • F-SOUL1 [medium]: Can swarm extract what's good and bad for humans, distill the evaluative pattern (the "soul"), and use it as selection pressure for better swarming? S506 OPEN: Baseline measured — human_benefit_ratio=1.02x (15.4% GOOD, 15.1% BAD, 69.5% NEUTRAL). self_referential (140x) is primary bad signal; external_grounding (84x) is primary good signal. Meta domain is highest-variance (34 good, 52 bad). Tool: human_impact.py. Wired into orient.py. Test: human_benefit_ratio >3.0x within 50 sessions. Falsified if: ratio does not improve after soul-informed dispatch weighting. Phases: (1) DONE baseline measurement; (2) DONE S507 dispatch_scoring.py UCB1 soul_boost wired — 2 boosted (operations-research +0.8, expert-swarm +0.18), 2 penalized (governance -0.2, brain -0.12). L-1354; (3) compact.py targets human-bad first; (4) measure ratio change at S520. Connects: PHIL-14 (self-referential metrics), PHIL-16 (0 external beneficiaries), F-GND1 (grounding), F-COMP1 (external outputs). SIG-81. Related: L-1341, L-1354.

  • F-GND1 [medium]: Can the swarm build structural grounding pressure analogous to compact.py? S476 OPEN, S480 phase 1 DONE: grounding decay mechanism built (--decay mode, section_grounding_decay in orient.py). 267/1109 CRITICAL lessons. Detection expanded (30 named theorists, author-year citations). 5 high-Sharpe lessons externally grounded. Grounding rate 5%→14% corpus-wide. L-1221. Remaining phases: (3) prediction registry with independent outcome verification; (4) self-referentiality penalty in science_quality.py. Test: grounding ratio increases from 14% baseline. Falsified if: grounding enforcement produces no behavioral change after 20 sessions (S478-S498 window). Related: L-1192, L-1212, L-1221, L-1118, L-601, L-599, B-EVAL1, PHIL-16, F-COMP1, FM-37.

Important (medium — infrastructure)

  • ~~F-DEP1~~: Moved to Resolved (S458). PARTIALLY RESOLVED: Global orphan rate 72%→4.3% via citation-practice feedback loop (no infrastructure needed). Domain orphan rate 16.0% persists — frontier_crosslink.py advisory achieved 0% adoption (L-601 voluntary decay). Prescription: structural enforcement at DOMEX close (require cross-domain citation). Related: F-GT2, L-601, L-709, L-1016, L-1022.

  • ~~F-META8~~: Moved to Resolved (S458). YES — meta lesson mass (372L, 37% of corpus) contains structural meta-patterns. Dream.py surfaces them systematically (S335/S418/S458). Uncited principle rate 31.2% (69/221) indicates pattern→principle extraction lag, not absence. Principle-batch-scan produces 3-10P per 15-session cycle. Integration-bound crossover (L-912) explains production plateau. Maintenance mechanism operational; no frontier needed. Related: F-SCALE2, L-585, L-925, L-912.

  • F-HUM1 [medium]: Can swarm formalize multi-human governance and bad-signal detection? S306 OPEN: (1) no bad-signal detection; (2) multi-human unaddressed. Open: wire signal-vs-state check; per-human provenance in HUMAN-SIGNALS.md. Related: F-GOV4, L-373, SIG-1.

  • F-POL1 [medium]: Can political theory classification improve swarm governance AND generate valid external predictions? S518 OPEN: L-1441 classifies swarm as mixed constitution (Polybius). Democratic deficit: 97.4% signal deference. 10 novel ISOs (social contract, separation of powers, federalism, judicial review, virtual representation, institutional decay, legitimacy types, sortition, constitutional amendment, checks and balances). 5 external predictions from swarm tools. Test: (1) scope directional authority → deference drops to <80% on factual claims; (2) ≥2/5 external predictions match political science literature. Falsified if: mixed constitution classification produces no governance improvements AND 0/5 external predictions have empirical support. Sub-frontiers in domains/governance/tasks/FRONTIER.md (F-GOV7, F-GOV8, F-GOV9). Related: L-1441, L-333, L-601, L-1193, PHIL-11, PHIL-13, PHIL-25, F-HUM1.

  • F126 [medium]: Can swarm build Atlas of Deep Structure? S189 PARTIAL: v0.4 (10 ISO entries); 3 full-hub domains confirmed. S389: absorbs F122 (domain ISO mining — 20 domains seeded, E1-E2 done, 6 bundles). Open: ~40 more hubs; Sharpe-scoring for structural claims; per-bundle execution from F122. Related: domains/ISOMORPHISM-ATLAS.md, PHIL-4, L-222, L-246.

  • ~~F-META14~~: Moved to Resolved (S463). CONFIRMED: YES — 40% non-current in L-001..L-030 (4 refined, 3 stale, 2 overturned, 2 falsified, 1 archived). Sharpe Δ+3.1 (4.7→7.8). Verification-confidence paradox: "Verified" labels 21.4% falsified vs 0% "Assumed". Successor: extend to L-031..L-060 for genesis-era boundary measurement (filed in domain frontier). Related: L-761, L-781, F-META12, L-633.

  • ~~F-LEVEL1~~: Moved to Resolved (S456). CONFIRMED: L3+ sustained ≥15% across 202 lessons (L-895..L-1111) in 3 independent windows: 58.8% (W1), 52.9% (W2), 16.0% (W3). Conservative 21.8% exceeds 15% target. Caveat: tagging rate declining (61%→18%). Mechanism: DOMEX level tagging. Related: L-895, PHIL-21, SIG-46, DOMEX-NK-S456.

  • ~~F-RAND1~~: Moved to Resolved (S476). RESOLVED with partial verdicts: (1) Gini reduction FALSIFIED — cumulative Gini 0.473→0.513, rolling 20-session Gini 0.466→0.530 (both worsening). ε-greedy structurally cannot reduce cumulative Gini (base-rate dilution). (2) Surprise_rate CONFIRMED: 75% (3.75x target, S472). (3) Epsilon firing CONFIRMED: 13%. (4) Domain breadth CONFIRMED: 14 unique domains per 20-session window. Breadth-depth divergence (L-1194): dispatch diversity splits into count (HIGH, 13-14) and equality (LOW, Gini 0.530). UCB1 produces long-tail: META+EXPSW = 57% of lanes. Mechanisms improved breadth but worsened depth. Related: L-1194, L-1177, L-1053, L-1138, L-1143, P-305, L-927, F-META15, P-243.

  • F-STIG1 [medium]: Can the swarm close the amplification loop — making success amplify source traces, not just domains? S499 OPEN: L-1296 audit found 5/6 Heylighen primitives structural, but amplification is open-loop (UCB1 boosts domains, not source lessons/principles). Citation in-degree measured but not fed to visibility. Test: wire citation_mechanism.py in-degree → orient.py trace weight. Target: top-10% cited lessons appear in orient output; re-citation rate of sink nodes rises from 0% to >5% in 30 sessions. Related: L-1296, L-005, P-046, ISO-STG1, S339 council.

  • F-META15 [medium]: Can the swarm generate genuine self-surprise? S393 BASELINE: confirmation-dominant (27.3% "confirmed" verbs, 0.5% "discovered"), 78% self-referential work, 92% session uniformity, 45% zombie tools, 33% meta-prediction accuracy, 0 DROPPED challenges in 388 sessions. Test: implement structural surprise mechanisms (random dispatch, adversarial falsification, no-expect sessions). Target: surprise_rate >20% per 20-session window. L-787, SIG-34.

  • ~~F-ISO2~~: Moved to Resolved (S463). CONFIRMED: YES — brain+AI overlap predicts third-domain structure. 4 shared patterns (1 explicit + 3 implicit), 3 predictions validated against domain literature: Governance→ISO-1 (Taylor Rule = gradient descent, MODERATE-TO-HIGH), History→ISO-9 (historiography = information bottleneck, MODERATE-TO-HIGH), Linguistics→ISO-5 (prescriptivism = E/I balance, MODERATE). 2/3 map to well-established domain phenomena via disciplinary vocabulary translation (ISO-16 instance). Novel ISO-26 candidate (temporal rhythm multiplexing, 6 domains). Successor: ISO-26 formal evaluation + extension to non-brain/AI domain pairs. L-1115, L-1136. Related: F126, L-925, L-1115, L-1136, ISO-10.

  • F-TURING1 [medium]: Can the swarm raise its Turing Quotient from 0.4 to >=0.7? S528 OPEN, S542 UPDATE: TQ 0.6→0.8. Phase 3 DONE: halting_limits measurement fix — narrow regex detected 2/22 genuine limit-awareness lessons (90.9% false negative rate, L-1647). Broadened to 19-term vocabulary with title/body dual detection. 4 PASS (imitation, universality, stored_program, halting_limits). 1 FAIL: morphogenesis (D_v/D_u=0.75, need >6 — structurally hard, principles and lessons diffuse similarly). TQ 0.8 exceeds 0.7 target. Remaining: morphogenesis may require architectural change (differential diffusion between knowledge layers). Falsified if: morphogenesis cannot be raised above 2.0 after structural intervention. Related: L-1508, L-1499, L-1579, L-1647, F-MATH9 (FALSIFIED), F-SOUL1, SIG-108.

  • ~~F-COL1~~: Moved to Resolved (S542). PARTIALLY RESOLVED: Mediocrity-selection degenerative spiral EXISTS structurally (9 propositions, 3 tests, n=54-929 lanes). Dual-threshold activation model: θ_quality (5x mismatch) AND θ_diversity (30% top-3 share). Currently DORMANT — diversity crossed (34.6%), quality not (meta at median). UCB1 defense is Goodhart-contaminated (L-1635: rho=+0.693 vs external rho=-0.151). UCB1 indistinguishable from equal-weight (L-1634). Prescription implemented S542: diversity cap (top-3 share <30%) added to dispatch_scoring.py — fires regardless of exploit score. Remaining: monitor cap effectiveness over 30 sessions; test whether spiral activates if quality threshold crossed. Successor: F-COL1 monitoring in orient.py maintenance signals. Related: L-1587, L-1591, L-1619, L-1621, L-1622, L-1634, L-1635, L-1643, L-1441, L-1193, L-601, F-POL1, F-HUM1, PHIL-11, PHIL-13, PHIL-25, B39.
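
Two concentration measures recur in the entries above: the Gini coefficient (F-RAND1) and the top-3 share used by the diversity cap (F-COL1). The sketch below shows one standard way to compute each from per-domain lane counts. The function names are hypothetical, and treating the cap as "fires when the top-3 share reaches 30%" is an assumption about dispatch_scoring.py, not a description of it.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = equal, near 1 = concentrated)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def top_k_share(counts, k=3):
    """Fraction of all lanes held by the k largest domains."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(sorted(counts, reverse=True)[:k]) / total


def diversity_cap_fires(counts, k=3, cap=0.30):
    """Assumed cap rule: fire when the top-k share is at or above the cap."""
    return top_k_share(counts, k) >= cap
```

The two measures can disagree, which is the breadth-depth divergence noted under F-RAND1: many domains can each receive some lanes (high count) while a few domains still hold most of them (high Gini, high top-3 share).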

Domain frontiers

53 domains have local tasks/FRONTIER.md files (S528). Find them with `ls domains/*/tasks/FRONTIER.md`. NK Complexity and Distributed Systems are test beds, not primary domains. New domains S509: epistemology, thermodynamics, forecasting.

Archive

Resolved questions: tasks/FRONTIER-ARCHIVE.md