Heuristic Scorecard

Which market decision was made on which heuristic: every named heuristic (P-NNN / L-NNN / ISO-N) the swarm cited to drive a forecast, scored by how the market resolved those calls. In short, autodiff-style credit assignment and backtesting applied to verbal statements.

🟢 live tended 2026-06-08 heuristics finance calibration credit-assignment backtesting external-output F-COMP1

Literature statements: backtested on history

The fast loop: clear empirical claims from the finance literature, each encoded as a price-derivable rule and backtested walk-forward on ~10 years of data. OOS Sharpe is the out-of-sample figure, computed on the held-out last 30% of the data, and is the honest one to read. Transaction costs are not modeled, the split controls for in-sample bias, and any row with N < 20 should be read for direction only.

Statement	Claim	Citation	OOS Sharpe	OOS N	Best asset (OOS Sharpe)
`STMT-001` 12-1 momentum	An asset trading above its level of ~12 months ago (skipping the…	Jegadeesh & Titman (1993); Asness, Moskowitz & Pedersen (2013)	+0.70	228	GLD (+1.55)
`STMT-005` 52-week-high breakout	An asset trading near its trailing 52-week high tends to continu…	George & Hwang (2004) 'The 52-Week High and Momentum Investing'	+0.50	81	GLD (+2.46)
`STMT-002` Trend (50/200 MA cross)	When the 50-day moving average is above the 200-day, the trend i…	Moskowitz, Ooi & Pedersen (2012) 'Time Series Momentum'	+0.39	234	GLD (+1.47)
`STMT-004` Low-volatility regime	Equities in a low realized-volatility regime earn better risk-ad…	Ang, Hodrick, Xing & Zhang (2006) 'The Cross-Section of Volatility and Expected Returns'	+0.00 ⚠	2	SPY (+0.00)
`STMT-003` Short-horizon mean reversion	Over a few weeks, an asset stretched far from its recent mean te…	Jegadeesh (1990) 'Evidence of Predictable Behavior of Security Returns'	-0.05	194	EEM (+0.81)

Backtest window: 2015-01-01 → 2026-06-02. Caveats: in-sample bias (OOS is reported separately); no transaction costs or slippage; single-asset tests; non-stationarity, meaning a historical edge may not persist (see the live verbal-Sharpe).
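
As a rough illustration of how an out-of-sample Sharpe of this kind could be computed, the sketch below encodes a 12-1 momentum rule from prices alone, applies it day by day using only past data, and reports the annualized Sharpe of the final 30% of the strategy returns. It is not taken from the project's tooling: the function names, the 252-day and 21-day windows, the long-or-flat position, and the daily annualization are all assumptions made for illustration.

```python
import math
import statistics
from typing import List


def annualized_sharpe(returns: List[float], periods_per_year: int = 252) -> float:
    """Mean over sample standard deviation, scaled to a yearly figure."""
    if len(returns) < 2:
        return 0.0
    spread = statistics.stdev(returns)
    if spread == 0:
        return 0.0
    return statistics.mean(returns) / spread * math.sqrt(periods_per_year)


def momentum_12_1(prices: List[float], t: int, lookback: int = 252, skip: int = 21) -> int:
    """Return 1 (long) if the price one month back exceeds the price twelve
    months back, else 0 (flat). Only prices up to day t are read."""
    if t < lookback:
        return 0
    return 1 if prices[t - skip] > prices[t - lookback] else 0


def oos_sharpe(prices: List[float], oos_fraction: float = 0.30) -> float:
    """Sharpe of the rule's daily returns over the held-out final slice."""
    strategy_returns = []
    for t in range(len(prices) - 1):
        position = momentum_12_1(prices, t)          # decided with data up to day t
        next_return = prices[t + 1] / prices[t] - 1  # realized on day t + 1
        strategy_returns.append(position * next_return)
    split = int(len(strategy_returns) * (1 - oos_fraction))
    return annualized_sharpe(strategy_returns[split:])


# Synthetic upward-drifting series, used only to exercise the functions.
prices = [100 * 1.0005 ** i * (1 + 0.01 * math.sin(i)) for i in range(1000)]
print(oos_sharpe(prices) > 0)
```

Because the position for day t + 1 is fixed using prices up to day t, the held-out slice contains no look-ahead, which is the property the 70/30 split relies on.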

Comprehensive heuristic: today's composite decision

The decision step: for each asset, the applicable statements are combined into one call, weighted by each statement's OOS Sharpe on that asset. A statement with no historical edge on an asset gets zero weight, so the system stays silent where it has no basis. Each decision shows exactly which statement pushed it and how hard: every term reads signal×weight, so +1×w1.59 is a bullish vote carrying weight 1.59.

EEM → BULL (conf 0.65, score +0.57, 4 statements): STMT-005 +1×w1.59, STMT-001 +1×w0.86, STMT-003 -1×w0.81, STMT-002 +1×w0.48
GLD → BULL (conf 0.64, score +0.55, 3 statements): STMT-001 +1×w1.55, STMT-002 +1×w1.47, STMT-005 +0×w2.46
IWM → NEUTRAL (conf 0.5, score +0.00, 0 statements): —
QQQ → BULL (conf 0.7, score +1.00, 3 statements): STMT-005 +1×w1.48, STMT-001 +1×w1.35, STMT-002 +1×w0.90
SPY → BULL (conf 0.7, score +1.00, 3 statements): STMT-005 +1×w1.11, STMT-001 +1×w1.04, STMT-002 +1×w0.82
WTI → NEUTRAL (conf 0.5, score +0.00, 0 statements): —

Computed 2026-06-02. Running `composite_heuristic.py --asset <X> --register` files a decision as a pre-registered prediction, handing it to the live loop below.
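
The scores listed above are consistent with a weighted average of the statement signals: the sum of signal × weight divided by the sum of weights. That normalization is inferred from the numbers on this page rather than read from composite_heuristic.py, so the sketch below is an assumption, and `composite_score` is a hypothetical name. How the score maps to a confidence or to a BULL/NEUTRAL label is not reproduced here.

```python
from typing import List, Tuple


def composite_score(votes: List[Tuple[int, float]]) -> float:
    """Weighted average of signals in {-1, 0, +1}.

    Each vote is (signal, weight), where the weight is the statement's
    OOS Sharpe on the asset. Non-positive weights are dropped, so an
    asset with no usable statement scores 0.0.
    """
    usable = [(signal, weight) for signal, weight in votes if weight > 0]
    total_weight = sum(weight for _, weight in usable)
    if total_weight == 0:
        return 0.0
    return sum(signal * weight for signal, weight in usable) / total_weight


# Votes copied from the EEM and GLD lines above.
eem = [(+1, 1.59), (+1, 0.86), (-1, 0.81), (+1, 0.48)]
gld = [(+1, 1.55), (+1, 1.47), (0, 2.46)]

print(round(composite_score(eem), 2))  # 0.57
print(round(composite_score(gld), 2))  # 0.55
print(composite_score([]))             # 0.0
```

Note that GLD's neutral STMT-005 vote still counts toward the denominator, which is why two unanimous bullish votes yield +0.55 rather than +1.00.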

Market-validated heuristics

Heuristics attached to a prediction while it was still open (pre-registered), then scored when the market resolved it. This is the only set that earns verbal-Sharpe.

Heuristic	Claim (verbal statement)	Domains	#Calls	Hit-rate	Mean Brier	verbal-Sharpe	Trend	Verdict
`CAND-005`	Defensive rotation: utilities/healthcare (non-cyclical, AI-electric…	finance, forecasting	1	0%	0.203	+0.048	—	👁 watch
`CAND-004`	Stretched-level consolidation: an over-extended asset at an extreme…	finance, forecasting, health	3	67%	0.236	+0.025	▲	⬆️ promote
`CAND-003`	Divergence mean-reversion: a lagging asset catches up to its driver…	finance, forecasting	1	0%	0.250	+0.000	—	👁 watch
`CAND-001`	2-week momentum continuation: a leading asset in a risk-on regime c…	cryptocurrency, finance, forecasting	6	0%	0.309	-0.145	▼	⬇️ prune
`CAND-002`	Low-VIX-in-conflict mean reversion: VIX below 20 during active mili…	finance, forecasting	1	100%	0.640	-0.390	—	👁 watch

Decision → heuristic traces

Every call each heuristic drove, and how each resolved.

CAND-005 — verbal-Sharpe +0.048 over 1 call(s) · 👁 watch

> Defensive rotation: utilities/healthcare (non-cyclical, AI-electricity tailwind) outperform on a geopolitical/VIX shock.

| Prediction | Role | Weight | Outcome | Brier | Vindicated |
|------------|------|-------:|---------|------:|:----------:|
| PRED-0030 | driver | 1.00 | ❌ INCORRECT | 0.203 | ✗ |

CAND-004 — verbal-Sharpe +0.025 over 3 call(s) · ⬆️ promote

> Stretched-level consolidation: an over-extended asset at an extreme consolidates sideways rather than breaking out or down over ~2 weeks.

| Prediction | Role | Weight | Outcome | Brier | Vindicated |
|------------|------|-------:|---------|------:|:----------:|
| PRED-0024 | driver | 1.00 | ❌ INCORRECT | 0.302 | ✗ |
| PRED-0027 | driver | 1.00 | ✅ CORRECT | 0.203 | ✔ |
| PRED-0029 | driver | 1.00 | ✅ CORRECT | 0.203 | ✔ |

CAND-003 — verbal-Sharpe +0.000 over 1 call(s) · 👁 watch

> Divergence mean-reversion: a lagging asset catches up to its driver when their historical relationship diverges (e.g. WTI->XLE) within 2-3 weeks.

| Prediction | Role | Weight | Outcome | Brier | Vindicated |
|------------|------|-------:|---------|------:|:----------:|
| PRED-0025 | driver | 1.00 | ❌ INCORRECT | 0.250 | ✗ |

CAND-001 — verbal-Sharpe -0.145 over 6 call(s) · ⬇️ prune

> 2-week momentum continuation: a leading asset in a risk-on regime continues its trend over a ~2-week window.

| Prediction | Role | Weight | Outcome | Brier | Vindicated |
|------------|------|-------:|---------|------:|:----------:|
| PRED-0019 | driver | 1.00 | ❌ INCORRECT | 0.250 | ✗ |
| PRED-0020 | driver | 1.00 | ❌ INCORRECT | 0.250 | ✗ |
| PRED-0021 | driver | 1.00 | ❌ INCORRECT | 0.302 | ✗ |
| PRED-0022 | driver | 1.00 | ❌ INCORRECT | 0.490 | ✗ |
| PRED-0026 | driver | 1.00 | ❌ INCORRECT | 0.360 | ✗ |
| PRED-0028 | driver | 1.00 | ❌ INCORRECT | 0.203 | ✗ |

CAND-002 — verbal-Sharpe -0.390 over 1 call(s) · 👁 watch

> Low-VIX-in-conflict mean reversion: VIX below 20 during active military conflict reverts upward within ~2 weeks.

| Prediction | Role | Weight | Outcome | Brier | Vindicated |
|------------|------|-------:|---------|------:|:----------:|
| PRED-0023 | driver | 1.00 | ✅ CORRECT | 0.640 | ✔ |

Reverse trace: which heuristics drove each resolved call

The macro view: open a forecast and see the named heuristics behind it, their weights, and whether each was vindicated by the outcome.

PRED-0019 BULL SPY → ❌ INCORRECT (Brier 0.25): CAND-001(100%)
PRED-0020 BULL QQQ → ❌ INCORRECT (Brier 0.25): CAND-001(100%)
PRED-0021 BULL BTC → ❌ INCORRECT (Brier 0.3025): CAND-001(100%)
PRED-0022 BULL WTI → ❌ INCORRECT (Brier 0.49): CAND-001(100%)
PRED-0023 BULL VIX → ✅ CORRECT (Brier 0.64): CAND-002(100%)
PRED-0024 NEUTRAL GLD → ❌ INCORRECT (Brier 0.3025): CAND-004(100%)
PRED-0025 BULL XLE → ❌ INCORRECT (Brier 0.25): CAND-003(100%)
PRED-0026 BEAR DXY → ❌ INCORRECT (Brier 0.36): CAND-001(100%)
PRED-0027 NEUTRAL AAPL → ✅ CORRECT (Brier 0.2025): CAND-004(100%)
PRED-0028 BULL IWM → ❌ INCORRECT (Brier 0.2025): CAND-001(100%)
PRED-0029 NEUTRAL XLV → ✅ CORRECT (Brier 0.2025): CAND-004(100%)
PRED-0030 BULL XLU → ❌ INCORRECT (Brier 0.2025): CAND-005(100%)
PRED-0001 BEAR SPY → ❌ INCORRECT (Brier 0.04): ISO-4(33%), ISO-11(33%), ISO-2(33%) (backfill — illustrative)
PRED-0004 BULL GLD → ❌ INCORRECT (Brier 0.04): ISO-8(100%) (backfill — illustrative)
PRED-0005 BEAR QQQ → ❌ INCORRECT (Brier 0.04): ISO-2(100%) (backfill — illustrative)
PRED-0008 BULL VIX → ❌ INCORRECT (Brier 0.04): ISO-13(33%), L-1287(33%), L-1289(33%) (backfill — illustrative)
PRED-0015 BEAR AAPL → ❌ INCORRECT (Brier 0.04): ISO-2(100%) (backfill — illustrative)
PRED-0003 BULL TLT → ➕ PARTIAL (Brier 0.135): ISO-4(100%) (backfill — illustrative)

Illustrative backfill (excluded from scoring)

Heuristics reconstructed from the theses of already-resolved predictions. They show the mechanism working, but they are hindsight: they were attached after the outcome was known, so they earn no verbal-Sharpe and never drive evolution verdicts. Note that a heuristic can show positive calibration skill at a 0% directional hit-rate: a call that was wrong but made at low confidence scores a low Brier, which counts as good calibration.

Heuristic	Claim (verbal statement)	Domains	#Calls	Hit-rate	Mean Brier	verbal-Sharpe	Trend	Verdict
`ISO-2`	Sector rotation to utilities/energy signals defensive positioning (…	control-theory, evolution, finance…	3	0%	0.040	+0.364	▲	◻️ illustrative
`ISO-11`	Oil shock from Strait of Hormuz closure propagates through economy …	control-theory, finance, history…	1	0%	0.040	+0.210	—	◻️ illustrative
`ISO-8`	ISO-8 power law: gold tends to move in explosive bursts, not gradua…	distributed-systems, game-theory, history…	1	0%	0.040	+0.210	—	◻️ illustrative
`ISO-13`	ISO-13 integral windup: Hormuz closure is accumulating economic dam…	catastrophic-risks, control-theory, finance…	1	0%	0.040	+0.210	—	◻️ illustrative
`L-1287`	The damage accumulates like debt (Hindu karma pattern, L-1287).	catastrophic-risks, control-theory, finance…	1	0%	0.040	+0.210	—	◻️ illustrative
`L-1289`	Unanimity-as-failure (L-1289): VIX has been trading in 20-30 range …	catastrophic-risks, control-theory, finance…	1	0%	0.040	+0.210	—	◻️ illustrative
`ISO-4`	CAPE at 39 is 2x historical avg — only 1929 and 2000 were comparabl…	control-theory, finance, history…	2	38%	0.111	+0.196	▲	◻️ illustrative
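
The note on hit-rate versus calibration can be checked with a little arithmetic. Assuming the binary Brier score (confidence - outcome)^2 and a coin-flip baseline of 0.25, a call made at 20% confidence that resolves incorrect scores 0.04, well under the baseline, which is how every ISO-2 call can miss on direction while its calibration skill stays positive. The sketch also reproduces ISO-4's 38% hit-rate under two further assumptions inferred from the tables: a PARTIAL outcome counts as half a hit, and calls are averaged by attribution weight. None of this is taken from the tooling, and `weighted_hit_rate` is a hypothetical name.

```python
from typing import List, Tuple

# Assumption: PARTIAL counts as half a hit.
HIT_VALUE = {"CORRECT": 1.0, "PARTIAL": 0.5, "INCORRECT": 0.0}


def weighted_hit_rate(calls: List[Tuple[str, float]]) -> float:
    """Attribution-weighted share of hits over (outcome, weight) pairs."""
    total_weight = sum(weight for _, weight in calls)
    return sum(HIT_VALUE[outcome] * weight for outcome, weight in calls) / total_weight


# ISO-4 appears in PRED-0001 (one third weight, incorrect)
# and PRED-0003 (full weight, partial).
iso_4 = [("INCORRECT", 1 / 3), ("PARTIAL", 1.0)]
print(round(weighted_hit_rate(iso_4), 3))  # 0.375, shown as 38% in the table

# A wrong call at 20% confidence still beats the 0.25 coin-flip baseline.
low_confidence_miss = (0.20 - 0.0) ** 2
print(round(low_confidence_miss, 2))         # 0.04
print(round(0.25 - low_confidence_miss, 2))  # 0.21
```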

Pending: pre-registered attributions (open calls)

Heuristics attached to open predictions before the outcome is known. These earn verbal-Sharpe the moment their call resolves — this is where forward credit comes from.

PRED-0031 BULL SPY/QQQ (conf 55%, due 2026-06-06): CAND-001(100%)
PRED-0002 BULL XLE (conf 30%, due 2026-06-20): ISO-13(100%)
PRED-0006 BULL BTC (conf 65%, due 2026-06-20): ISO-4(50%), L-1289(50%)
PRED-0007 BEAR DXY (conf 35%, due 2026-06-20): L-1287(50%), ISO-6(50%)
PRED-0009 BULL XLU (conf 30%, due 2026-06-20): CAND-005(100%)
PRED-0010 BULL XLV (conf 40%, due 2026-06-20): CAND-005(100%)

Probationary: foraged candidates (CAND-NNN)

Candidate heuristics minted from foraged papers (tools/paper_intake.py --emit-candidate-heuristic). They join the pool on probation: a prediction can cite one as a driver, and once enough such calls resolve it graduates to a P-NNN principle (PROMOTE) or is dropped (PRUNE).

Candidate	Type	Source	Applied?	Claim
`CAND-001`	THESIS	thesis:PRED forecasting batch (S499-S547)	✅	2-week momentum continuation: a leading asset in a risk-on regime continues i…
`CAND-002`	THESIS	thesis:PRED forecasting batch (S499-S547)	✅	Low-VIX-in-conflict mean reversion: VIX below 20 during active military confl…
`CAND-003`	THESIS	thesis:PRED forecasting batch (S499-S547)	✅	Divergence mean-reversion: a lagging asset catches up to its driver when thei…
`CAND-004`	THESIS	thesis:PRED forecasting batch (S499-S547)	✅	Stretched-level consolidation: an over-extended asset at an extreme consolida…
`CAND-005`	THESIS	thesis:PRED forecasting batch (S499-S547)	✅	Defensive rotation: utilities/healthcare (non-cyclical, AI-electricity tailwi…
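
The graduation step can be pictured as a small decision rule. The page does not state the actual thresholds, so the version below is purely hypothetical: it requires at least three resolved calls before issuing a verdict and then promotes on a positive verbal-Sharpe and prunes on a negative one. Those choices were picked only because they agree with the five verdicts currently shown in the market-validated table.

```python
def verdict(verbal_sharpe: float, n_calls: int, min_calls: int = 3) -> str:
    """Hypothetical probation rule; min_calls and the sign test are assumptions."""
    if n_calls < min_calls:
        return "watch"    # too few resolved calls to judge
    if verbal_sharpe > 0:
        return "promote"  # graduates toward a P-NNN principle
    if verbal_sharpe < 0:
        return "prune"    # dropped from the pool
    return "watch"


# (verbal-Sharpe, #calls) copied from the market-validated table.
candidates = {
    "CAND-001": (-0.145, 6),
    "CAND-002": (-0.390, 1),
    "CAND-003": (0.000, 1),
    "CAND-004": (0.025, 3),
    "CAND-005": (0.048, 1),
}
for name, (score, calls) in candidates.items():
    print(name, verdict(score, calls))
```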

What is this?

The Forecast Dashboard records pre-registered market calls. This page answers a different question: which named heuristic was each decision made on, and did it earn its keep?

A heuristic is a named verbal rule the swarm already holds — a principle (P-NNN), a lesson (L-NNN), or a cross-domain isomorphism (ISO-N).
Each prediction attaches the heuristics that drove it, with a weight (how much each drove the call) and a role (driver / risk / counter).
When the call resolves, the Brier score is the loss signal, split across the driver heuristics by weight — the verbal analogue of backprop's credit assignment.
verbal-Sharpe rewards heuristics that are well-calibrated and repeatedly applied; the evolution loop promotes the top, prunes the negative, and compacts the redundant.
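
The credit-assignment step described above can be sketched as a ledger: each resolved prediction hands its Brier score to the heuristics that drove it, in proportion to their weights. The record layout (`id`, `brier`, `heuristics`, `role`, `weight`) and the function name are hypothetical and may not match registry.json; the two example records use the attributions shown in the reverse trace, with every role assumed to be driver.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_credit(predictions: List[dict]) -> Dict[str, List[Tuple[float, float]]]:
    """Map each heuristic to the (brier, weight share) pairs it has earned.

    Only heuristics with role "driver" receive a share; weights are
    normalized per prediction so the shares sum to 1.
    """
    ledger: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for prediction in predictions:
        drivers = [h for h in prediction["heuristics"] if h["role"] == "driver"]
        total_weight = sum(h["weight"] for h in drivers)
        if total_weight == 0:
            continue  # nothing to credit
        for h in drivers:
            share = h["weight"] / total_weight
            ledger[h["name"]].append((prediction["brier"], share))
    return dict(ledger)


resolved = [
    {"id": "PRED-0001", "brier": 0.04, "heuristics": [
        {"name": "ISO-4", "role": "driver", "weight": 1 / 3},
        {"name": "ISO-11", "role": "driver", "weight": 1 / 3},
        {"name": "ISO-2", "role": "driver", "weight": 1 / 3},
    ]},
    {"id": "PRED-0005", "brier": 0.04, "heuristics": [
        {"name": "ISO-2", "role": "driver", "weight": 1.0},
    ]},
]

ledger = assign_credit(resolved)
for name, entries in sorted(ledger.items()):
    print(name, [(score, round(share, 2)) for score, share in entries])
```

Keeping the raw (Brier, share) pairs per heuristic, rather than a running total, leaves the aggregation step free to compute a weighted mean Brier, a hit-rate, or any other summary later.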

New candidate heuristics enter the pool from foraged papers (tools/paper_intake.py --emit-candidate-heuristic, namespace CAND-NNN) and are validated the same way — external research in, market truth out.

Sources: experiments/finance/heuristics/scorecard.json (computed by tools/heuristic_backtest.py) over experiments/finance/predictions/registry.json.