What we tested, and killed.
Every service shows you its winners. We show you the graveyard. These are hypotheses we ran through the same walk-forward, real-fill, out-of-sample machine that gates our live book — and rejected. Publishing failures is unheard of in this market. It's also the only honest way to earn the word "verifiable."
The graveyard
Latency alone
- Hypothesis: A faster stack (3 vs 15 vs 50 ms) buys directional edge on the micro-align signal.
- Test: Same candidates replayed at 3, 15 and 50 ms on real MBO fills.
- Verdict: P&L is essentially identical across all three latencies. The deficit comes from placement cost, not speed.
Entry placement (the eo / fast-cancel axis)
- Hypothesis: Smarter limit placement (offsets, cross-vs-skip, a 100 ms fill-or-cancel) rescues the directional micro-align signal.
- Test: 24-day online walk-forward selector across all 6 execution regimes (std/colo × cross/skip × ±100 ms cancel).
- Verdict: Negative in all six regimes. An apparent "+$36/trade" cell turned out to be a warm-up artifact that excluded the three worst days from the window. Placement cannot rescue a signal with no direction.
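The warm-up artifact in the verdict above is easy to reproduce with a toy calculation. The sketch below is illustrative only: the daily figures, the `per_trade_pnl` helper, and the warm-up length are hypothetical, not values from the study. It shows how a per-trade average can flip sign when the scored window silently starts after the worst days.

```python
from typing import List, Tuple

# One entry per day: (total P&L in dollars, number of trades).
# Hypothetical numbers, chosen only to illustrate the effect.
DayResult = Tuple[float, int]


def per_trade_pnl(days: List[DayResult], warmup_days: int = 0) -> float:
    """Mean P&L per trade, skipping the first `warmup_days` days of the window."""
    scored = days[warmup_days:]
    total_pnl = sum(pnl for pnl, _ in scored)
    total_trades = sum(n for _, n in scored)
    if total_trades == 0:
        raise ValueError("no trades in the scored window")
    return total_pnl / total_trades


days: List[DayResult] = [
    (-900.0, 10), (-700.0, 10), (-800.0, 10),  # three bad days up front
    (300.0, 10), (400.0, 10), (500.0, 10),
]

full = per_trade_pnl(days)                    # every day counted
trimmed = per_trade_pnl(days, warmup_days=3)  # warm-up drops the bad days
print(f"full window: {full:+.2f}/trade, after warm-up: {trimmed:+.2f}/trade")
# full window: -20.00/trade, after warm-up: +40.00/trade
```

The check that catches this is simply to score every cell over the same set of days before comparing cells.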
Market-maker pull on the violence detector
- Hypothesis: An AUC-0.9 "violence-incoming" detector lets a passive maker pull quotes on toxic flow and harvest the spread.
- Test: Per-fill markouts sliced by event-intensity quintile, at both decision time and the latest-knowable fill time.
- Verdict: Every touch fill costs ≈ −0.87 ticks (t) regardless of state, flat across all toxicity buckets. The detector times movement, but the cost of that movement is already priced into every fill. Pull-on-detector is worthless here.
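As a rough illustration of the quintile slicing described in the test above, the sketch below buckets fills by an intensity score and averages the markout in each bucket. The `markout_by_quintile` helper, the (intensity, markout) pair layout, and the sample fills are hypothetical, not the production harness.

```python
from statistics import mean
from typing import Dict, List, Tuple


def markout_by_quintile(fills: List[Tuple[float, float]]) -> Dict[int, float]:
    """Mean markout (in ticks) per event-intensity quintile.

    `fills` holds (event_intensity, markout_ticks) pairs. Quintile 0 is the
    calmest fifth of fills, quintile 4 the most intense.
    """
    if len(fills) < 5:
        raise ValueError("need at least 5 fills to form quintiles")
    ranked = sorted(fills, key=lambda f: f[0])
    n = len(ranked)
    buckets: Dict[int, List[float]] = {q: [] for q in range(5)}
    for rank, (_, markout) in enumerate(ranked):
        buckets[rank * 5 // n].append(markout)  # rank-based, equal-count buckets
    return {q: mean(vals) for q, vals in buckets.items()}


# Hypothetical fills: intensity rises, but the markout does not depend on it.
fills = [(float(i), -0.5 if i % 2 == 0 else -1.5) for i in range(10)]
print(markout_by_quintile(fills))
# {0: -1.0, 1: -1.0, 2: -1.0, 3: -1.0, 4: -1.0}
```

A flat profile like this is what "no edge from pulling" looks like: the detector would need the high-intensity buckets to be clearly worse than the calm ones for quote-pulling to pay.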
ORB-gating of confluence
- Hypothesis: Conditioning entries on alignment with the opening-range break improves the level-fade families.
- Test: 13,797 signals over 24 days, bucketed by relationship to the break.
- Verdict: Aligned-with-break is the worst bucket (−4.00t). Confluence-after-break is crowd-following exhaustion, not edge. (This inverted into a real finding: fade the break, don't follow it.)
The gap-fill magnet
- Hypothesis: An opening gap reliably gets filled by the close, so trade toward the prior close.
- Test: Session-context study across the 2-year window.
- Verdict: P(fill by close) ≈ 0.39, worse than a coin flip. Big gaps run; they don't revert. The "magnet" is folklore.
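A minimal sketch of how a fill-by-close rate can be estimated from session bars. It assumes a gap counts as filled when the session trades back to the prior close at any point; that definition, the `Session` fields, and the sample data are assumptions for illustration, not the study's exact specification.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Session:
    prior_close: float
    open: float
    high: float
    low: float


def gap_filled(s: Session) -> bool:
    """True if the session traded back to the prior close."""
    if s.open > s.prior_close:   # gap up: price must trade down to the prior close
        return s.low <= s.prior_close
    if s.open < s.prior_close:   # gap down: price must trade up to the prior close
        return s.high >= s.prior_close
    raise ValueError("session has no gap")


def fill_rate(sessions: Iterable[Session], min_gap: float = 0.0) -> float:
    """Share of gapped sessions (gap larger than `min_gap`) that filled."""
    gapped = [s for s in sessions if abs(s.open - s.prior_close) > min_gap]
    if not gapped:
        raise ValueError("no sessions with a qualifying gap")
    return sum(gap_filled(s) for s in gapped) / len(gapped)


# Hypothetical sessions, all with a prior close of 100.
sessions = [
    Session(100.0, 102.0, 103.0, 99.5),   # gap up, filled
    Session(100.0, 103.0, 106.0, 102.0),  # gap up, ran
    Session(100.0, 97.0, 98.0, 94.0),     # gap down, ran
    Session(100.0, 99.0, 100.5, 98.5),    # gap down, filled
    Session(100.0, 104.0, 108.0, 103.5),  # gap up, ran
    Session(100.0, 100.0, 101.0, 99.0),   # no gap, excluded
]
print(fill_rate(sessions))  # 0.4
```

Raising `min_gap` restricts the sample to larger gaps, which is one way to check whether big gaps behave differently from small ones.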
Afternoon ORB
- Hypothesis: The opening-range-break edge repeats on an afternoon range.
- Test: Same detector, afternoon window, 2-year harness.
- Verdict: Dead: ±0.5t. The morning edge is specific to the cash-open auction; it does not generalize to the afternoon.
The bull-flag premium
- Hypothesis: Bull flags outperform bear flags (the textbook claim), and breakout direction carries information.
- Test: Multi-scale flag study, pole × box × polarity, 2 years, NQ + RTY.
- Verdict: No premium at any scale (bull +44.7t vs bear +45.5t), and breakout direction carries no information: breakouts with the flag's direction perform about the same as breakouts counter to it. Flags are direction-agnostic range-expansion events. (The OCO bracket that ignores direction did survive; see the dashboard.)
Sub-second direction (the whole book tensor)
- Hypothesis: A fully-trained model over the entire order-book-skew tensor predicts short-horizon direction.
- Test: Ridge model on 504 features, 16-day train / 7-day test, at 10/30/60/120 s horizons.
- Verdict: Test AUC 0.500–0.514, a coin flip. The book carries rich timing information (AUC 0.91+ for "movement incoming") but no direction. We sell the timing and the levels, never a sub-second directional black box.
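For readers unfamiliar with the metric: AUC is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, so 0.5 means no ranking skill at all. A minimal pairwise sketch follows; the `roc_auc` name and toy inputs are hypothetical, and the quadratic loop is written for clarity, not for samples of the size used in the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0: perfect ranking
print(roc_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))  # 0.5: score carries no information
```

On this scale, a test AUC between 0.500 and 0.514 sits almost exactly at the uninformative case in the second call.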
ES → NQ lead-lag & basis reversion
- Hypothesis: ES flow leads NQ direction; the ES–NQ basis mean-reverts tradeably.
- Test: 9.6M joined samples, 14-day train / 6-day test, both directions, with own-return controls.
- Verdict: Lead-lag is pure volatility spillover (direction asymmetry ≤ 0.01). Basis reversion is the session's first real directional asymmetry (sign-stable, 6/6 test days), but the tails give ≈ 3pp edge, roughly 50× below the cost floor. Real, honest, and not worth trading. We tell you so.
Why publish this? Because the failures are the proof. A service that only shows winners is unfalsifiable: you can't tell skill from a lucky screenshot. By publishing what we killed, with the same rigor as what we kept, we make the whole record checkable. The graveyard is the receipt for the survivors.
Required Disclosures
CFTC RULE 4.41 — SIMULATED PERFORMANCE
HYPOTHETICAL OR SIMULATED PERFORMANCE RESULTS HAVE CERTAIN LIMITATIONS. UNLIKE AN ACTUAL PERFORMANCE RECORD, SIMULATED RESULTS DO NOT REPRESENT ACTUAL TRADING. ALSO, SINCE THE TRADES HAVE NOT BEEN EXECUTED, THE RESULTS MAY HAVE UNDER-OR-OVER COMPENSATED FOR THE IMPACT, IF ANY, OF CERTAIN MARKET FACTORS, SUCH AS LACK OF LIQUIDITY. SIMULATED TRADING PROGRAMS IN GENERAL ARE ALSO SUBJECT TO THE FACT THAT THEY ARE DESIGNED WITH THE BENEFIT OF HINDSIGHT. NO REPRESENTATION IS BEING MADE THAT ANY ACCOUNT WILL OR IS LIKELY TO ACHIEVE PROFIT OR LOSSES SIMILAR TO THOSE SHOWN.
Trading futures involves substantial risk of loss and is not suitable for all investors. Past or hypothetical performance is not indicative of future results.
Read the complete required disclosures — Rule 4.41, full futures & options risk, the impersonal-publisher statement, and data/simulation limitations — on the Disclosures & Risk Disclosure page.