islandflow/smart-money-rebuild-phase-01.md
dirtydishes df35d49784 Add smart money rebuild plan and taxonomy
- Replace tape overhaul notes with the new smart-money classification plan
- Add the taxonomy reference doc for the rules-first rebuild
2026-05-04 18:00:50 -04:00

7.8 KiB

Smart Money Rebuild Plan

Summary

Rebuild the current packet-threshold classifier into a rules-first, parent-event, multi-profile system driven by the taxonomy in smartmoney.md. The first milestone will ship a new event model, feature pipeline, profile rule engine, event-calendar enrichment, deterministic synthetic scenarios, and a compatibility bridge to current alerts/UI. We will explicitly ignore anything that requires owner/account identity, supervised model training, anomaly detection, or speculative profile claims we cannot support from public-tape-style data.

Scope In

  • Core 6 primary profiles: institutional_directional, retail_whale, event_driven, vol_seller, arbitrage, hedge_reactive
  • Parent-event reconstruction from child prints, NBBO context, structure context, and underlying context
  • Probabilistic rule scores with reason codes and abstentions
  • External corporate-event calendar support via services/refdata
  • Scenario-driven synthetic options/equity/quote generation for tests, replay, and demos
  • Compat bridge from new profile model back to current ClassifierHitEvent and AlertEvent

Scope Out

  • Supervised model training/inference in v1
  • Unsupervised anomaly detection in v1
  • prop/professional customer as a first-class output
  • Claims about beneficial owner, account class, or illegal intent
  • Real-time use of next-day open interest
  • Rule 606/CAT/private broker data integrations

Phase 0: Planning Artifact

  • Create SMART_MONEY_REBUILD_PLAN.md at repo root as the living implementation document.
  • Copy this phased plan into that file and add per-phase checklists, acceptance criteria, and migration notes.
  • Treat that file as the session handoff and implementation tracker, while still using bd for issue tracking.

Phase 1: Contracts and Storage

  • Add a new event contract in packages/types for SmartMoneyEvent with:
    • event_id, packet_ids, member_print_ids, underlying_id, event_kind, event_window_ms
    • features as structured typed fields, not only loose string/number maps
    • profile_scores: { profile_id, probability, confidence_band, direction, reasons[] }[]
    • primary_profile_id, primary_direction, abstained, suppressed_reasons[]
  • Keep FlowPacket during bridge, but stop treating it as the final semantic unit.
  • Keep ClassifierHitEvent, but derive it from SmartMoneyEvent.primary_profile_id plus legacy mapping.
  • Add storage support in packages/storage for smart_money_events.
  • Extend AlertEvent with optional primary_profile_id and profile_scores while preserving current fields.

Phase 2: Parent-Event Reconstruction

  • Add services/compute/src/parent-events.ts to group child prints into parent events.
  • Reconstruction key should use: contract, direction proxy, burst gap, venue burst context, and structure linkage.
  • Preserve special-print flags from conditions so auctions/crosses/complex-like prints can be suppressed or downweighted.
  • Allow two parent paths:
    • single_leg_event
    • multi_leg_event
  • Reuse current structure logic where useful, but move the semantic output to parent events instead of direct classifier hits.
  • Emit deterministic event IDs so batch replay and live scoring agree.

Phase 3: Feature Engineering

  • Add typed feature builders for:
    • aggressor mix, spread position, quote age, venue count, inter-fill timing, strike concentration
    • DTE, moneyness, ATM proximity, synthetic IV shock, spread widening, underlying move linkage
    • structure markers, same-size leg symmetry, net directional bias proxies
    • event alignment: days-to-event, expiry-after-event, pre-event concentration
  • Build event-calendar ingestion in services/refdata for earnings/corporate events from a simple external feed or static importable provider layer.
  • Live scoring may use only timestamp-available data; any later validation fields must be batch-only.

Phase 4: Rules Engine

  • Replace services/compute/src/classifiers.ts with profile rules centered on the six primary profiles.
  • Each rule returns probability, direction, reason codes, suppression reasons, and a confidence band.
  • Add explicit false-positive guards from the research doc:
    • special/complex/auction suppression for directional labels
    • retail-frenzy guard on short-dated OTM call bursts
    • hedge-reactive preference for 0-2 DTE ATM/high-gamma/reactive-underlier cases
    • arbitrage requirement for matched-leg symmetry and near-flat directional exposure
  • Keep existing structure-specific ideas like straddle/vertical/roll as evidence and reasons, not top-level end states.

Phase 5: Synthetic Market Redesign

  • Rework services/ingest-options/src/adapters/synthetic.ts around labeled parent-event templates instead of loose burst presets.
  • Add deterministic synthetic scenario families matching the core 6 profiles plus neutral background noise.
  • Each scenario must emit a coherent bundle:
    • child option prints
    • contemporaneous NBBO evolution
    • underlying quote path
    • IV response pattern
    • realistic conditions/venues/structure markers
  • Add two operating modes:
    • test: seeded, deterministic, low-noise, exact expected labels
    • demo: seeded, realistic background with controlled noise ratios
  • Keep synthetic hidden labels internal to tests/replay harnesses, not public production payloads.

Phase 6: Compute, API, and UI Rollout

  • In services/compute, emit SmartMoneyEvent first, then derive compat ClassifierHitEvent and AlertEvent.
  • In services/api, add read/stream endpoints for SmartMoneyEvent while preserving existing endpoints.
  • In apps/web/app/terminal.tsx, migrate rendering to profile-aware displays:
    • primary profile
    • probability ladder
    • reason codes
    • suppression/abstention state
  • During the bridge, old UI elements should continue working from mapped legacy hits.

Phase 7: Evaluation and Replay

  • Add deterministic rule tests per profile and per major false-positive case.
  • Add replay-style integration tests for live-vs-batch consistency.
  • Add synthetic scenario acceptance tests proving:
    • the intended profile wins
    • nearby wrong profiles stay below a threshold
    • noisy background does not overwhelm expected results
  • Add evaluation utilities for parent-event precision/recall, calibration, abstention rate, and economic sanity checks.

Important API and Type Changes

  • New primary stream/table/type: SmartMoneyEvent
  • ClassifierHitEvent becomes a legacy-derived compatibility surface
  • AlertEvent gains optional profile metadata but keeps existing shape
  • FlowPacket remains during migration, but becomes an intermediate artifact rather than the final semantic alert object

Test Cases and Scenarios

  • Institutional directional: aggressive concentrated call/put burst with catalyst-aligned expiry
  • Retail whale: short-dated OTM attention-name chase with IV pop
  • Event-driven: pre-earnings aligned expiry and widening spreads
  • Vol seller: sell-side dominant overwrite/put-write/short-vol structure
  • Arbitrage: matched multi-leg parity-style event with low net directional bias
  • Hedge reactive: short-dated ATM burst tied to underlying move and gamma-sensitive conditions
  • False positives: auctions, complex prints, late/stale quote context, illiquid wide spreads, retail frenzy misread as institution, structure trades misread as direction

Assumptions and Defaults

  • Rollout mode: Compat Bridge
  • First milestone: Rules-first
  • Primary outputs: Core 6
  • Event-driven flow uses real external event-calendar enrichment in v1
  • prop/professional customer remains supporting evidence only
  • Existing rule labels like vertical_spread and zero_dte_gamma_punch become evidence/reason codes, not final business-facing profile IDs
  • Synthetic generation is optimized for deterministic realism, not maximum randomness