# Smart Money Rebuild Plan

## Summary

Rebuild the current packet-threshold classifier into a `rules-first`, parent-event, multi-profile system driven by the taxonomy in [smartmoney.md](/Users/kell/Cloud/dev/islandflow/smartmoney.md). The first milestone will ship a new event model, feature pipeline, profile rule engine, event-calendar enrichment, deterministic synthetic scenarios, and a compatibility bridge to current alerts/UI. We will explicitly ignore anything that requires owner/account identity, supervised model training, anomaly detection, or speculative profile claims we cannot support from public-tape-style data.

## Scope In

- Core 6 primary profiles: `institutional_directional`, `retail_whale`, `event_driven`, `vol_seller`, `arbitrage`, `hedge_reactive`
- Parent-event reconstruction from child prints, NBBO context, structure context, and underlying context
- Probabilistic rule scores with reason codes and abstentions
- External corporate-event calendar support via `services/refdata`
- Scenario-driven synthetic options/equity/quote generation for tests, replay, and demos
- Compat bridge from new profile model back to current `ClassifierHitEvent` and `AlertEvent`

## Scope Out

- Supervised model training/inference in v1
- Unsupervised anomaly detection in v1
- `prop/professional customer` as a first-class output
- Claims about beneficial owner, account class, or illegal intent
- Real-time use of next-day open interest
- Rule 606/CAT/private broker data integrations

## Phase 0: Planning Artifact

- Create `SMART_MONEY_REBUILD_PLAN.md` at repo root as the living implementation document.
- Copy this phased plan into that file and add per-phase checklists, acceptance criteria, and migration notes.
- Treat that file as the session handoff and implementation tracker, while still using `bd` for issue tracking.
## Phase 1: Contracts and Storage

- Add a new event contract in `packages/types` for `SmartMoneyEvent` with:
  - `event_id`, `packet_ids`, `member_print_ids`, `underlying_id`, `event_kind`, `event_window_ms`
  - `features` as structured typed fields, not only loose string/number maps
  - `profile_scores: { profile_id, probability, confidence_band, direction, reasons[] }[]`
  - `primary_profile_id`, `primary_direction`, `abstained`, `suppressed_reasons[]`
- Keep `FlowPacket` during bridge, but stop treating it as the final semantic unit.
- Keep `ClassifierHitEvent`, but derive it from `SmartMoneyEvent.primary_profile_id` plus legacy mapping.
- Add storage support in `packages/storage` for `smart_money_events`.
- Extend `AlertEvent` with optional `primary_profile_id` and `profile_scores` while preserving current fields.

## Phase 2: Parent-Event Reconstruction

- Add `services/compute/src/parent-events.ts` to group child prints into parent events.
- Reconstruction key should use: contract, direction proxy, burst gap, venue burst context, and structure linkage.
- Preserve special-print flags from conditions so auctions/crosses/complex-like prints can be suppressed or downweighted.
- Allow two parent paths:
  - `single_leg_event`
  - `multi_leg_event`
- Reuse current structure logic where useful, but move the semantic output to parent events instead of direct classifier hits.
- Emit deterministic event IDs so batch replay and live scoring agree.
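
One way the deterministic event ID requirement in Phase 2 might be met is to derive the ID purely from the parent path and the sorted member print IDs, so arrival order cannot change the result. The sketch below is illustrative only: the `ChildPrint` shape, the `parentEventId` helper, the `sme_` prefix, and the choice of a 32-bit FNV-1a hash are all assumptions, not part of the existing codebase, and it assumes the parent event is closed before the ID is computed.

```typescript
// Hypothetical minimal print shape; the real child-print type will carry more fields.
interface ChildPrint {
  printId: string;
  contract: string;
  tsMs: number;
}

type ParentPath = "single_leg_event" | "multi_leg_event";

// 32-bit FNV-1a over UTF-16 code units, rendered as 8 hex characters.
// Chosen only to keep the sketch dependency-free; a stronger hash may be preferable.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// The ID depends only on the path and the set of member print IDs,
// so live scoring and batch replay agree if they group the same prints.
function parentEventId(path: ParentPath, prints: ChildPrint[]): string {
  const memberIds = prints.map((p) => p.printId).sort();
  return `sme_${path}_${fnv1a32(memberIds.join("|"))}`;
}
```

Sorting the member IDs before hashing is the design choice that matters here: two runs that see the same prints in a different order still produce the same ID, while any change in membership produces a different one.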
## Phase 3: Feature Engineering

- Add typed feature builders for:
  - aggressor mix, spread position, quote age, venue count, inter-fill timing, strike concentration
  - DTE, moneyness, ATM proximity, synthetic IV shock, spread widening, underlying move linkage
  - structure markers, same-size leg symmetry, net directional bias proxies
  - event alignment: days-to-event, expiry-after-event, pre-event concentration
- Build event-calendar ingestion in `services/refdata` for earnings/corporate events from a simple external feed or static importable provider layer.
- Live scoring may use only timestamp-available data; any later validation fields must be batch-only.

## Phase 4: Rules Engine

- Replace `services/compute/src/classifiers.ts` with profile rules centered on the six primary profiles.
- Each rule returns probability, direction, reason codes, suppression reasons, and a confidence band.
- Add explicit false-positive guards from the research doc:
  - special/complex/auction suppression for directional labels
  - retail-frenzy guard on short-dated OTM call bursts
  - hedge-reactive preference for 0-2 DTE ATM/high-gamma/reactive-underlier cases
  - arbitrage requirement for matched-leg symmetry and near-flat directional exposure
- Keep existing structure-specific ideas like straddle/vertical/roll as evidence and reasons, not top-level end states.

## Phase 5: Synthetic Market Redesign

- Rework `services/ingest-options/src/adapters/synthetic.ts` around labeled parent-event templates instead of loose burst presets.
- Add deterministic synthetic scenario families matching the core 6 profiles plus neutral background noise.
- Each scenario must emit a coherent bundle:
  - child option prints
  - contemporaneous NBBO evolution
  - underlying quote path
  - IV response pattern
  - realistic conditions/venues/structure markers
- Add two operating modes:
  - `test`: seeded, deterministic, low-noise, exact expected labels
  - `demo`: seeded, realistic background with controlled noise ratios
- Keep synthetic hidden labels internal to tests/replay harnesses, not public production payloads.

## Phase 6: Compute, API, and UI Rollout

- In `services/compute`, emit `SmartMoneyEvent` first, then derive compat `ClassifierHitEvent` and `AlertEvent`.
- In `services/api`, add read/stream endpoints for `SmartMoneyEvent` while preserving existing endpoints.
- In `apps/web/app/terminal.tsx`, migrate rendering to profile-aware displays:
  - primary profile
  - probability ladder
  - reason codes
  - suppression/abstention state
- During the bridge, old UI elements should continue working from mapped legacy hits.

## Phase 7: Evaluation and Replay

- Add deterministic rule tests per profile and per major false-positive case.
- Add replay-style integration tests for live-vs-batch consistency.
- Add synthetic scenario acceptance tests proving:
  - the intended profile wins
  - nearby wrong profiles stay below a threshold
  - noisy background does not overwhelm expected results
- Add evaluation utilities for parent-event precision/recall, calibration, abstention rate, and economic sanity checks.
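
The first two acceptance criteria in Phase 7 (the intended profile wins, and nearby wrong profiles stay below a threshold) could be checked with a small helper along these lines. This is a sketch under assumptions: the `checkScenario` function, the `AcceptanceResult` shape, and the default rival threshold of `0.35` are hypothetical placeholders, and the score type is trimmed to the two fields the check needs.

```typescript
type ProfileId =
  | "institutional_directional"
  | "retail_whale"
  | "event_driven"
  | "vol_seller"
  | "arbitrage"
  | "hedge_reactive";

// Trimmed view of a profile score; the full contract carries more fields.
interface ScoredProfile {
  profile_id: ProfileId;
  probability: number;
}

interface AcceptanceResult {
  passed: boolean;
  failures: string[];
}

// maxRivalProbability is an assumed placeholder, not a tuned value.
function checkScenario(
  scores: ScoredProfile[],
  expected: ProfileId,
  maxRivalProbability = 0.35,
): AcceptanceResult {
  const failures: string[] = [];
  const ranked = [...scores].sort((a, b) => b.probability - a.probability);
  const top = ranked[0];

  // Criterion 1: the intended profile must rank first.
  if (!top || top.profile_id !== expected) {
    failures.push(`expected ${expected} to win, got ${top ? top.profile_id : "no scores"}`);
  }

  // Criterion 2: every other profile must stay at or below the rival threshold.
  for (const s of ranked) {
    if (s.profile_id !== expected && s.probability > maxRivalProbability) {
      failures.push(`${s.profile_id} scored ${s.probability}, above ${maxRivalProbability}`);
    }
  }

  return { passed: failures.length === 0, failures };
}
```

Returning the list of failures rather than a bare boolean keeps a failing synthetic scenario readable in test output, since it names the rival profile that crossed the threshold.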
## Important API and Type Changes

- New primary stream/table/type: `SmartMoneyEvent`
- `ClassifierHitEvent` becomes a legacy-derived compatibility surface
- `AlertEvent` gains optional profile metadata but keeps existing shape
- `FlowPacket` remains during migration, but becomes an intermediate artifact rather than the final semantic alert object

## Test Cases and Scenarios

- Institutional directional: aggressive concentrated call/put burst with catalyst-aligned expiry
- Retail whale: short-dated OTM attention-name chase with IV pop
- Event-driven: pre-earnings aligned expiry and widening spreads
- Vol seller: sell-side dominant overwrite/put-write/short-vol structure
- Arbitrage: matched multi-leg parity-style event with low net directional bias
- Hedge reactive: short-dated ATM burst tied to underlying move and gamma-sensitive conditions
- False positives: auctions, complex prints, late/stale quote context, illiquid wide spreads, retail frenzy misread as institution, structure trades misread as direction

## Assumptions and Defaults

- Rollout mode: `Compat Bridge`
- First milestone: `Rules-first`
- Primary outputs: `Core 6`
- Event-driven flow uses real external event-calendar enrichment in v1
- `prop/professional customer` remains supporting evidence only
- Existing rule labels like `vertical_spread` and `zero_dte_gamma_punch` become evidence/reason codes, not final business-facing profile IDs
- Synthetic generation is optimized for deterministic realism, not maximum randomness
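
To make the `Core 6` primary output and the abstention fields from Phase 1 more concrete, the following sketch shows one way `primary_profile_id`, `primary_direction`, and `abstained` might be derived from `profile_scores`. The field names follow the Phase 1 contract; the `Direction` and `ConfidenceBand` value sets, the `selectPrimary` helper, and the `0.5` minimum probability are assumptions made for illustration.

```typescript
type CoreProfileId =
  | "institutional_directional"
  | "retail_whale"
  | "event_driven"
  | "vol_seller"
  | "arbitrage"
  | "hedge_reactive";

// Assumed value sets; the plan does not enumerate them.
type Direction = "bullish" | "bearish" | "neutral";
type ConfidenceBand = "low" | "medium" | "high";

interface ProfileScore {
  profile_id: CoreProfileId;
  probability: number;
  confidence_band: ConfidenceBand;
  direction: Direction;
  reasons: string[];
}

interface PrimarySelection {
  primary_profile_id: CoreProfileId | null;
  primary_direction: Direction | null;
  abstained: boolean;
}

// minProbability is a hypothetical cutoff below which the engine abstains.
function selectPrimary(scores: ProfileScore[], minProbability = 0.5): PrimarySelection {
  const best = [...scores].sort((a, b) => b.probability - a.probability)[0];
  if (best === undefined || best.probability < minProbability) {
    return { primary_profile_id: null, primary_direction: null, abstained: true };
  }
  return {
    primary_profile_id: best.profile_id,
    primary_direction: best.direction,
    abstained: false,
  };
}
```

Under this sketch a legacy label such as `vertical_spread` would appear inside `reasons` on a score rather than as a `profile_id`, which matches the default that existing rule labels become evidence/reason codes.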