plan synthetic and smart-flow phases

2026-06-16 13:46:08 -04:00 · 2026-06-16 13:46:08 -04:00 · eaa22de302
commit eaa22de302
parent d1fac6c7ec
19 changed files with 1198 additions and 1 deletions
--- a/docs/plans/smart-flow-architecture-review.md
+++ b/docs/plans/smart-flow-architecture-review.md
@@ -0,0 +1,135 @@
+# Architecture Review: Evidence-Backed Smart-Flow Detection
+
+## Summary
+
+No source code was modified. The current architecture is **not suitable as-is**, but it is **close enough to refactor, not rewrite**. The stack is right; the domain language and pipeline shape are not.
+
+Research direction: direct observation → inference → hypothesis, with preserved evidence and visible uncertainty. See [smart-flow-market-mechanics.md](/Users/kell/dev/islandflow/docs/research-docs/smart-flow-market-mechanics.md:7).
+
+Key code evidence: `FlowPacket` is a generic feature bag in [events.ts](/Users/kell/dev/islandflow/packages/types/src/events.ts:193); `SmartMoneyEvent` already has useful score/abstention fields in [events.ts](/Users/kell/dev/islandflow/packages/types/src/events.ts:283); compute emits smart-money events, then compatibility hits/alerts, in [index.ts](/Users/kell/dev/islandflow/services/compute/src/index.ts:1086); storage keeps core hypothesis detail as JSON in [smart-money-events.ts](/Users/kell/dev/islandflow/packages/storage/src/smart-money-events.ts:24); and replay, in [replay/index.ts](/Users/kell/dev/islandflow/services/replay/src/index.ts:69), currently replays raw market streams rather than validating the whole derived pipeline.
+
+## Area Classification
+
+| Area | Call | Architecture Review |
+|---|---:|---|
+| Domain model | **refactor** | Good bones, wrong center. Make evidence, hypotheses, scores, and alternatives first-class. |
+| Event taxonomy | **refactor** | Raw/derived split is good; `smart_money`, `dark.inferred`, and `classifier_hits` leak overconfident product language. |
+| Service boundaries | **refactor** | Ingest does too much signal policy; compute is too broad. Split pipeline stages before adding more intelligence. |
+| `FlowPacket` | **refactor** | Keep concept, rename/reframe as `FlowEvidenceCluster` or `FlowCandidate`. Not a product domain object. |
+| `SmartMoneyEvent` | **redesign** | Replace canonical object with `FlowHypothesisEvent`; use `SmartFlowInsight` only as UI/API projection. |
+| Classifier pipeline | **redesign** | Current rules mix evidence extraction, hypothesis scoring, narrative labels, and alerting. Needs staged outputs. |
+| ClickHouse/storage | **refactor** | Right datastore; raw tables are decent, derived evidence/hypotheses need typed/queryable columns plus JSON sidecars. |
+| Redis baselines/cache | **refactor** | Right hot-state role; wrong as hidden baseline truth. Baselines need replayable snapshots/versioning. |
+| NATS/JetStream subjects | **refactor** | Right bus; subjects should express stage/version: observations, evidence, hypotheses, insights. |
+| Replay determinism | **redesign** | Present but not central enough. Replay must be the acceptance gate for derived outputs. |
+| API/WebSocket | **refactor** | Mechanics are good; public surface should expose evidence bundles and hypotheses, not internal legacy names. |
+| UI evidence model | **refactor** | Directionally good, but still foregrounds “profile/probability” over evidence quality, alternatives, and uncertainty. |
+| Test strategy | **redesign** | Unit tests are solid scaffolding; needs fixture replay, false-positive suites, calibration, and end-to-end determinism. |
+
+## Direct Answers
+
+1. **Current suitability:** no. Useful infrastructure, but not yet an evidence-backed smart-flow architecture.
+2. **`SmartMoneyEvent`:** not a good canonical domain object. Use **`FlowHypothesisEvent`**. `ParticipantHypothesisEvent` implies participant identity too strongly. `SmartFlowInsight` should be a user-facing projection.
+3. **`FlowPacket`:** not as named. Keep the abstraction as an internal evidence cluster, but rename it to `FlowEvidenceCluster` or `FlowCandidate`.
+4. **Service boundaries:** not right. Ingest should normalize only; evidence quality, eligibility, clustering, hypothesis scoring, and insight projection should be separate stages.
+5. **ClickHouse/Redis/NATS roles:** yes, broadly. ClickHouse = authoritative event/audit store. Redis = hot cache only. NATS = transport, not truth. All three need cleaner contracts.
+6. **Replay central enough:** no. It should be how every detection change proves itself.
+7. **UI uncertainty:** partially. It shows evidence refs, profile ladders, abstention, and suppression, but needs confidence vs conviction, alternative explanations, evidence quality, and “why not” signals.
+8. **First-class domain objects:** raw observations, execution context, quote join, eligibility decision, evidence cluster, structure hypothesis, evidence quality score, baseline snapshot, hypothesis score vector, false-positive penalty, catalyst context, flow hypothesis event, smart-flow insight, replay run.
+9. **Implementation details (not domain objects):** Redis list layout, durable consumer names, current classifier thresholds, ClickHouse batch writer, adapter internals, legacy `ClassifierHitEvent`, alert severity math, UI cache mechanics.
+10. **Delete/defer:** canonical “smart money” naming, real-time dark-pool certainty, standalone whale-premium alerts, trade-level open/close claims, participant identity claims, simplistic premium alert score, ingest-time signal filtering, `retail_whale` as a canonical profile unless reframed as attention/lottery flow.
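A minimal sketch of how a few of the first-class objects from answer 8 could be typed, with `SmartFlowInsight` derived as a projection of `FlowHypothesisEvent` as answer 2 suggests. Every field name and the `projectInsight` helper are hypothetical; the real types in `packages/types` may be shaped quite differently.

```typescript
// Hypothetical shapes for illustration only; not the repository's actual types.
type EvidenceRef = { kind: "print" | "quote" | "join"; id: string };

interface EvidenceQuality {
  quoteAgeMs: number; // staleness of the quote used in the join
  spreadPct: number;  // quoted spread as a fraction of mid
  score: number;      // 0..1, higher means more trustworthy evidence
}

interface HypothesisScore {
  label: string; // assumed label vocabulary, e.g. "directional" or "hedge"
  score: number; // 0..1 support for this explanation
}

interface FlowHypothesisEvent {
  id: string;
  clusterId: string;
  evidence: EvidenceRef[];
  quality: EvidenceQuality;
  hypotheses: HypothesisScore[]; // primary and alternatives are stored together
  abstained: boolean;
  abstainReason?: string;
  modelVersion: string;
  policyVersion: string;
}

// User-facing projection: derived from the canonical event, never stored as truth.
interface SmartFlowInsight {
  hypothesisId: string;
  headline: string;
  primary: HypothesisScore | null;
  alternatives: HypothesisScore[];
  evidenceCount: number;
  evidenceQuality: number;
}

function projectInsight(event: FlowHypothesisEvent): SmartFlowInsight {
  // Rank on a copy so the canonical event is not mutated.
  const ranked = event.hypotheses.slice().sort((a, b) => b.score - a.score);
  const top = ranked[0];
  // An abstained event has no primary call; every explanation stays an alternative.
  const primary = event.abstained || top === undefined ? null : top;
  const alternatives = primary === null ? ranked : ranked.slice(1);
  const headline =
    primary === null
      ? `No call: ${event.abstainReason ?? "insufficient evidence"}`
      : `Candidate: ${primary.label}`;
  return {
    hypothesisId: event.id,
    headline,
    primary,
    alternatives,
    evidenceCount: event.evidence.length,
    evidenceQuality: event.quality.score,
  };
}
```

Keeping alternatives and abstention on the canonical event means the projection can always show what the system declined to claim, not only what it claimed.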
+
+## Option A — Conservative
+
+Summary: keep current objects and services; add evidence-quality fields, UI copy fixes, and replay tests.
+
+Pros: fastest, lowest migration risk, preserves current endpoints and UI.
+
+Cons: leaves misleading canonical names; makes future research harder; keeps inference tangled inside current compute flow.
+
+Complexity: low. Migration risk: low.
+
+Better: less overconfidence, more visible suppression, quicker validation.
+
+Worse: domain debt remains; `SmartMoneyEvent` becomes harder to undo later.
+
+Likely kept: most code in `services/compute`, `packages/types`, `packages/storage`, API routes, UI panes.
+
+Likely rewritten: alert scoring, UI labels, some profile fields.
+
+Likely deleted: almost nothing.
+
+PR sequence:
+1. Rename UI copy from “Smart money” to “Smart flow candidate.”
+2. Add evidence-quality and alternative-explanation fields to existing event.
+3. Add replay consistency tests around current outputs.
+4. Add typed ClickHouse columns for high-value JSON fields.
+5. Deprecate, but do not remove, legacy classifier hit display.
+
+## Option B — Refactor
+
+Summary: keep Bun/TS, NATS, ClickHouse, Redis, API/WS, and the terminal UI, but rebuild the domain pipeline around evidence clusters and hypothesis events.
+
+Pros: fixes the product’s epistemic spine without wasting useful infrastructure; best fit for pre-alpha.
+
+Cons: breaking contract migration; touches types, storage, compute, API, UI, and tests.
+
+Complexity: medium-high. Migration risk: medium.
+
+Better: replayability, auditability, naming, evidence display, calibration, and future research velocity.
+
+Worse: more short-term churn; old demos and endpoints need compatibility aliases.
+
+Likely kept: raw market schemas, adapters, NATS/ClickHouse/Redis clients, live socket mechanics, virtualized UI, replay service skeleton, many feature calculations.
+
+Likely rewritten: `SmartMoneyEvent`, `FlowPacket`, classifier pipeline, alert projection, ClickHouse derived schemas, API channel names, UI evidence drawers.
+
+Likely deleted: canonical `smart_money` naming, ingest signal policy, premium-heavy alert scoring, `ClassifierHitEvent` as primary domain surface.
+
+PR sequence:
+1. Introduce `FlowEvidenceCluster`, `FlowHypothesisEvent`, `SmartFlowInsight`, `EvidenceQuality`, and version fields; keep aliases for compatibility.
+2. Move signal eligibility out of ingest; ingest publishes normalized observations plus execution context only.
+3. Split compute internally into evidence join → cluster/structure → hypothesis scoring → insight/alert projection.
+4. Replace derived JSON-only storage with typed query columns for evidence quality, hypothesis scores, model version, policy version, and refs.
+5. Add replay-run harness that recomputes derived outputs from raw streams and compares signatures.
+6. Add `/flow/evidence`, `/flow/hypotheses`, `/flow/insights` plus WS equivalents; keep legacy endpoints as aliases.
+7. Rework UI drawers/tables around evidence quality, confidence vs conviction, alternatives, abstention, and catalyst/noise context.
+8. Add fixture suites for stale quotes, complex spreads, 0DTE/event noise, deep ITM, wide spreads, and off-exchange ambiguity.
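The staged split in step 3 can be sketched as pure functions, each consuming the previous stage's output. This is only an illustration under stated assumptions: the `Observation` fields, the `ask_side_pressure` label, the rejection strings, and the scoring rule (share of prints at or above the ask) are all invented here and stand in for whatever the real compute stages do.

```typescript
// Hypothetical stage contracts; names and scoring rule are illustrative assumptions.
interface Observation {
  id: string; symbol: string; ts: number; price: number; size: number;
  bid: number; ask: number; quoteTs: number;
}
interface JoinedEvidence { obs: Observation; quoteAgeMs: number; rejection: string | null }
interface Cluster { symbol: string; members: JoinedEvidence[] }
interface Hypothesis { symbol: string; label: string; score: number; evidenceIds: string[] }
interface Insight { symbol: string; text: string }

// Stage 1: evidence join. Records why an observation is ineligible instead of dropping it.
function joinEvidence(observations: Observation[], maxQuoteAgeMs: number): JoinedEvidence[] {
  return observations.map((obs): JoinedEvidence => {
    const quoteAgeMs = obs.ts - obs.quoteTs;
    let rejection: string | null = null;
    if (quoteAgeMs < 0 || quoteAgeMs > maxQuoteAgeMs) rejection = "stale_or_future_quote";
    else if (obs.bid > obs.ask) rejection = "crossed_quote";
    return { obs, quoteAgeMs, rejection };
  });
}

// Stage 2: cluster eligible evidence (here simply by symbol).
function clusterBySymbol(joined: JoinedEvidence[]): Cluster[] {
  const bySymbol = new Map<string, JoinedEvidence[]>();
  for (const item of joined) {
    if (item.rejection !== null) continue;
    const members = bySymbol.get(item.obs.symbol) ?? [];
    members.push(item);
    bySymbol.set(item.obs.symbol, members);
  }
  return Array.from(bySymbol, ([symbol, members]) => ({ symbol, members }));
}

// Stage 3: hypothesis scoring. Toy rule: fraction of prints at or above the ask.
function scoreHypotheses(clusters: Cluster[]): Hypothesis[] {
  return clusters.map((cluster) => {
    const atAsk = cluster.members.filter((m) => m.obs.price >= m.obs.ask).length;
    return {
      symbol: cluster.symbol,
      label: "ask_side_pressure",
      score: atAsk / cluster.members.length, // clusters are never empty by construction
      evidenceIds: cluster.members.map((m) => m.obs.id),
    };
  });
}

// Stage 4: insight projection. Only this stage decides what is shown to users.
function projectInsights(hypotheses: Hypothesis[], minScore: number): Insight[] {
  return hypotheses
    .filter((h) => h.score >= minScore)
    .map((h) => ({
      symbol: h.symbol,
      text: `${h.label} candidate (${h.evidenceIds.length} prints, score ${h.score.toFixed(2)})`,
    }));
}

function runPipeline(observations: Observation[], maxQuoteAgeMs: number, minScore: number): Insight[] {
  return projectInsights(scoreHypotheses(clusterBySymbol(joinEvidence(observations, maxQuoteAgeMs))), minScore);
}
```

Because each stage is a pure function of its input, every intermediate output can be stored, replayed, and diffed on its own, which is the property the replay harness in step 5 relies on.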
+
+## Option C — Redesign
+
+Summary: if starting over, build an event-sourced evidence engine with raw observations as the only source of truth and every derived artifact generated by versioned, replayable policies.
+
+Pros: cleanest long-term architecture; strongest research discipline; easiest calibration/backtesting story.
+
+Cons: slowest; overkill before product fit; discards too much working terminal and streaming infrastructure.
+
+Complexity: very high. Migration risk: high.
+
+Better: clean contracts, model versioning, deterministic replay, research-grade evidence lineage.
+
+Worse: delivery speed, continuity, and working UI velocity.
+
+Likely kept: market adapters, some schemas, ClickHouse client, NATS helpers, UI visual direction, selected tests.
+
+Likely rewritten: almost all compute, storage schemas, API contracts, replay, UI data model.
+
+Likely deleted: `FlowPacket`, `SmartMoneyEvent`, `ClassifierHitEvent`, `AlertEvent` as currently shaped, current subject hierarchy, current derived tables.
+
+PR sequence:
+1. Define new canonical event taxonomy and versioned policy registry.
+2. Build raw observation lake and deterministic replay runner first.
+3. Build evidence extraction and quote/condition eligibility services.
+4. Build cluster and structure hypothesis services.
+5. Build hypothesis scoring and calibration services.
+6. Build insight projection API.
+7. Rebuild terminal against new evidence/hypothesis contracts.
+8. Backfill or discard old derived data.
+
+## Recommendation
+
+Choose **Option B**.
+
+Bluntly: Option A is too timid for a pre-alpha product whose current names already fight the research. Option C is intellectually clean but wastes too much working infrastructure. Option B keeps the stack and terminal momentum while fixing the core mistake: treating “smart money” as a thing the system emits, instead of treating smart flow as a cautious, evidence-backed hypothesis with alternatives.
+
+The first implementation move should be the contract/naming PR: introduce `FlowHypothesisEvent` and `FlowEvidenceCluster` with compatibility aliases, then make replay the gate before touching more classifier logic.
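One way to make replay the gate is to fingerprint every derived output and diff a replayed run against a stored baseline. The sketch below assumes derived events carry a stable `id` and a JSON payload; `DerivedEvent`, `diffReplay`, and the choice of a 32-bit FNV-1a hash over canonical JSON are illustrative assumptions, and a production harness would likely want a stronger hash.

```typescript
// Hypothetical replay-gate helpers; not the repository's replay service.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface DerivedEvent { id: string; payload: Json }
interface ReplayDiff { missing: string[]; unexpected: string[]; changed: string[] }

// Serialize with sorted object keys so key order never affects the signature.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k] ?? null)}`).join(",")}}`;
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

function signatures(events: DerivedEvent[]): Map<string, string> {
  const out = new Map<string, string>();
  for (const event of events) out.set(event.id, fnv1a(canonicalize(event.payload)));
  return out;
}

// Compare a replayed run against the baseline by event ID and payload signature.
function diffReplay(baseline: DerivedEvent[], replayed: DerivedEvent[]): ReplayDiff {
  const expected = signatures(baseline);
  const actual = signatures(replayed);
  const diff: ReplayDiff = { missing: [], unexpected: [], changed: [] };
  expected.forEach((sig, id) => {
    const other = actual.get(id);
    if (other === undefined) diff.missing.push(id);
    else if (other !== sig) diff.changed.push(id);
  });
  actual.forEach((_sig, id) => {
    if (!expected.has(id)) diff.unexpected.push(id);
  });
  return diff;
}

// The gate: a detection change passes only if replay reproduces the baseline exactly.
function replayPasses(diff: ReplayDiff): boolean {
  return diff.missing.length === 0 && diff.unexpected.length === 0 && diff.changed.length === 0;
}
```

An intentional detection change would then ship with an updated baseline, so the diff itself becomes the reviewable artifact.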
--- a/docs/plans/synthetic-market-data-architecture-review.md
+++ b/docs/plans/synthetic-market-data-architecture-review.md
@@ -0,0 +1,81 @@
+# Synthetic Market-Data Architecture Review
+
+## Summary
+- Target file: `docs/plans/synthetic-market-data-architecture-review.md`. No files were changed in this Plan Mode pass.
+- Recommendation: **Option B — Refactor**. Conservative work would trap determinism inside ingest adapters; full redesign is premature. Refactor makes synthetic generation first-class while keeping the useful NATS, ClickHouse, compute, API, and web stack.
+- Core direction: build a no-history, seeded, manifest-driven synthetic event engine with canonical real event types, separate labels/manifests, deterministic replay, fixture generation, load profiles, and demo scenarios.
+
+## Direct Answers
+1. Synthetic generation should be a **combination**: a reusable `@islandflow/synthetic-market` package, a CLI for fixture/run generation, replay-source integration, test fixture helpers, and demo presets. A service should be only a thin live/demo emitter.
+2. Synthetic events should map to existing canonical event types: `OptionPrint`, `OptionNBBO`, `EquityPrint`, and `EquityQuote`. Do not create parallel synthetic-only market event types for the main pipeline.
+3. Use **metadata plus isolation**, not permanent separate business schemas. Add provenance such as `source_kind`, `run_id`, `parameter_snapshot_hash`, and optional `scenario_id`; use run-scoped subjects/databases for tests and load runs when isolation matters.
+4. Ground-truth labels should be separate label records keyed by `run_id`, `scenario_id`, event IDs/trace IDs, expected class, expected direction, confidence band, required/forbidden evidence, and false-positive penalties. Do not expose hidden labels on emitted market events.
+5. Expected-output manifests should be versioned JSON/YAML artifacts produced by the CLI. They should pin seed bundle, generator version, parameter snapshot hash, generated event hashes, replay ordering, expected derived events, alert/no-alert expectations, and evidence requirements.
+6. Deterministic replay should consume either generated fixture files directly or materialized ClickHouse rows through the same replay ordering: event time, ingest time, seq, stable event ID. Replay should support a `synthetic` source/run selector.
+7. Tests should use synthetic data at three levels: pure package invariants, small golden manifests through compute batch logic, and optional infra-backed NATS/ClickHouse integration tests. `bun test` should not require Docker.
+8. Demos should use named demo runs/scenarios, not ambient live randomness. Keep the hosted synthetic control drawer for live demo tuning, but add deterministic demo run selection/replay.
+9. First-class domain objects: `SyntheticRun`, `SeedBundle`, `ParameterSnapshot`, `SymbolProfile`, `LiquidityProfile`, `VolatilityRegime`, `OptionChainProfile`, `ScenarioInjection`, `GroundTruthLabel`, `ExpectedOutputManifest`, `GeneratedEventBatch`, `ReplayPlan`, `LoadProfile`, and `DemoProfile`.
+10. Implementation details (not domain objects): PRNG algorithm internals, sampling formulas, placement heuristics, adapter timers, NATS consumer names, Redis rolling windows, ClickHouse loader mechanics, UI labels, and cache policy.
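The replay ordering named in answer 6 (event time, ingest time, seq, stable event ID) amounts to a total-order comparator. A minimal sketch, assuming numeric timestamps and the hypothetical field names `eventTs`, `ingestTs`, `seq`, and `eventId`:

```typescript
// Hypothetical field names; the ordering keys come from answer 6 above.
interface ReplayableEvent {
  eventId: string;
  eventTs: number;  // event time, e.g. epoch milliseconds
  ingestTs: number; // ingest time
  seq: number;      // per-source sequence number
}

// Total order: event time, then ingest time, then seq, then event ID.
// IDs are compared by code unit, not localeCompare, so the result is locale-independent.
function compareReplayOrder(a: ReplayableEvent, b: ReplayableEvent): number {
  if (a.eventTs !== b.eventTs) return a.eventTs - b.eventTs;
  if (a.ingestTs !== b.ingestTs) return a.ingestTs - b.ingestTs;
  if (a.seq !== b.seq) return a.seq - b.seq;
  if (a.eventId < b.eventId) return -1;
  if (a.eventId > b.eventId) return 1;
  return 0;
}

// Merge any number of streams (fixture files, ClickHouse rows) into one replay order.
function mergeForReplay<T extends ReplayableEvent>(...streams: T[][]): T[] {
  const merged: T[] = [];
  for (const stream of streams) merged.push(...stream);
  return merged.sort(compareReplayOrder);
}
```

Because the final tiebreaker is a unique event ID, the merged order does not depend on which source a row came from or on the order in which the streams were supplied.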
+
+## Area Classification
+- Existing replay architecture: **refactor**. Keep event-time merge and stream publishing; add generated-stream sources, run IDs, manifests, and deterministic output comparison.
+- Event schemas: **refactor**. Keep canonical raw/derived event shapes; add provenance metadata and separate label/manifest schemas.
+- Service boundaries: **refactor**. Move generator logic out of ingest adapters into a package; adapters become thin emitters.
+- Test structure: **redesign**. Current tests are unit-heavy and adapter-local; add fixture manifests, golden outputs, and batch replay checks.
+- ClickHouse fixture strategy: **refactor**. Keep storage helpers; add run-scoped fixture loaders and optional run metadata, not permanent synthetic clone tables.
+- NATS/JetStream: **keep/refactor**. Keep canonical subjects for production behavior; support isolated subject prefixes or disposable streams for tests/load.
+- Redis baseline interaction: **refactor**. Keep Redis for live rolling state; golden tests should use in-memory/resettable baselines.
+- UI/demo needs: **refactor**. Keep replay UI and synthetic admin rail; add named deterministic demo modes and scenario selectors.
+- CI feasibility: **keep/refactor**. Keep fast Bun CI; make synthetic package/golden tests infra-free and defer Docker integration to a separate job.
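A rough sketch of the "metadata plus isolation" idea: provenance is attached to canonical events without changing their shape, and test or load runs publish on a run-scoped subject prefix. The nested `provenance` field, the `test.` prefix, and the subject `market.option.print` are assumptions made for illustration; only the provenance field names come from the plan text. The token check reflects that NATS subjects are dot-separated tokens in which whitespace, `*`, and `>` are not usable in a plain token.

```typescript
// Hypothetical helpers; provenance field names follow the plan, everything else is assumed.
type SourceKind = "live" | "synthetic" | "replay";

interface Provenance {
  source_kind: SourceKind;
  run_id: string;
  parameter_snapshot_hash: string;
  scenario_id?: string;
}

type WithProvenance<T> = T & { provenance: Provenance };

// Attach provenance alongside the canonical event fields rather than cloning the schema.
function withProvenance<T extends object>(event: T, provenance: Provenance): WithProvenance<T> {
  return { ...event, provenance };
}

// Conservative token rule: letters, digits, underscore, hyphen.
const SUBJECT_TOKEN = /^[A-Za-z0-9_-]+$/;

// Production keeps the canonical subject; isolated runs get a disposable prefix.
function scopedSubject(canonicalSubject: string, runId?: string): string {
  if (runId === undefined) return canonicalSubject;
  if (!SUBJECT_TOKEN.test(runId)) {
    throw new Error(`run id is not a valid subject token: ${runId}`);
  }
  return `test.${runId}.${canonicalSubject}`;
}
```

For example, `scopedSubject("market.option.print", "run-42")` yields `test.run-42.market.option.print`, while omitting the run ID leaves the production subject untouched.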
+
+## Option A — Conservative
+- Summary: wrap the current synthetic ingest adapters with minimal metadata, a small fixture CLI, and a few golden tests.
+- Pros: fastest, least migration, preserves current demos.
+- Cons: determinism remains mixed with wall-clock timers and live adapter behavior; labels/manifests stay bolted on.
+- Complexity: low to medium. Migration risk: low.
+- Better: quick smoke fixtures, basic provenance, modest replay demos.
+- Worse: long-term generator quality, test reliability, scenario authoring.
+- Kept: current ingest adapters, bus/storage/API/web mostly unchanged.
+- Rewritten: small parts of synthetic adapters and tests.
+- Deleted/deferred: deep replay refactor, new package boundary, batch harness.
+- PR sequence: add metadata schemas; add CLI wrapper; add fixture files; add basic replay filters; add initial golden tests.
+
+## Option B — Refactor
+- Summary: create `@islandflow/synthetic-market` as the deterministic engine; make adapters, CLI, replay, tests, and demos consume it.
+- Pros: deterministic by design, reusable, testable, demo-friendly, preserves the working stack.
+- Cons: more up-front movement; current adapter logic must be untangled.
+- Complexity: medium. Migration risk: medium-low.
+- Better: seeded runs, profiles, labels, manifests, replay, golden tests, load profiles.
+- Worse: short-term churn and some duplicated paths during migration.
+- Kept: canonical event schemas, NATS subjects, ClickHouse helpers, compute classifiers, API replay endpoints, web replay shell.
+- Rewritten: synthetic options/equities adapters, synthetic control state, replay source abstraction, tests around synthetic scenarios.
+- Deleted/deferred: adapter-local scenario catalog after migration; full LOB/agent/ML simulation.
+- PR sequence: add package and schemas; move current generators behind deterministic API; add CLI manifest generation; refactor adapters to consume package; add replay synthetic source/run filters; add golden fixture tests; add demo selector.
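To illustrate what "deterministic by design" could mean for the package, here is a small seeded generator that uses a logical clock instead of wall-clock timers. It is a sketch under assumptions: the mulberry32-style PRNG is a stand-in choice (the plan treats PRNG internals as an implementation detail), and `RunConfig`, `SyntheticPrint`, and the random-walk parameters are invented for the example rather than taken from `@islandflow/synthetic-market`.

```typescript
// Hypothetical generator sketch; types and parameters are illustrative assumptions.
interface RunConfig {
  runId: string;
  seed: number;
  symbol: string;
  startTs: number;         // logical start time in ms; never read from the wall clock
  count: number;
  startPriceCents: number;
}

interface SyntheticPrint {
  id: string;
  run_id: string;
  seq: number;
  ts: number;
  symbol: string;
  price: number;
  size: number;
}

// mulberry32-style PRNG: the same 32-bit seed always yields the same sequence in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function generatePrints(cfg: RunConfig): SyntheticPrint[] {
  const rand = mulberry32(cfg.seed);
  const prints: SyntheticPrint[] = [];
  let ts = cfg.startTs;
  let cents = cfg.startPriceCents;
  for (let seq = 0; seq < cfg.count; seq++) {
    ts += 1 + Math.floor(rand() * 50);                       // inter-arrival gap of 1..50 ms
    cents = Math.max(1, cents + Math.floor(rand() * 5) - 2); // random walk of -2..+2 cents
    const size = 1 + Math.floor(rand() * 100);               // 1..100 contracts
    prints.push({
      id: `${cfg.runId}:${seq}`, // stable event ID derived from run and sequence
      run_id: cfg.runId,
      seq,
      ts,
      symbol: cfg.symbol,
      price: cents / 100,
      size,
    });
  }
  return prints;
}
```

Since the only inputs are the config and the seed, an adapter, the CLI, and a test can all call the same function and obtain identical events.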
+
+## Option C — Redesign
+- Summary: rebuild around a unified deterministic event-log architecture where generation, replay, live demo, storage, and tests all consume run-partitioned event logs.
+- Pros: cleanest long-term model; excellent determinism, provenance, and replay semantics.
+- Cons: too much rebuild for pre-alpha; delays product learning.
+- Complexity: high. Migration risk: high.
+- Better: architecture purity, reproducible environments, run isolation.
+- Worse: delivery speed, disruption, operational risk.
+- Kept: some compute/classifier/domain logic and UI concepts.
+- Rewritten: replay, ingest, storage partitioning, bus topology, fixture/test harness.
+- Deleted/deferred: current synthetic adapters, current replay service shape, much of current live/demo plumbing.
+- PR sequence: define event log/envelope; implement generator; rebuild replay; rebuild storage materialization; port compute; port API/UI; retire old ingest paths.
+
+## Recommendation
+Choose **Option B**. Bluntly: Option A is a patch, and it will keep producing impressive-looking but untrustworthy demos. Option C is architecture vanity for a pre-alpha product. Option B is the grown-up move: extract the generator into a deterministic package, keep the useful event pipeline, and make replay/tests/demos consume the same generated runs.
+
+## Test Plan
+- Unit: PRNG determinism, profile normalization, tick validity, quote/trade invariants, option chain sparsity, label/manifest schema parsing.
+- Golden: fixed seed plus manifest produces byte/hash-stable raw events and stable smart-money/alert signatures.
+- Replay: synthetic source ordering matches manifest; derived outputs match expected-output manifest.
+- Integration: optional NATS/ClickHouse run-scoped fixture test behind a non-default CI job.
+- Demo/load: named demo profiles render in replay UI; load profile scales rates without changing event semantics.
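The unit-level "tick validity" and "quote/trade invariants" checks might look like the following. Which invariants actually apply is an assumption here: for instance, a scenario that deliberately injects prints outside the quote would need to exempt `outside_quote`. Prices are held in integer cents so the tick check avoids floating-point remainders; all names are hypothetical.

```typescript
// Hypothetical invariant checker for generated quote/trade pairs.
interface Quote { bidCents: number; askCents: number; ts: number }
interface Trade { priceCents: number; size: number; ts: number }

// Returns the list of violated invariants; an empty list means the pair is valid.
function checkInvariants(quote: Quote, trade: Trade, tickCents: number): string[] {
  const violations: string[] = [];
  if (quote.bidCents <= 0 || quote.askCents <= 0) violations.push("non_positive_quote");
  if (quote.bidCents > quote.askCents) violations.push("crossed_quote");
  if (trade.priceCents % tickCents !== 0) violations.push("off_tick_price");
  if (trade.size <= 0 || !Number.isInteger(trade.size)) violations.push("invalid_size");
  if (trade.ts < quote.ts) violations.push("trade_before_quote");
  if (trade.priceCents < quote.bidCents || trade.priceCents > quote.askCents) {
    violations.push("outside_quote");
  }
  return violations;
}
```

Returning named violations rather than a boolean lets a golden test assert exactly which rule a deliberately broken fixture is expected to trip.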
+
+## Assumptions
+- MVP remains no-history-first.
+- Canonical real event schemas remain the pipeline contract.
+- Hidden labels are never embedded directly in market events.
+- Infra-backed tests are useful, but the first synthetic quality gate must pass in plain `bun test`.