islandflow/docs/research-docs/smart-flow-architecture-review.md
dirtydishes 5cd19bd1e7
All checks were successful
CI / Validate (push) Successful in 1m16s
add smart flow research notes
2026-06-16 13:20:42 -04:00

135 lines
10 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Architecture Review: Evidence-Backed Smart-Flow Detection
## Summary
No source code was modified. The current architecture is **not suitable as-is**, but it is **close enough to refactor, not rewrite**. The stack is right; the domain language and pipeline shape are not.
Research direction: direct observation → inference → hypothesis, with preserved evidence and visible uncertainty. See [smart-flow-market-mechanics.md](/Users/kell/dev/islandflow/docs/research-docs/smart-flow-market-mechanics.md:7).
Key code evidence: `FlowPacket` is a generic feature bag in [events.ts](/Users/kell/dev/islandflow/packages/types/src/events.ts:193), `SmartMoneyEvent` already has useful score/abstention fields in [events.ts](/Users/kell/dev/islandflow/packages/types/src/events.ts:283), compute emits smart-money events then compatibility hits/alerts in [index.ts](/Users/kell/dev/islandflow/services/compute/src/index.ts:1086), storage keeps core hypothesis detail as JSON in [smart-money-events.ts](/Users/kell/dev/islandflow/packages/storage/src/smart-money-events.ts:24), and replay currently replays raw market streams rather than validating the whole derived pipeline in [replay/index.ts](/Users/kell/dev/islandflow/services/replay/src/index.ts:69).
## Area Classification
| Area | Call | Architecture Review |
|---|---:|---|
| Domain model | **refactor** | Good bones, wrong center. Make evidence, hypotheses, scores, and alternatives first-class. |
| Event taxonomy | **refactor** | Raw/derived split is good; `smart_money`, `dark.inferred`, and `classifier_hits` leak overconfident product language. |
| Service boundaries | **refactor** | Ingest does too much signal policy; compute is too broad. Split pipeline stages before adding more intelligence. |
| `FlowPacket` | **refactor** | Keep concept, rename/reframe as `FlowEvidenceCluster` or `FlowCandidate`. Not a product domain object. |
| `SmartMoneyEvent` | **redesign** | Replace canonical object with `FlowHypothesisEvent`; use `SmartFlowInsight` only as UI/API projection. |
| Classifier pipeline | **redesign** | Current rules mix evidence extraction, hypothesis scoring, narrative labels, and alerting. Needs staged outputs. |
| ClickHouse/storage | **refactor** | Right datastore; raw tables are decent, derived evidence/hypotheses need typed/queryable columns plus JSON sidecars. |
| Redis baselines/cache | **refactor** | Right hot-state role; wrong as hidden baseline truth. Baselines need replayable snapshots/versioning. |
| NATS/JetStream subjects | **refactor** | Right bus; subjects should express stage/version: observations, evidence, hypotheses, insights. |
| Replay determinism | **redesign** | Present but not central enough. Replay must be the acceptance gate for derived outputs. |
| API/WebSocket | **refactor** | Mechanics are good; public surface should expose evidence bundles and hypotheses, not internal legacy names. |
| UI evidence model | **refactor** | Directionally good, but still foregrounds “profile/probability” over evidence quality, alternatives, and uncertainty. |
| Test strategy | **redesign** | Unit tests are solid scaffolding; needs fixture replay, false-positive suites, calibration, and end-to-end determinism. |
## Direct Answers
1. **Current suitability:** no. Useful infrastructure, but not yet an evidence-backed smart-flow architecture.
2. **`SmartMoneyEvent`:** not a good canonical domain object. Use **`FlowHypothesisEvent`**. `ParticipantHypothesisEvent` implies participant identity too strongly. `SmartFlowInsight` should be a user-facing projection.
3. **`FlowPacket`:** not as named. Keep the abstraction as an internal evidence cluster, rename to `FlowEvidenceCluster` or `FlowCandidate`.
4. **Service boundaries:** not right. Ingest should normalize only; evidence quality, eligibility, clustering, hypothesis scoring, and insight projection should be separate stages.
5. **ClickHouse/Redis/NATS roles:** yes broadly. ClickHouse = authoritative event/audit store. Redis = hot cache only. NATS = transport, not truth. All three need cleaner contracts.
6. **Replay central enough:** no. It should be how every detection change proves itself.
7. **UI uncertainty:** partially. It shows evidence refs, profile ladders, abstention, and suppression, but needs confidence vs conviction, alternative explanations, evidence quality, and “why not” signals.
8. **First-class domain objects:** raw observations, execution context, quote join, eligibility decision, evidence cluster, structure hypothesis, evidence quality score, baseline snapshot, hypothesis score vector, false-positive penalty, catalyst context, flow hypothesis event, smart-flow insight, replay run.
9. **Implementation details:** Redis list layout, durable consumer names, current classifier thresholds, ClickHouse batch writer, adapter internals, legacy `ClassifierHitEvent`, alert severity math, UI cache mechanics.
10. **Delete/defer:** canonical “smart money” naming, real-time dark-pool certainty, standalone whale-premium alerts, trade-level open/close claims, participant identity claims, simplistic premium alert score, ingest-time signal filtering, `retail_whale` as a canonical profile unless reframed as attention/lottery flow.
## Option A — Conservative
Summary: keep current objects and services; add evidence-quality fields, UI copy fixes, and replay tests.
Pros: fastest, lowest migration risk, preserves current endpoints and UI.
Cons: leaves misleading canonical names; makes future research harder; keeps inference tangled inside current compute flow.
Complexity: low. Migration risk: low.
Better: less overconfidence, more visible suppression, quicker validation.
Worse: domain debt remains; `SmartMoneyEvent` becomes harder to undo later.
Likely kept: most code in `services/compute`, `packages/types`, `packages/storage`, API routes, UI panes.
Likely rewritten: alert scoring, UI labels, some profile fields.
Likely deleted: almost nothing.
PR sequence:
1. Rename UI copy from “Smart money” to “Smart flow candidate.”
2. Add evidence-quality and alternative-explanation fields to existing event.
3. Add replay consistency tests around current outputs.
4. Add typed ClickHouse columns for high-value JSON fields.
5. Deprecate, but do not remove, legacy classifier hit display.
## Option B — Refactor
Summary: keep Bun/TS, NATS, ClickHouse, Redis, API/WS, and the terminal UI, but rebuild the domain pipeline around evidence clusters and hypothesis events.
Pros: fixes the products epistemic spine without wasting useful infrastructure; best fit for pre-alpha.
Cons: breaking contract migration; touches types, storage, compute, API, UI, and tests.
Complexity: medium-high. Migration risk: medium.
Better: replayability, auditability, naming, evidence display, calibration, and future research velocity.
Worse: more short-term churn; old demos and endpoints need compatibility aliases.
Likely kept: raw market schemas, adapters, NATS/ClickHouse/Redis clients, live socket mechanics, virtualized UI, replay service skeleton, many feature calculations.
Likely rewritten: `SmartMoneyEvent`, `FlowPacket`, classifier pipeline, alert projection, ClickHouse derived schemas, API channel names, UI evidence drawers.
Likely deleted: canonical `smart_money` naming, ingest signal policy, premium-heavy alert scoring, `ClassifierHitEvent` as primary domain surface.
PR sequence:
1. Introduce `FlowEvidenceCluster`, `FlowHypothesisEvent`, `SmartFlowInsight`, `EvidenceQuality`, and version fields; keep aliases for compatibility.
2. Move signal eligibility out of ingest; ingest publishes normalized observations plus execution context only.
3. Split compute internally into evidence join → cluster/structure → hypothesis scoring → insight/alert projection.
4. Replace derived JSON-only storage with typed query columns for evidence quality, hypothesis scores, model version, policy version, and refs.
5. Add replay-run harness that recomputes derived outputs from raw streams and compares signatures.
6. Add `/flow/evidence`, `/flow/hypotheses`, `/flow/insights` plus WS equivalents; keep legacy endpoints as aliases.
7. Rework UI drawers/tables around evidence quality, confidence vs conviction, alternatives, abstention, and catalyst/noise context.
8. Add fixture suites for stale quotes, complex spreads, 0DTE/event noise, deep ITM, wide spreads, and off-exchange ambiguity.
## Option C — Redesign
Summary: if starting over, build an event-sourced evidence engine with raw observations as the only source of truth and every derived artifact generated by versioned, replayable policies.
Pros: cleanest long-term architecture; strongest research discipline; easiest calibration/backtesting story.
Cons: slowest; overkill before product fit; discards too much working terminal and streaming infrastructure.
Complexity: very high. Migration risk: high.
Better: clean contracts, model versioning, deterministic replay, research-grade evidence lineage.
Worse: delivery speed, continuity, and working UI velocity.
Likely kept: market adapters, some schemas, ClickHouse client, NATS helpers, UI visual direction, selected tests.
Likely rewritten: almost all compute, storage schemas, API contracts, replay, UI data model.
Likely deleted: `FlowPacket`, `SmartMoneyEvent`, `ClassifierHitEvent`, `AlertEvent` as currently shaped, current subject hierarchy, current derived tables.
PR sequence:
1. Define new canonical event taxonomy and versioned policy registry.
2. Build raw observation lake and deterministic replay runner first.
3. Build evidence extraction and quote/condition eligibility services.
4. Build cluster and structure hypothesis services.
5. Build hypothesis scoring and calibration services.
6. Build insight projection API.
7. Rebuild terminal against new evidence/hypothesis contracts.
8. Backfill or discard old derived data.
## Recommendation
Choose **Option B**.
Bluntly: Option A is too timid for a pre-alpha product whose current names already fight the research. Option C is intellectually clean but wastes too much working infrastructure. Option B keeps the stack and terminal momentum while fixing the core mistake: treating “smart money” as a thing the system emits, instead of treating smart flow as a cautious, evidence-backed hypothesis with alternatives.
The first implementation move should be the contract/naming PR: introduce `FlowHypothesisEvent` and `FlowEvidenceCluster` with compatibility aliases, then make replay the gate before touching more classifier logic.