Plan Document
Evidence-Backed Smart-Flow Detection
A readable architecture review for reshaping Islandflow's smart-flow system around direct observation, evidence clusters, cautious hypotheses, preserved uncertainty, and replayable validation.
Summary
No source code was modified as part of the architecture review. The conclusion is direct: the current architecture is not suitable as-is, but it is close enough to refactor. The stack is right; the domain language and pipeline shape are not.
The research direction should run from direct observation to inference to hypothesis, with preserved evidence and visible uncertainty. The system should stop emitting "smart money" as if it were a fact, and instead emit cautious, explainable smart-flow hypotheses.
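As a rough illustration of a hypothesis that carries its evidence and its uncertainty with it, here is a minimal Python sketch. Only the name FlowHypothesisEvent comes from this review; the field names, the EvidenceRef and Alternative helpers, and the example labels are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EvidenceRef:
    """Pointer back to a preserved raw observation (hypothetical shape)."""
    observation_id: str
    kind: str  # e.g. "option_print" or "quote_join" (illustrative labels)


@dataclass(frozen=True)
class Alternative:
    """A competing explanation kept next to the main hypothesis (hypothetical)."""
    label: str
    weight: float  # relative plausibility, assumed to lie in [0, 1]


@dataclass(frozen=True)
class FlowHypothesisEvent:
    hypothesis: str
    confidence: float
    evidence: Tuple[EvidenceRef, ...]
    alternatives: Tuple[Alternative, ...] = ()

    def __post_init__(self) -> None:
        # A hypothesis with no cited observation is a bare assertion: reject it.
        if not self.evidence:
            raise ValueError("a hypothesis must cite at least one observation")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")

    def summary(self) -> str:
        alts = ", ".join(a.label for a in self.alternatives) or "none recorded"
        return (f"{self.hypothesis} (confidence {self.confidence:.2f}, "
                f"{len(self.evidence)} evidence refs; alternatives: {alts})")


event = FlowHypothesisEvent(
    hypothesis="directional_call_accumulation",
    confidence=0.62,
    evidence=(EvidenceRef("obs-1", "option_print"),
              EvidenceRef("obs-2", "quote_join")),
    alternatives=(Alternative("hedge", 0.25), Alternative("spread_leg", 0.13)),
)
print(event.summary())
```

The constructor refuses to build an event without evidence, which is one simple way to make "preserved evidence" a property of the type rather than a convention.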
Source Documents
Research Report
Architecture Review Copy
Area Classification
| Area | Call | Architecture Review |
|---|---|---|
| Domain model | refactor | Good bones, wrong center. Make evidence, hypotheses, scores, and alternatives first-class. |
| Event taxonomy | refactor | Raw/derived split is good; smart_money, dark.inferred, and classifier_hits leak overconfident product language. |
| Service boundaries | refactor | Ingest does too much signal policy; compute is too broad. Split pipeline stages before adding more intelligence. |
| FlowPacket | refactor | Keep concept, rename/reframe as FlowEvidenceCluster or FlowCandidate. Not a product domain object. |
| SmartMoneyEvent | redesign | Replace canonical object with FlowHypothesisEvent; use SmartFlowInsight only as UI/API projection. |
| Classifier pipeline | redesign | Current rules mix evidence extraction, hypothesis scoring, narrative labels, and alerting. Needs staged outputs. |
| ClickHouse/storage | refactor | Right datastore; raw tables are decent, derived evidence/hypotheses need typed/queryable columns plus JSON sidecars. |
| Redis baselines/cache | refactor | Right hot-state role; wrong as hidden baseline truth. Baselines need replayable snapshots/versioning. |
| NATS/JetStream subjects | refactor | Right bus; subjects should express stage/version: observations, evidence, hypotheses, insights. |
| Replay determinism | redesign | Present but not central enough. Replay must be the acceptance gate for derived outputs. |
| API/WebSocket | refactor | Mechanics are good; public surface should expose evidence bundles and hypotheses, not internal legacy names. |
| UI evidence model | refactor | Directionally good, but still foregrounds profile/probability over evidence quality, alternatives, and uncertainty. |
| Test strategy | redesign | Unit tests are solid scaffolding; needs fixture replay, false-positive suites, calibration, and end-to-end determinism. |
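To make the NATS/JetStream row concrete, a subject scheme that encodes stage and version could look like the sketch below. The four stage names come from the table; the `flow.` prefix, the `v<N>` token, the per-symbol suffix, and the helper name `flow_subject` are assumptions for illustration only.

```python
import re

# Pipeline stages named in the review, in processing order.
STAGES = ("observations", "evidence", "hypotheses", "insights")

# NATS subjects are dot-separated tokens, so a symbol containing "." or
# whitespace would silently change the subject hierarchy. Reject it here.
_TOKEN = re.compile(r"^[A-Za-z0-9_-]+$")


def flow_subject(stage: str, version: int, symbol: str) -> str:
    """Build a subject such as 'flow.hypotheses.v2.AAPL' (hypothetical layout)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if version < 1:
        raise ValueError("version must be a positive integer")
    if not _TOKEN.match(symbol):
        raise ValueError(f"symbol is not a single subject token: {symbol!r}")
    return f"flow.{stage}.v{version}.{symbol}"


print(flow_subject("hypotheses", 2, "AAPL"))
```

With the version in the subject, a new scoring policy can publish to `v2` while consumers of `v1` keep running, which is the property the table asks for.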
Direct Answers
- 01
Current suitability: no. Useful infrastructure, but not yet an evidence-backed smart-flow architecture.
- 02
SmartMoneyEvent: not a good canonical domain object. Use FlowHypothesisEvent. ParticipantHypothesisEvent implies participant identity too strongly. SmartFlowInsight should be a user-facing projection.
- 03
FlowPacket: not as named. Keep the abstraction as an internal evidence cluster, renamed to FlowEvidenceCluster or FlowCandidate.
- 04
Service boundaries: not right. Ingest should normalize only; evidence quality, eligibility, clustering, hypothesis scoring, and insight projection should be separate stages.
- 05
ClickHouse/Redis/NATS roles: yes broadly. ClickHouse is the authoritative event/audit store. Redis is hot cache only. NATS is transport, not truth. All three need cleaner contracts.
- 06
Replay central enough: no. It should be how every detection change proves itself.
- 07
UI uncertainty: partially. It shows evidence refs, profile ladders, abstention, and suppression, but needs confidence vs conviction, alternative explanations, evidence quality, and why-not signals.
- 08
First-class domain objects: raw observations, execution context, quote join, eligibility decision, evidence cluster, structure hypothesis, evidence quality score, baseline snapshot, hypothesis score vector, false-positive penalty, catalyst context, flow hypothesis event, smart-flow insight, replay run.
- 09
Implementation details: Redis list layout, durable consumer names, current classifier thresholds, ClickHouse batch writer, adapter internals, legacy ClassifierHitEvent, alert severity math, UI cache mechanics.
- 10
Delete/defer: canonical smart-money naming, real-time dark-pool certainty, standalone whale-premium alerts, trade-level open/close claims, participant identity claims, simplistic premium alert score, ingest-time signal filtering, retail_whale as a canonical profile unless reframed as attention/lottery flow.
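One way to read answers 04 and 10 together: instead of ingest silently dropping prints, a separate eligibility stage records a decision and its reasons for every observation. The sketch below assumes hypothetical inputs (quote age and relative spread), hypothetical thresholds, and hypothetical reason codes; none of these values come from the review.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EligibilityDecision:
    """Outcome of the eligibility stage, kept even when the answer is 'no'."""
    observation_id: str
    eligible: bool
    reasons: Tuple[str, ...]  # empty when eligible


def decide_eligibility(
    observation_id: str,
    quote_age_ms: int,
    spread_pct: float,
    max_quote_age_ms: int = 1_000,   # assumed threshold, illustration only
    max_spread_pct: float = 0.25,    # assumed threshold, illustration only
) -> EligibilityDecision:
    reasons = []
    if quote_age_ms > max_quote_age_ms:
        reasons.append("stale_quote")
    if spread_pct > max_spread_pct:
        reasons.append("wide_spread")
    # The observation itself is never discarded; only the decision is recorded.
    return EligibilityDecision(observation_id, not reasons, tuple(reasons))


print(decide_eligibility("obs-1", quote_age_ms=120, spread_pct=0.04))
print(decide_eligibility("obs-2", quote_age_ms=4_500, spread_pct=0.40))
```

Because ineligible observations still produce a decision record, the UI can later answer "why not" questions, and a replay can re-evaluate the same prints under different thresholds.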
Objects to Make First-Class
Options
Option A: Conservative
Keep current objects and services; add evidence-quality fields, UI copy fixes, and replay tests.
- Pros
Fastest, lowest migration risk, preserves current endpoints and UI.
- Cons
Leaves misleading canonical names and keeps inference tangled in compute.
- Complexity
Low.
- Migration Risk
Low.
- Rename UI copy from smart money to smart flow candidate.
- Add evidence-quality and alternative-explanation fields to existing event.
- Add replay consistency tests around current outputs.
- Add typed ClickHouse columns for high-value JSON fields.
- Deprecate, but do not remove, legacy classifier hit display.
Option B: Refactor
Keep the stack and terminal UI, but rebuild the domain pipeline around evidence clusters and hypothesis events.
- Pros
Fixes the product's epistemic spine without wasting useful infrastructure.
- Cons
Requires breaking contract migration across types, storage, compute, API, UI, and tests.
- Complexity
Medium-high.
- Migration Risk
Medium.
- Introduce FlowEvidenceCluster, FlowHypothesisEvent, SmartFlowInsight, EvidenceQuality, and version fields with compatibility aliases.
- Move signal eligibility out of ingest.
- Split compute into evidence join, cluster/structure, hypothesis scoring, and insight/alert projection.
- Replace derived JSON-only storage with typed query columns.
- Add replay-run harness that recomputes derived outputs from raw streams.
- Add /flow/evidence, /flow/hypotheses, /flow/insights, and WS equivalents.
- Rework UI drawers/tables around evidence quality, confidence vs conviction, alternatives, abstention, and catalyst/noise context.
- Add fixture suites for stale quotes, complex spreads, 0DTE/event noise, deep ITM, wide spreads, and off-exchange ambiguity.
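The replay-run harness in the steps above can be sketched as a digest comparison: recompute derived outputs from raw input and check that they hash to the recorded value. This is a toy under stated assumptions. The record fields (`id`, `ts`, `premium`), the `toy_pipeline` stand-in, and the premium threshold are all hypothetical, and a real harness would read raw streams from storage rather than an in-memory list.

```python
import hashlib
import json
from typing import Callable, Dict, List

Record = Dict[str, object]


def output_digest(events: List[Record]) -> str:
    """Hash derived events in a canonical JSON form (sorted keys, no spaces)."""
    canonical = json.dumps(events, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replay_matches(raw: List[Record],
                   pipeline: Callable[[List[Record]], List[Record]],
                   recorded_digest: str) -> bool:
    """True when recomputing from raw input reproduces the recorded outputs."""
    return output_digest(pipeline(raw)) == recorded_digest


def toy_pipeline(raw: List[Record], min_premium: int = 100_000) -> List[Record]:
    # Stand-in for the real derived stages. Sorting by (ts, id) makes the
    # output independent of arrival order, which determinism requires.
    ordered = sorted(raw, key=lambda r: (r["ts"], r["id"]))
    return [{"id": r["id"], "candidate": r["premium"] >= min_premium}
            for r in ordered]


RAW = [
    {"id": "p2", "ts": 2, "premium": 250_000},
    {"id": "p1", "ts": 1, "premium": 40_000},
]
recorded = output_digest(toy_pipeline(RAW))

# Same raw data, different arrival order: the gate should still pass.
print(replay_matches(list(reversed(RAW)), toy_pipeline, recorded))
```

A detection change that alters outputs fails the comparison, so the change has to arrive with a new recorded digest and a reviewable diff of derived events rather than slipping in unnoticed.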
Option C: Redesign
Start over with an event-sourced evidence engine and versioned, replayable policies.
- Pros
Cleanest long-term architecture and strongest research discipline.
- Cons
Slowest, overkill before product fit, and discards too much working infrastructure.
- Complexity
Very high.
- Migration Risk
High.
- Define new canonical event taxonomy and versioned policy registry.
- Build raw observation lake and deterministic replay runner first.
- Build evidence extraction and quote/condition eligibility services.
- Build cluster and structure hypothesis services.
- Build hypothesis scoring and calibration services.
- Build insight projection API.
- Rebuild terminal against new evidence/hypothesis contracts.
- Backfill or discard old derived data.
Recommendation
Choose Option B.
Option A is too timid for a pre-alpha product whose current names already fight the research. Option C is intellectually clean but wastes too much working infrastructure. Option B keeps the stack and terminal momentum while fixing the core mistake: treating smart money as a thing the system emits, instead of treating smart flow as a cautious, evidence-backed hypothesis with alternatives.
The first implementation move should be the contract/naming PR: introduce FlowHypothesisEvent and FlowEvidenceCluster with compatibility aliases, then make replay the gate before touching more classifier logic.
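A compatibility alias for that first PR might look like the following Python sketch. The legacy payload keys (`id`, `profile`), the `schema_version` field, and the `upgrade_legacy` helper are assumptions for illustration; the review names the new types but does not specify their fields.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowHypothesisEvent:
    event_id: str
    hypothesis: str
    schema_version: int = 2  # assumed version field


# Compatibility alias: existing imports of the old name keep working while
# call sites migrate to the new one.
SmartMoneyEvent = FlowHypothesisEvent


def upgrade_legacy(payload: dict) -> FlowHypothesisEvent:
    """Map a legacy-shaped payload (hypothetical keys) onto the new contract."""
    warnings.warn(
        "smart-money payloads are deprecated; emit FlowHypothesisEvent",
        DeprecationWarning,
        stacklevel=2,
    )
    return FlowHypothesisEvent(event_id=payload["id"],
                               hypothesis=payload["profile"])


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    upgraded = upgrade_legacy({"id": "e1", "profile": "institutional_directional"})

print(upgraded, len(caught))
```

The deprecation warning gives a way to count remaining legacy producers before the alias is removed.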