Synthetic Market Data
Deterministic generation, fixtures, scenarios, replay integration, demos, and future calibration.
diff --git a/docs/implementation/index.html b/docs/implementation/index.html
new file mode 100644
index 0000000..bfb9380
--- /dev/null
+++ b/docs/implementation/index.html
@@ -0,0 +1,543 @@
Implementation Map
The active planning layer for synthetic market-data and smart-money/smart-flow architecture work. Architecture reviews and research reports are background; phase documents and Beads issues define execution scope.
docs/implementation/
docs/plans/
docs/research-docs/
This repository uses docs/research-docs/ for research reports; docs/research/ is not present. Research reports provide rationale and useful constraints, but they do not add active implementation scope unless that scope is explicitly pulled into a phase document and Beads issue.
Synthetic market data: deterministic generation, fixtures, scenarios, replay integration, demos, and future calibration.
Smart money / smart flow: contracts, evidence clustering, hypothesis scoring, replay evaluation, explainability, and calibration.
| Stream | Epic | Roadmap |
|---|---|---|
| Synthetic market data | islandflow-259 - Plan synthetic market-data implementation phases | docs/implementation/synthetic-market-data/00-roadmap.html |
| Smart money / smart flow | islandflow-zxh - Plan smart-money to smart-flow implementation phases | docs/implementation/smart-money/00-roadmap.html |
| Order | Phase | Beads issue | Blocks next because |
|---|---|---|---|
| 1A | Synthetic deterministic spine | islandflow-259.1 | Establishes seeded raw event generation and provenance assumptions for later synthetic work. |
| 1B | Smart-flow contracts and vocabulary | islandflow-zxh.1 | Can safely run in parallel with synthetic phase 01; defines evidence/hypothesis language before scoring work. |
| 2 | Synthetic manifests, fixtures, and CLI | islandflow-259.2 | Evidence clustering needs deterministic fixtures before broad behavior changes. |
| 3 | Smart-flow evidence clustering and features | islandflow-zxh.2 | Scenario labels need the evidence vocabulary they are expected to exercise. |
| 4 | Synthetic scenarios, labels, and expected outputs | islandflow-259.3 | Hypothesis scoring needs labeled positive, negative, and abstention cases. |
| 5 | Smart-flow hypothesis scoring and abstention | islandflow-zxh.3 | Synthetic replay integration should validate the derived hypothesis pipeline. |
| 6 | Synthetic replay integration | islandflow-259.4 | Smart-flow golden tests need replayable synthetic runs. |
| 7 | Smart-flow replay evaluation and golden tests | islandflow-zxh.4 | Demos should wait until replay proves the semantics. |
| 8 | Synthetic demo and load profiles | islandflow-259.5 | API/UI explainability should show stable, named, deterministic runs. |
| 9 | Smart-flow API/UI explainability | islandflow-zxh.5 | Final MVP presentation layer after the evidence pipeline is validated. |
islandflow-259.6 depends on synthetic phase 05, but is not required for MVP.
islandflow-zxh.6 depends on smart-flow phase 05 and synthetic future calibration, but is not required for MVP.
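The ordering above can be checked mechanically. The sketch below copies the "Depends on" columns of the two phase roadmaps into a map and reports any issue that is scheduled before one of its dependencies. The helper name `findOrderViolations` is hypothetical; it is not part of existing Islandflow tooling or the Beads CLI.

```typescript
// Dependencies copied from the "Depends on" columns of both roadmap tables.
const dependsOn: Record<string, string[]> = {
  "islandflow-259.1": [],
  "islandflow-zxh.1": [],
  "islandflow-259.2": ["islandflow-zxh.1"],
  "islandflow-zxh.2": ["islandflow-259.2"],
  "islandflow-259.3": ["islandflow-zxh.2"],
  "islandflow-zxh.3": ["islandflow-259.3"],
  "islandflow-259.4": ["islandflow-zxh.3"],
  "islandflow-zxh.4": ["islandflow-259.4"],
  "islandflow-259.5": ["islandflow-zxh.4"],
  "islandflow-zxh.5": ["islandflow-259.5"],
  "islandflow-259.6": ["islandflow-259.5"],
  "islandflow-zxh.6": ["islandflow-zxh.5", "islandflow-259.6"],
};

// MVP order from the table, followed by the two post-MVP issues.
const plannedOrder: string[] = [
  "islandflow-259.1", "islandflow-zxh.1", "islandflow-259.2", "islandflow-zxh.2",
  "islandflow-259.3", "islandflow-zxh.3", "islandflow-259.4", "islandflow-zxh.4",
  "islandflow-259.5", "islandflow-zxh.5", "islandflow-259.6", "islandflow-zxh.6",
];

// Returns one message per dependency that is unscheduled or scheduled too late.
function findOrderViolations(order: string[], deps: Record<string, string[]>): string[] {
  const position = new Map<string, number>();
  order.forEach((issue, index) => position.set(issue, index));
  const violations: string[] = [];
  for (const issue of order) {
    for (const dep of deps[issue] ?? []) {
      const depIndex = position.get(dep);
      if (depIndex === undefined) {
        violations.push(`${issue} depends on ${dep}, which is not scheduled`);
      } else if (depIndex > position.get(issue)!) {
        violations.push(`${issue} is scheduled before its dependency ${dep}`);
      }
    }
  }
  return violations;
}

console.log(findOrderViolations(plannedOrder, dependsOn).length); // 0
```

A check like this only covers the dependencies written in the tables; implicit ones (for example, each synthetic phase building on the previous synthetic phase) would need to be added to the map.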
Smart Flow Roadmap
Implementation-sized phases for turning smart-money detection into smart-flow inference: observations, evidence clusters, cautious hypotheses, confidence, alternatives, abstention, replay evaluation, and user-facing insight projections.
Use these while planning or implementing smart-flow phase work.
These explain rationale, but do not add scope unless pulled into a phase doc and Beads issue.
| Phase | Beads issue | Depends on | Purpose |
|---|---|---|---|
| 01 - Contracts and vocabulary | islandflow-zxh.1 | None; safe parallel with islandflow-259.1 | Define evidence/hypothesis/insight contracts and retire canonical overconfidence. |
| 02 - Evidence clustering and features | islandflow-zxh.2 | islandflow-259.2 | Extract eligibility, evidence facts, clusters, and traceable features. |
| 03 - Hypothesis scoring and abstention | islandflow-zxh.3 | islandflow-259.3 | Score cautious hypotheses and represent abstention/alternatives. |
| 04 - Replay evaluation and golden tests | islandflow-zxh.4 | islandflow-259.4 | Validate derived outputs through deterministic replay and golden fixtures. |
| 05 - API/UI explainability | islandflow-zxh.5 | islandflow-259.5 | Expose evidence-backed insights and uncertainty to API, WS, and UI. |
| 99 - Future calibration | islandflow-zxh.6 | islandflow-zxh.5, islandflow-259.6 | Calibrate confidence and policy behavior later with richer datasets. |
islandflow-zxh.2.1 - Eligibility and evidence facts: Split out the direct fact and eligibility layer before clustering and feature vector work.
islandflow-zxh.2.2 - Clustering and feature vectors: Keep clustering and feature vector changes reviewable after the evidence vocabulary exists.
islandflow-zxh.3.1 - Hypothesis score vectors: Build scoring as a separate semantic layer, not as UI-ready certainty.
islandflow-zxh.3.2 - Abstention and insight projection: Represent alternatives, penalties, and abstention before exposing user-facing insight projections.
islandflow-zxh.5.1 - Evidence API and websocket surfaces: Expose evidence-backed contracts through transport before tuning the presentation layer.
islandflow-zxh.5.2 - UI explainability surfaces: Show evidence quality, confidence vs conviction, alternatives, abstention, and catalyst/noise context.
If an implementation PR crosses contracts, compute, storage, API, and UI in one change, stop and split it.
islandflow-zxh - Plan smart-money to smart-flow implementation phases.
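The split rule can be applied before review by mapping each changed path to a layer and flagging changes that span too many layers. A minimal sketch; the path prefixes and the default limit of three layers are assumptions for illustration, not the repository's actual layout or an agreed threshold.

```typescript
type Layer = "contracts" | "compute" | "storage" | "api" | "ui";

// Hypothetical path prefixes; the real repository layout may differ.
const layerPrefixes: Array<[string, Layer]> = [
  ["packages/types/", "contracts"],
  ["services/compute/", "compute"],
  ["packages/storage/", "storage"],
  ["services/api/", "api"],
  ["apps/web/", "ui"],
];

// Collects the distinct layers touched by a list of changed file paths.
function layersTouched(changedPaths: string[]): Set<Layer> {
  const layers = new Set<Layer>();
  for (const path of changedPaths) {
    for (const [prefix, layer] of layerPrefixes) {
      if (path.startsWith(prefix)) layers.add(layer);
    }
  }
  return layers;
}

// Flags a change for splitting when it touches more layers than allowed.
function shouldSplit(changedPaths: string[], maxLayers = 3): boolean {
  return layersTouched(changedPaths).size > maxLayers;
}

const changed = [
  "packages/types/src/flow.ts",
  "services/compute/src/cluster.ts",
  "packages/storage/src/schema.sql",
  "services/api/src/routes.ts",
  "apps/web/src/FlowPanel.tsx",
];
console.log(shouldSplit(changed)); // true: five layers in one change
```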
Synthetic Roadmap
Implementation-sized phases for extracting deterministic synthetic generation into a first-class reusable engine while keeping the useful NATS, ClickHouse, compute, API, replay, and web stack.
Use these while planning or implementing synthetic market-data phase work.
These explain rationale, but do not add scope unless pulled into a phase doc and Beads issue.
OptionPrint, OptionNBBO, EquityPrint, and EquityQuote.
bun test should not require Docker, ClickHouse, NATS, or Redis.
| Phase | Beads issue | Depends on | Purpose |
|---|---|---|---|
| 01 - Deterministic spine | islandflow-259.1 | None | Create the seeded generation foundation and canonical event output contract. |
| 02 - Manifests, fixtures, CLI | islandflow-259.2 | islandflow-zxh.1 | Turn deterministic generation into durable fixtures and manifests. |
| 03 - Scenarios, labels, expected outputs | islandflow-259.3 | islandflow-zxh.2 | Author named scenarios, separate labels, and expected derived outputs. |
| 04 - Replay integration | islandflow-259.4 | islandflow-zxh.3 | Make replay consume synthetic runs with stable ordering and output comparison. |
| 05 - Demo and load profiles | islandflow-259.5 | islandflow-zxh.4 | Expose named deterministic demo/load profiles after replay validation. |
| 99 - Future historical calibration | islandflow-259.6 | islandflow-259.5 | Calibrate parameters from historical data later, after the MVP is stable. |
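A minimal sketch of the phase 01 idea, assuming a small seeded PRNG (mulberry32, a well-known 32-bit generator that is not cryptographically secure) and a simplified, hypothetical `OptionPrint` shape; the real canonical event contract may carry different fields. Generation is a pure function of seed, count, and start time (no wall clock), so the same inputs always reproduce the same events and a test needs no Docker, ClickHouse, NATS, or Redis.

```typescript
// mulberry32: small seeded 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical, simplified shape; the canonical OptionPrint contract may differ.
interface OptionPrint {
  kind: "OptionPrint";
  seq: number;
  tsMs: number;
  contract: string;
  price: number;
  size: number;
}

// Pure function of (seed, count, startMs): no wall clock, no I/O, no services.
function generateOptionPrints(seed: number, count: number, startMs: number): OptionPrint[] {
  const rand = mulberry32(seed);
  const contracts = ["SPY-C-500", "SPY-P-490", "QQQ-C-420"]; // hypothetical symbols
  const events: OptionPrint[] = [];
  let tsMs = startMs;
  for (let seq = 0; seq < count; seq++) {
    tsMs += 1 + Math.floor(rand() * 250); // strictly increasing timestamps
    events.push({
      kind: "OptionPrint",
      seq,
      tsMs,
      contract: contracts[Math.floor(rand() * contracts.length)],
      price: Math.round((0.5 + rand() * 10) * 100) / 100,
      size: 1 + Math.floor(rand() * 50),
    });
  }
  return events;
}

const runA = generateOptionPrints(42, 5, 1_700_000_000_000);
const runB = generateOptionPrints(42, 5, 1_700_000_000_000);
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true
```

Injecting `startMs` instead of reading the current time is the design choice that keeps output byte-for-byte reproducible across machines and test runs.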
islandflow-259.3.1 - Scenario catalog and labels: Keep scenario authoring and ground-truth label shape focused before expected-output comparison grows around it.
islandflow-259.3.2 - Expected-output manifests: Store expected derived outputs as reviewable artifacts for downstream smart-flow validation.
If any other phase starts touching unrelated service, API, UI, and storage behavior in one PR, split it before implementation continues.
islandflow-259 - Plan synthetic market-data implementation phases.
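Expected-output manifests (islandflow-259.3.2) are easiest to review when comparison ignores incidental ordering. A minimal sketch, assuming derived outputs are plain JSON records keyed by a hypothetical `id` field: each record is serialized with sorted keys, and the comparison reports ids that are missing, unexpected, or changed. The sample hypothesis values are made up.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes with object keys sorted, so key order never produces a false diff.
function stableStringify(value: Json): string {
  if (Array.isArray(value)) return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical derived-output record: an id plus arbitrary JSON fields.
interface DerivedOutput { id: string; [field: string]: Json }
interface ManifestDiff { missing: string[]; unexpected: string[]; changed: string[] }

// Compares expected vs actual outputs by id, independent of input order.
function diffOutputs(expected: DerivedOutput[], actual: DerivedOutput[]): ManifestDiff {
  const index = (rows: DerivedOutput[]) =>
    new Map(rows.map((row) => [row.id, stableStringify(row)] as [string, string]));
  const want = index(expected);
  const got = index(actual);
  const wantIds = [...want.keys()].sort();
  const gotIds = [...got.keys()].sort();
  return {
    missing: wantIds.filter((id) => !got.has(id)),
    unexpected: gotIds.filter((id) => !want.has(id)),
    changed: wantIds.filter((id) => got.has(id) && got.get(id) !== want.get(id)),
  };
}

const expectedOutputs: DerivedOutput[] = [
  { id: "h1", hypothesis: "hedge", confidence: 0.4 },
  { id: "h2", hypothesis: "abstain" },
];
const actualOutputs: DerivedOutput[] = [
  { confidence: 0.4, hypothesis: "hedge", id: "h1" }, // same content, different key order
  { id: "h3", hypothesis: "spread leg" },
];
console.log(JSON.stringify(diffOutputs(expectedOutputs, actualOutputs)));
// {"missing":["h2"],"unexpected":["h3"],"changed":[]}
```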
Plan Document
A readable architecture review for reshaping Islandflow's smart-flow system around direct observation, evidence clusters, cautious hypotheses, preserved uncertainty, and replayable validation.
No source code was modified as part of the architecture review. The conclusion is direct: the current architecture is not suitable as-is, but it is close enough to refactor. The stack is right; the domain language and pipeline shape are not.
The research direction should be direct observation to inference to hypothesis, with preserved evidence and visible uncertainty. The system should stop emitting "smart money" as if it were a fact, and instead emit cautious, explainable smart-flow hypotheses.
| Area | Call | Architecture Review |
|---|---|---|
| Domain model | refactor | Good bones, wrong center. Make evidence, hypotheses, scores, and alternatives first-class. |
| Event taxonomy | refactor | Raw/derived split is good; smart_money, dark.inferred, and classifier_hits leak overconfident product language. |
| Service boundaries | refactor | Ingest does too much signal policy; compute is too broad. Split pipeline stages before adding more intelligence. |
| FlowPacket | refactor | Keep concept, rename/reframe as FlowEvidenceCluster or FlowCandidate. Not a product domain object. |
| SmartMoneyEvent | redesign | Replace canonical object with FlowHypothesisEvent; use SmartFlowInsight only as UI/API projection. |
| Classifier pipeline | redesign | Current rules mix evidence extraction, hypothesis scoring, narrative labels, and alerting. Needs staged outputs. |
| ClickHouse/storage | refactor | Right datastore; raw tables are decent, but derived evidence/hypotheses need typed/queryable columns plus JSON sidecars. |
| Redis baselines/cache | refactor | Right hot-state role; wrong as hidden baseline truth. Baselines need replayable snapshots/versioning. |
| NATS/JetStream subjects | refactor | Right bus; subjects should express stage/version: observations, evidence, hypotheses, insights. |
| Replay determinism | redesign | Present but not central enough. Replay must be the acceptance gate for derived outputs. |
| API/WebSocket | refactor | Mechanics are good; public surface should expose evidence bundles and hypotheses, not internal legacy names. |
| UI evidence model | refactor | Directionally good, but still foregrounds profile/probability over evidence quality, alternatives, and uncertainty. |
| Test strategy | redesign | Unit tests are solid scaffolding; needs fixture replay, false-positive suites, calibration, and end-to-end determinism. |
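The NATS row asks for subjects that carry the pipeline stage and contract version. A minimal sketch of one possible naming helper; the `flow.<stage>.v<version>.<underlying>` layout and the per-underlying last token are assumptions, not an existing Islandflow convention. NATS subjects are dot-separated tokens and the `*` wildcard matches exactly one token, so a consumer can subscribe to one stage and version across all underlyings.

```typescript
// Stage names taken from the review: observations, evidence, hypotheses, insights.
const STAGES = ["observations", "evidence", "hypotheses", "insights"] as const;
type Stage = (typeof STAGES)[number];

// Builds a subject such as "flow.evidence.v1.SPY". Tokens are restricted to
// letters, digits, "_" and "-" because "." separates tokens and "*"/">" are wildcards.
function flowSubject(stage: Stage, version: number, underlying: string): string {
  if (!Number.isInteger(version) || version < 1) {
    throw new Error(`invalid contract version: ${version}`);
  }
  if (!/^[A-Za-z0-9_-]+$/.test(underlying)) {
    throw new Error(`invalid subject token: ${underlying}`);
  }
  return `flow.${stage}.v${version}.${underlying}`;
}

// Subscription pattern for one stage and version across all underlyings.
function stageWildcard(stage: Stage, version: number): string {
  return `flow.${stage}.v${version}.*`;
}

console.log(flowSubject("evidence", 1, "SPY")); // flow.evidence.v1.SPY
console.log(stageWildcard("hypotheses", 2)); // flow.hypotheses.v2.*
```

Putting the version in the subject lets a v1 and a v2 consumer run side by side during a contract migration, at the cost of a symbol-encoding rule for tickers that contain dots.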
Current suitability: no. Useful infrastructure, but not yet an evidence-backed smart-flow architecture.
SmartMoneyEvent: not a good canonical domain object. Use FlowHypothesisEvent. Avoid ParticipantHypothesisEvent, which implies participant identity too strongly. SmartFlowInsight should be a user-facing projection.
FlowPacket: not as named. Keep the abstraction as an internal evidence cluster, rename to FlowEvidenceCluster or FlowCandidate.
Service boundaries: not right. Ingest should normalize only; evidence quality, eligibility, clustering, hypothesis scoring, and insight projection should be separate stages.
ClickHouse/Redis/NATS roles: yes, broadly. ClickHouse is the authoritative event/audit store. Redis is hot cache only. NATS is transport, not truth. All three need cleaner contracts.
Replay central enough: no. It should be how every detection change proves itself.
UI uncertainty: partially. It shows evidence refs, profile ladders, abstention, and suppression, but needs confidence vs conviction, alternative explanations, evidence quality, and why-not signals.
First-class domain objects: raw observations, execution context, quote join, eligibility decision, evidence cluster, structure hypothesis, evidence quality score, baseline snapshot, hypothesis score vector, false-positive penalty, catalyst context, flow hypothesis event, smart-flow insight, replay run.
Implementation details: Redis list layout, durable consumer names, current classifier thresholds, ClickHouse batch writer, adapter internals, legacy ClassifierHitEvent, alert severity math, UI cache mechanics.
Delete/defer: canonical smart-money naming, real-time dark-pool certainty, standalone whale-premium alerts, trade-level open/close claims, participant identity claims, simplistic premium alert score, ingest-time signal filtering, retail_whale as a canonical profile unless reframed as attention/lottery flow.
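The service-boundary answer separates ingest from eligibility, clustering, hypothesis scoring, and insight projection. A minimal sketch, assuming each stage is a pure function over plain data, which is what would let replay re-run any stage from stored inputs. Every type, field, threshold, and label here is hypothetical and deliberately simplistic; the point is only where eligibility decisions, evidence references, alternatives, and abstention travel through the stages.

```typescript
// Hypothetical, simplified contracts; field names are illustrative only.
interface Observation { seq: number; contract: string; premium: number; atAsk: boolean }
interface EligibilityDecision { obs: Observation; eligible: boolean; reason: string }
interface FlowEvidenceCluster { contract: string; evidence: Observation[] }
interface FlowHypothesisEvent {
  contract: string;
  hypothesis: string; // best-supported explanation, or "abstain"
  confidence: number; // 0..1, how strongly the evidence favors the hypothesis
  alternatives: string[]; // explanations that remain plausible
  evidenceSeqs: number[]; // trace back to the observations used
}
interface SmartFlowInsight { contract: string; headline: string; detail: string }

// Stage 1: eligibility is an explicit, replayable decision, not an ingest-time filter.
function decideEligibility(obs: Observation): EligibilityDecision {
  return obs.premium > 0
    ? { obs, eligible: true, reason: "priced print" }
    : { obs, eligible: false, reason: "non-positive premium" };
}

// Stage 2: group eligible observations per contract, preserving input order.
function clusterEvidence(decisions: EligibilityDecision[]): FlowEvidenceCluster[] {
  const byContract = new Map<string, Observation[]>();
  for (const d of decisions) {
    if (!d.eligible) continue;
    const list = byContract.get(d.obs.contract) ?? [];
    list.push(d.obs);
    byContract.set(d.obs.contract, list);
  }
  return [...byContract].map(([contract, evidence]) => ({ contract, evidence }));
}

// Stage 3: score a cautious hypothesis; abstain when evidence is thin.
function scoreHypothesis(cluster: FlowEvidenceCluster, minEvidence = 3): FlowHypothesisEvent {
  const evidenceSeqs = cluster.evidence.map((o) => o.seq);
  if (cluster.evidence.length < minEvidence) {
    return {
      contract: cluster.contract,
      hypothesis: "abstain",
      confidence: 0,
      alternatives: ["noise", "hedge", "buyer-initiated flow"],
      evidenceSeqs,
    };
  }
  const askShare = cluster.evidence.filter((o) => o.atAsk).length / cluster.evidence.length;
  return {
    contract: cluster.contract,
    hypothesis: askShare >= 0.5 ? "buyer-initiated flow" : "seller-initiated or mixed flow",
    confidence: Math.abs(askShare - 0.5) * 2,
    alternatives: ["hedge", "spread leg"],
    evidenceSeqs,
  };
}

// Stage 4: project to a user-facing insight that keeps uncertainty visible.
function projectInsight(h: FlowHypothesisEvent): SmartFlowInsight {
  return {
    contract: h.contract,
    headline: h.hypothesis === "abstain" ? "Not enough evidence" : `Possible ${h.hypothesis}`,
    detail: `confidence ${h.confidence.toFixed(2)}; alternatives: ${h.alternatives.join(", ")}`,
  };
}

function runPipeline(observations: Observation[]): SmartFlowInsight[] {
  const clusters = clusterEvidence(observations.map(decideEligibility));
  return clusters.map((c) => projectInsight(scoreHypothesis(c)));
}

const sample: Observation[] = [
  { seq: 1, contract: "SPY-C-500", premium: 12000, atAsk: true },
  { seq: 2, contract: "SPY-C-500", premium: 8000, atAsk: true },
  { seq: 3, contract: "SPY-C-500", premium: 9500, atAsk: false },
  { seq: 4, contract: "QQQ-P-400", premium: 3000, atAsk: true },
];
for (const insight of runPipeline(sample)) console.log(insight.headline, "|", insight.detail);
// Possible buyer-initiated flow | confidence 0.33; alternatives: hedge, spread leg
// Not enough evidence | confidence 0.00; alternatives: noise, hedge, buyer-initiated flow
```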
Option A: Keep current objects and services; add evidence-quality fields, UI copy fixes, and replay tests.
Fastest, lowest migration risk, preserves current endpoints and UI.
Leaves misleading canonical names and keeps inference tangled in compute.
Low.
Low.
Option B: Keep the stack and terminal UI, but rebuild the domain pipeline around evidence clusters and hypothesis events.
Fixes the product's epistemic spine without wasting useful infrastructure.
Requires breaking contract migration across types, storage, compute, API, UI, and tests.
Medium-high.
Medium.
FlowEvidenceCluster, FlowHypothesisEvent, SmartFlowInsight, EvidenceQuality, and version fields with compatibility aliases.
/flow/evidence, /flow/hypotheses, /flow/insights, and WS equivalents.
Option C: Start over with an event-sourced evidence engine and versioned, replayable policies.
Cleanest long-term architecture and strongest research discipline.
Slowest, overkill before product fit, and discards too much working infrastructure.
Very high.
High.
Choose Option B.
Option A is too timid for a pre-alpha product whose current names already fight the research. Option C is intellectually clean but wastes too much working infrastructure. Option B keeps the stack and terminal momentum while fixing the core mistake: treating smart money as a thing the system emits, instead of treating smart flow as a cautious, evidence-backed hypothesis with alternatives.
The first implementation move should be the contract/naming PR: introduce FlowHypothesisEvent and FlowEvidenceCluster with compatibility aliases, then make replay the gate before touching more classifier logic.
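A minimal sketch of what the contract/naming step could look like in TypeScript, assuming a `schemaVersion` field and a legacy record shape invented here for illustration (`SmartMoneyEventV0` and its `profile`/`probability` fields are hypothetical, not the current Islandflow types). The old name survives as a deprecated alias so existing imports keep compiling, and legacy records are upgraded on read so replay can compare old and new runs.

```typescript
// Hypothetical v1 contract: a hypothesis about flow, not a claim about a participant.
interface FlowHypothesisEvent {
  schemaVersion: 1;
  id: string;
  hypothesis: string;
  confidence: number; // evidence strength in [0, 1]
  alternatives: string[];
  abstained: boolean;
}

/** @deprecated Use FlowHypothesisEvent. Kept so existing imports still compile. */
type SmartMoneyEvent = FlowHypothesisEvent;

// Hypothetical legacy (v0) record shape, for illustration only.
interface SmartMoneyEventV0 {
  id: string;
  profile: string;
  probability: number;
}

// Upgrades legacy records on read; v1 records pass through unchanged.
function toFlowHypothesis(record: SmartMoneyEventV0 | FlowHypothesisEvent): FlowHypothesisEvent {
  if ("schemaVersion" in record) return record;
  return {
    schemaVersion: 1,
    id: record.id,
    hypothesis: record.profile,
    confidence: Math.min(1, Math.max(0, record.probability)), // clamp into [0, 1]
    alternatives: [], // assumed: legacy records carried no alternatives
    abstained: false,
  };
}

const legacy: SmartMoneyEventV0 = { id: "evt-1", profile: "directional", probability: 1.2 };
const upgraded: SmartMoneyEvent = toFlowHypothesis(legacy);
console.log(upgraded.schemaVersion, upgraded.confidence); // 1 1
```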