plan synthetic and smart-flow phases

2026-06-16 13:46:08 -04:00
commit eaa22de302
parent d1fac6c7ec
19 changed files with 1198 additions and 1 deletions
--- a/docs/implementation/README.md
+++ b/docs/implementation/README.md
@@ -0,0 +1,58 @@
+# Implementation Phase Plans
+
+This directory is the active planning layer for the synthetic market-data and smart-money/smart-flow architecture work.
+
+The architecture reviews in `docs/plans/` are background guidance. Future implementation work should use the current phase document and matching Beads issue as the active scope. If a phase document and an older architecture review disagree, pause and update the phase document or Beads issue before writing code.
+
+## Source Plans
+
+- `docs/plans/synthetic-market-data-architecture-review.md`
+- `docs/plans/smart-flow-architecture-review.md`
+
+## Planning Rules
+
+- Prefer small, reviewable PRs.
+- Do not implement an entire architecture plan at once.
+- Use Beads issues for execution tracking and dependency management.
+- Keep durable architecture and phase detail in these docs, not in long Beads descriptions.
+- Synthetic data must emit canonical market event types, not synthetic-only pipeline event types.
+- Synthetic labels must remain separate from emitted market events.
+- Smart-flow logic must distinguish facts, evidence, hypotheses, confidence, and abstention.
+- Historical calibration is future work, not an MVP dependency.
+- Early synthetic tests must not require Docker, ClickHouse, NATS, or Redis.
+- Synthetic foundations should come before demos, UI controls, or live service work.
+
+## Beads Map
+
+| Stream | Epic | Roadmap |
+| --- | --- | --- |
+| Synthetic market data | `islandflow-259` - Plan synthetic market-data implementation phases | `docs/implementation/synthetic-market-data/00-roadmap.md` |
+| Smart money / smart flow | `islandflow-zxh` - Plan smart-money to smart-flow implementation phases | `docs/implementation/smart-money/00-roadmap.md` |
+
+## Dependency Order
+
+This is the intended MVP ordering. Future calibration phases sit after the MVP chain and should not block it.
+
+| Order | Phase | Beads issue | Blocks next because |
+| ---: | --- | --- | --- |
+| 1 | Synthetic deterministic spine | `islandflow-259.1` | The smart-flow vocabulary needs stable raw event/provenance assumptions. |
+| 2 | Smart-flow contracts and vocabulary | `islandflow-zxh.1` | Synthetic manifests should target the eventual evidence/hypothesis language. |
+| 3 | Synthetic manifests, fixtures, and CLI | `islandflow-259.2` | Evidence clustering needs deterministic fixtures before broad behavior changes. |
+| 4 | Smart-flow evidence clustering and features | `islandflow-zxh.2` | Scenario labels need the evidence vocabulary they are expected to exercise. |
+| 5 | Synthetic scenarios, labels, and expected outputs | `islandflow-259.3` | Hypothesis scoring needs labeled positive, negative, and abstention cases. |
+| 6 | Smart-flow hypothesis scoring and abstention | `islandflow-zxh.3` | Synthetic replay integration should validate the derived hypothesis pipeline. |
+| 7 | Synthetic replay integration | `islandflow-259.4` | Smart-flow golden tests need replayable synthetic runs. |
+| 8 | Smart-flow replay evaluation and golden tests | `islandflow-zxh.4` | Demos should wait until replay proves the semantics. |
+| 9 | Synthetic demo and load profiles | `islandflow-259.5` | API/UI explainability should show stable, named, deterministic runs. |
+| 10 | Smart-flow API/UI explainability | `islandflow-zxh.5` | This is the final MVP presentation layer after the evidence pipeline is validated. |
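
Reviewer sketch (not part of the commit): the ordering in the table above could be checked mechanically. The issue ids and dependencies below are copied from the "Depends on" columns of the two roadmap files in this commit; the `PhaseNode` shape and `findOrderViolations` helper are hypothetical names introduced only for illustration.

```typescript
// Hypothetical encoding of the MVP chain. Only the issue ids and their
// dependencies come from the plan; the data structure is an assumption.
interface PhaseNode {
  issue: string;
  dependsOn: string[];
}

const mvpOrder: PhaseNode[] = [
  { issue: "islandflow-259.1", dependsOn: [] },
  { issue: "islandflow-zxh.1", dependsOn: ["islandflow-259.1"] },
  { issue: "islandflow-259.2", dependsOn: ["islandflow-zxh.1"] },
  { issue: "islandflow-zxh.2", dependsOn: ["islandflow-259.2"] },
  { issue: "islandflow-259.3", dependsOn: ["islandflow-zxh.2"] },
  { issue: "islandflow-zxh.3", dependsOn: ["islandflow-259.3"] },
  { issue: "islandflow-259.4", dependsOn: ["islandflow-zxh.3"] },
  { issue: "islandflow-zxh.4", dependsOn: ["islandflow-259.4"] },
  { issue: "islandflow-259.5", dependsOn: ["islandflow-zxh.4"] },
  { issue: "islandflow-zxh.5", dependsOn: ["islandflow-259.5"] },
];

// Returns one message per dependency that does not appear earlier in the list.
function findOrderViolations(order: PhaseNode[]): string[] {
  const seen: Record<string, boolean> = {};
  const violations: string[] = [];
  for (const node of order) {
    for (const dep of node.dependsOn) {
      if (!seen[dep]) {
        violations.push(`${node.issue} needs ${dep}`);
      }
    }
    seen[node.issue] = true;
  }
  return violations;
}
```

An empty result means every phase appears after the phases it depends on; the future calibration issues are left out because the plan keeps them off the MVP chain.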
+
+## Future Work
+
+| Future phase | Beads issue | Notes |
+| --- | --- | --- |
+| Synthetic historical calibration | `islandflow-259.6` | Depends on synthetic phase 05, but is not required for MVP. |
+| Smart-flow calibration | `islandflow-zxh.6` | Depends on smart-flow phase 05 and synthetic future calibration, but is not required for MVP. |
+
+## Existing Related Issue
+
+`islandflow-9dz` already tracks tuning synthetic smart-money scenario coverage. It is narrower than these phase plans and was already in progress before this split. Treat it as related context for `docs/implementation/synthetic-market-data/03-scenarios-labels-expected-outputs.md`, not as the phase-level tracker.
--- a/docs/implementation/smart-money/00-roadmap.md
+++ b/docs/implementation/smart-money/00-roadmap.md
@@ -0,0 +1,40 @@
+# Smart Money / Smart Flow Roadmap
+
+This roadmap breaks `docs/plans/smart-flow-architecture-review.md` into implementation-sized phases. The recommended direction is Option B: keep the working stack, but rebuild the domain pipeline around observations, evidence clusters, cautious hypotheses, confidence, alternatives, abstention, replay evaluation, and user-facing insight projections.
+
+## Core Constraints
+
+- Do not treat "smart money" as a canonical fact emitted by the system.
+- Distinguish direct facts, evidence, hypotheses, confidence, alternatives, and abstention.
+- Preserve evidence and uncertainty in storage, API, websocket, and UI surfaces.
+- Keep Redis as hot cache only, not hidden baseline truth.
+- Make replay evaluation the acceptance gate before expanding UI confidence.
+- Keep historical or research-grade calibration as future work, not an MVP dependency.
+
+## Phase Sequence
+
+| Phase | Beads issue | Depends on | Purpose |
+| --- | --- | --- | --- |
+| 01 - Contracts and vocabulary | `islandflow-zxh.1` | `islandflow-259.1` | Define evidence/hypothesis/insight contracts and retire canonical overconfidence. |
+| 02 - Evidence clustering and features | `islandflow-zxh.2` | `islandflow-259.2` | Extract eligibility, evidence facts, clusters, and traceable features. |
+| 03 - Hypothesis scoring and abstention | `islandflow-zxh.3` | `islandflow-259.3` | Score cautious hypotheses and represent abstention/alternatives. |
+| 04 - Replay evaluation and golden tests | `islandflow-zxh.4` | `islandflow-259.4` | Validate derived outputs through deterministic replay and golden fixtures. |
+| 05 - API/UI explainability | `islandflow-zxh.5` | `islandflow-259.5` | Expose evidence-backed insights and uncertainty to API, WS, and UI. |
+| 99 - Future calibration | `islandflow-zxh.6` | `islandflow-zxh.5`, `islandflow-259.6` | Calibrate confidence and policy behavior later with richer datasets. |
+
+## PR Split Notes
+
+Several phases are broad enough to split before implementation:
+
+- `islandflow-zxh.2.1` - Split smart-flow phase 02a: eligibility and evidence facts
+- `islandflow-zxh.2.2` - Split smart-flow phase 02b: clustering and feature vectors
+- `islandflow-zxh.3.1` - Split smart-flow phase 03a: hypothesis score vectors
+- `islandflow-zxh.3.2` - Split smart-flow phase 03b: abstention and insight projection
+- `islandflow-zxh.5.1` - Split smart-flow phase 05a: evidence API and websocket surfaces
+- `islandflow-zxh.5.2` - Split smart-flow phase 05b: UI explainability surfaces
+
+If an implementation PR crosses contracts, compute, storage, API, and UI in one change, stop and split it.
+
+## Matching Beads Epic
+
+- `islandflow-zxh` - Plan smart-money to smart-flow implementation phases
--- a/docs/implementation/smart-money/01-contracts-vocabulary.md
+++ b/docs/implementation/smart-money/01-contracts-vocabulary.md
@@ -0,0 +1,66 @@
+# Smart-Flow Phase 01: Contracts and Vocabulary
+
+## Purpose
+
+Introduce the domain vocabulary and contracts that distinguish observations, evidence clusters, hypotheses, confidence, abstention, and user-facing insight projections.
+
+## Why this phase comes now
+
+The current system has useful infrastructure but overconfident domain names. Before changing classifier behavior, the codebase needs the language to express what is observed, what is inferred, what is uncertain, and when the system should abstain.
+
+## Dependencies on earlier phases
+
+- `islandflow-259.1` - Synthetic deterministic spine, so contract work can align with canonical raw event and provenance assumptions.
+
+## Likely files/modules touched
+
+- `packages/types/src/events.ts`
+- Shared type exports in `packages/types/`
+- Compatibility type aliases where legacy names are still needed
+- Storage schema planning docs or migration notes
+- Tests for schema parsing or event compatibility
+
+## In-scope work
+
+- Define or prepare contracts for `FlowEvidenceCluster`, `FlowCandidate`, `FlowHypothesisEvent`, `SmartFlowInsight`, `EvidenceQuality`, `BaselineSnapshot`, and version fields.
+- Mark legacy "smart money" naming as compatibility or projection language, not canonical truth.
+- Define how facts, evidence, hypotheses, scores, confidence, and abstention differ.
+- Preserve compatibility aliases for existing API/UI paths where necessary.
+- Add concise migration notes for future phases.
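
Reviewer sketch (not part of the commit): one way the fact/evidence/hypothesis/abstention split might look in TypeScript. Only the type names `FlowEvidenceCluster`, `FlowHypothesisEvent`, and `EvidenceQuality` come from the plan; every field name, the `ScoredHypothesis` and `Abstention` shapes, and `summarizeEvent` are assumptions made for illustration.

```typescript
// Assumed quality scale; the plan names EvidenceQuality but not its values.
type EvidenceQuality = "high" | "medium" | "low";

// Evidence: a group of observed raw events, referenced by id (fields assumed).
interface FlowEvidenceCluster {
  clusterId: string;
  sourceRefs: string[];
  quality: EvidenceQuality;
}

// Hypothesis: an inference about the evidence, never a canonical fact.
interface ScoredHypothesis {
  outcome: "hypothesis";
  hypothesisType: string;
  confidence: number; // policy confidence in [0, 1], not a probability
  alternatives: string[];
}

// Abstention: a first-class result that carries its reasons.
interface Abstention {
  outcome: "abstain";
  reasons: string[];
}

interface FlowHypothesisEvent {
  cluster: FlowEvidenceCluster;
  policyVersion: string; // version field for policy/model evolution
  result: ScoredHypothesis | Abstention;
}

// The discriminant `outcome` forces callers to handle abstention explicitly.
function summarizeEvent(event: FlowHypothesisEvent): string {
  const result = event.result;
  if (result.outcome === "abstain") {
    return `abstained (${result.reasons.join(", ")})`;
  }
  return `${result.hypothesisType} at policy confidence ${result.confidence}`;
}
```

Keeping the cluster (evidence) and the result (inference) in separate fields is one way to avoid mixing facts and hypotheses in a single flat event shape, which the risks section of this phase warns against.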
+
+## Explicitly out-of-scope work
+
+- Rewriting classifier scoring.
+- Moving ingest policy.
+- Adding new API endpoints or UI drawers.
+- Building replay golden suites.
+- Historical calibration or research-grade model fitting.
+
+## Acceptance criteria
+
+- Contracts distinguish observations, evidence, hypotheses, insight projections, confidence, alternatives, and abstention.
+- Legacy naming remains only where compatibility requires it.
+- Version fields are included for policy/model evolution.
+- Future phases can refer to these contracts without redefining the vocabulary.
+- Migration risk and compatibility aliases are documented.
+
+## Test strategy
+
+Use type-level checks and schema/serialization tests where practical. Add compatibility tests only for public contracts that must remain stable. Avoid broad behavior tests until evidence extraction and scoring phases exist.
+
+## Risks / design traps
+
+- Renaming everything without compatibility will break consumers.
+- Keeping "smart money" as canonical language will preserve the old overconfidence.
+- Mixing facts and hypotheses in one event shape will make replay evaluation weaker.
+- Adding too many future fields can make contracts noisy before behavior exists.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/01-contracts-vocabulary.md for Beads issue islandflow-zxh.1. Focus on contracts, vocabulary, version fields, and compatibility aliases only. Do not rewrite scoring, API/UI explainability, replay tests, or calibration.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.1` - Smart-flow phase 01: contracts and vocabulary
--- a/docs/implementation/smart-money/02-evidence-clustering-features.md
+++ b/docs/implementation/smart-money/02-evidence-clustering-features.md
@@ -0,0 +1,69 @@
+# Smart-Flow Phase 02: Evidence Clustering and Features
+
+## Purpose
+
+Make evidence extraction, eligibility, quote/context joins, clustering, and feature construction explicit and traceable before hypothesis scoring changes.
+
+## Why this phase comes now
+
+Contracts alone do not change behavior. This phase gives the system a clean evidence layer so later scoring can reason from auditable facts instead of a generic feature bag or overconfident classifier labels.
+
+## Dependencies on earlier phases
+
+- `islandflow-zxh.1` - Smart-flow contracts and vocabulary
+- `islandflow-259.2` - Synthetic manifests, fixtures, and CLI
+
+## Likely files/modules touched
+
+- `services/compute/src/`
+- `packages/types/src/events.ts`
+- `packages/storage/src/` for typed evidence storage planning or implementation
+- Tests under `services/compute/tests/`
+- Fixture helpers from the synthetic package
+
+## In-scope work
+
+- Represent direct observations, quote joins, execution context, and eligibility decisions as evidence facts.
+- Build deterministic evidence clusters with traceable source refs.
+- Compute feature vectors from evidence while preserving whether a value is observed, derived, or inferred.
+- Carry evidence quality, stale quote, wide spread, odd lot, complex spread, and noisy context signals.
+- Move toward ingest-as-normalization, not ingest-as-signal-policy.
+
+## Explicitly out-of-scope work
+
+- Final hypothesis score policy.
+- API and UI explainability.
+- Historical calibration.
+- Claiming participant identity.
+- Replacing all storage tables in the same PR.
+
+## Acceptance criteria
+
+- Eligibility decisions have explicit accept, reject, or down-weight reasons.
+- Evidence clusters have deterministic keys/windows and preserve raw refs.
+- Feature values trace back to evidence refs.
+- Stale, wide, noisy, or ambiguous conditions can be represented without pretending to know intent.
+- The phase is split into PR-sized children when implementation starts.
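
Reviewer sketch (not part of the commit): the first two acceptance criteria could be exercised with something like the following. The `PrintObservation` fields, the thresholds, and the fixed-window key scheme are all assumptions; the plan only requires explicit accept, reject, or down-weight reasons and deterministic cluster keys/windows.

```typescript
// Hypothetical observation shape joined with quote context (fields assumed).
interface PrintObservation {
  eventId: string;
  contractId: string;
  tsMs: number;       // event time in milliseconds
  quoteAgeMs: number; // age of the joined quote
  spreadPct: number;  // quoted spread as a fraction of mid price
}

type EligibilityDecision =
  | { kind: "accept" }
  | { kind: "down_weight"; reason: string; weight: number }
  | { kind: "reject"; reason: string };

// Placeholder thresholds; real values would come from policy, not this sketch.
function decideEligibility(p: PrintObservation): EligibilityDecision {
  if (p.quoteAgeMs > 5000) {
    return { kind: "reject", reason: "stale_quote" };
  }
  if (p.spreadPct > 0.2) {
    return { kind: "down_weight", reason: "wide_spread", weight: 0.5 };
  }
  return { kind: "accept" };
}

// Deterministic key: the same contract and time window always map to the same cluster.
function clusterKey(p: PrintObservation, windowMs: number): string {
  const windowStart = Math.floor(p.tsMs / windowMs) * windowMs;
  return `${p.contractId}:${windowStart}`;
}
```

Because the key depends only on event fields and never on arrival order or wall-clock time, replaying the same fixture yields the same clusters.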
+
+## Test strategy
+
+Use deterministic fixtures from synthetic phase 02 where available. Add focused tests for quote joining, eligibility rejection, cluster key stability, feature derivation, and trace refs. Keep tests infra-free unless a later optional storage integration explicitly needs services.
+
+## Risks / design traps
+
+- Recreating the old `FlowPacket` as a renamed generic feature bag.
+- Letting ingest services make signal-policy decisions.
+- Losing evidence refs during aggregation.
+- Treating cluster features as hypotheses before the scoring phase.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/02-evidence-clustering-features.md for Beads issue islandflow-zxh.2. Use split issues islandflow-zxh.2.1 and islandflow-zxh.2.2 for PR-sized work. Focus on evidence facts, eligibility, clustering, and traceable features. Do not implement final scoring, API/UI explainability, or calibration.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.2` - Smart-flow phase 02: evidence clustering and features
+- PR split: `islandflow-zxh.2.1` - Split smart-flow phase 02a: eligibility and evidence facts
+- PR split: `islandflow-zxh.2.2` - Split smart-flow phase 02b: clustering and feature vectors
--- a/docs/implementation/smart-money/03-hypothesis-scoring-abstention.md
+++ b/docs/implementation/smart-money/03-hypothesis-scoring-abstention.md
@@ -0,0 +1,70 @@
+# Smart-Flow Phase 03: Hypothesis Scoring and Abstention
+
+## Purpose
+
+Convert evidence clusters into cautious flow hypotheses with explicit score vectors, alternatives, penalties, confidence, conviction, and abstention reasons.
+
+## Why this phase comes now
+
+Scoring should wait until the system can represent evidence clearly and synthetic scenarios can describe expected positive, negative, and abstention cases. This phase is where the product stops acting like every signal is a confident "smart money" claim.
+
+## Dependencies on earlier phases
+
+- `islandflow-zxh.1` - Smart-flow contracts and vocabulary
+- `islandflow-zxh.2` - Evidence clustering and features
+- `islandflow-259.3` - Synthetic scenarios, labels, and expected outputs
+
+## Likely files/modules touched
+
+- `services/compute/src/`
+- `packages/types/src/events.ts`
+- `packages/storage/src/smart-money-events.ts` or successor storage modules
+- Compute tests and fixture/golden comparison helpers
+- Compatibility projection code for legacy alerts or classifier hits
+
+## In-scope work
+
+- Define score vectors for hypothesis type, direction, evidence strength, confidence, conviction, and penalties.
+- Preserve alternative explanations and negative evidence.
+- Make abstention a first-class output with reasons.
+- Add policy/model version fields.
+- Derive compatibility `SmartFlowInsight` or legacy projections from canonical hypothesis events.
+
+## Explicitly out-of-scope work
+
+- UI presentation overhaul.
+- API endpoint expansion.
+- Historical calibration.
+- Participant identity claims.
+- Tuning all thresholds against live historical data.
+
+## Acceptance criteria
+
+- Hypothesis scores separate evidence strength, confidence, conviction, and penalties.
+- Abstention outputs include machine-readable and user-readable reasons.
+- Alternative explanations are preserved.
+- Compatibility projections do not become the canonical domain model.
+- Score policy changes are deterministic against synthetic fixtures.
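
Reviewer sketch (not part of the commit): a toy scoring function that keeps evidence strength, confidence, conviction, and penalties as separate fields and returns abstention as a real output. The arithmetic, the `ABSTAIN_BELOW` threshold, and all names are placeholders, not the plan's policy; conviction is passed in by the caller because the plan leaves its definition open.

```typescript
interface Penalty {
  reason: string; // machine-readable code, e.g. "stale_quote"
  amount: number;
}

interface ScoreVector {
  evidenceStrength: number;
  confidence: number; // policy confidence, not a probability
  conviction: number;
  penalties: Penalty[];
}

type HypothesisOutput =
  | { kind: "hypothesis"; hypothesisType: string; scores: ScoreVector; alternatives: string[]; policyVersion: string }
  | { kind: "abstain"; reasonCodes: string[]; message: string; alternatives: string[]; policyVersion: string };

const POLICY_VERSION = "sketch-0"; // hypothetical version tag
const ABSTAIN_BELOW = 0.4;         // placeholder threshold

function scoreCluster(
  hypothesisType: string,
  evidenceStrength: number,
  conviction: number,
  penalties: Penalty[],
  alternatives: string[],
): HypothesisOutput {
  let totalPenalty = 0;
  for (const p of penalties) {
    totalPenalty += p.amount;
  }
  // Clamp to [0, 1] so penalties can never produce a negative confidence.
  const confidence = Math.max(0, Math.min(1, evidenceStrength - totalPenalty));
  if (confidence < ABSTAIN_BELOW) {
    const reasonCodes = penalties.map((p) => p.reason);
    if (reasonCodes.length === 0) {
      reasonCodes.push("weak_evidence");
    }
    return {
      kind: "abstain",
      reasonCodes,
      message: `Not enough reliable evidence (${reasonCodes.join(", ")}).`,
      alternatives,
      policyVersion: POLICY_VERSION,
    };
  }
  return {
    kind: "hypothesis",
    hypothesisType,
    scores: { evidenceStrength, confidence, conviction, penalties },
    alternatives,
    policyVersion: POLICY_VERSION,
  };
}
```

The function is pure, so a given input always yields the same output, which is what makes comparison against synthetic fixtures meaningful.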
+
+## Test strategy
+
+Use synthetic scenario fixtures and expected-output manifests. Cover positive hypotheses, abstentions, false-positive suppressions, alternative explanations, and noisy scenarios. Keep output comparisons stable and focused on score signatures rather than brittle full payload dumps.
+
+## Risks / design traps
+
+- Rebranding old classifier hits as hypotheses without changing semantics.
+- Treating confidence as probability when it is only policy confidence.
+- Hiding abstention in logs instead of output events.
+- Letting compatibility alert projections dictate canonical scoring design.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/03-hypothesis-scoring-abstention.md for Beads issue islandflow-zxh.3. Use split issues islandflow-zxh.3.1 and islandflow-zxh.3.2 for PR-sized work. Build cautious hypothesis scoring, alternatives, and abstention from evidence clusters. Do not add API/UI explainability or historical calibration.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.3` - Smart-flow phase 03: hypothesis scoring and abstention
+- PR split: `islandflow-zxh.3.1` - Split smart-flow phase 03a: hypothesis score vectors
+- PR split: `islandflow-zxh.3.2` - Split smart-flow phase 03b: abstention and insight projection
--- a/docs/implementation/smart-money/04-replay-evaluation-golden-tests.md
+++ b/docs/implementation/smart-money/04-replay-evaluation-golden-tests.md
@@ -0,0 +1,69 @@
+# Smart-Flow Phase 04: Replay Evaluation and Golden Tests
+
+## Purpose
+
+Make deterministic replay and golden output comparison the acceptance gate for smart-flow behavior changes.
+
+## Why this phase comes now
+
+Replay evaluation should come after synthetic replay can select stable runs and after hypothesis scoring has outputs worth validating. This phase turns architecture discipline into a repeatable test path.
+
+## Dependencies on earlier phases
+
+- `islandflow-zxh.1` - Smart-flow contracts and vocabulary
+- `islandflow-zxh.2` - Evidence clustering and features
+- `islandflow-zxh.3` - Hypothesis scoring and abstention
+- `islandflow-259.4` - Synthetic replay integration
+
+## Likely files/modules touched
+
+- `services/replay/src/`
+- `services/compute/tests/`
+- Synthetic fixture and manifest comparison helpers
+- Golden fixture directories
+- Optional service-container integration config if added later
+
+## In-scope work
+
+- Recompute derived evidence/hypothesis outputs from raw synthetic streams.
+- Compare stable output signatures with expected manifests.
+- Include positive, abstention, false-positive, and noisy scenarios.
+- Make replay/golden tests deterministic and infra-free by default.
+- Gate optional ClickHouse/NATS/Redis tests outside the default path.
+
+## Explicitly out-of-scope work
+
+- New scoring policy beyond fixes needed for deterministic evaluation.
+- UI explainability.
+- Historical calibration.
+- Large generated fixture dumps.
+- Making Docker-backed tests mandatory.
+
+## Acceptance criteria
+
+- Replay recomputes derived smart-flow outputs from raw fixtures.
+- Golden signatures cover positive, abstention, false-positive, and noisy scenarios.
+- Default tests are deterministic and infra-free.
+- Optional service-backed tests are clearly gated.
+- Failures show concise, reviewable diffs or signature mismatches.
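
Reviewer sketch (not part of the commit): comparing compact signatures rather than full payloads might look like this. The `OutputSignature` fields and the `diffSignatures` helper are hypothetical; the plan does not specify what a signature contains.

```typescript
// Hypothetical compact signature: only the fields a reviewer should care about.
interface OutputSignature {
  scenarioId: string;
  outcome: string;               // e.g. "hypothesis" or "abstain"
  hypothesisType: string | null; // null when the system abstained
}

function renderSignature(sig: OutputSignature): string {
  return `${sig.outcome}/${sig.hypothesisType === null ? "none" : sig.hypothesisType}`;
}

// Returns one short line per mismatch, sorted so failures are stable across runs.
function diffSignatures(expected: OutputSignature[], actual: OutputSignature[]): string[] {
  const actualById = new Map<string, OutputSignature>();
  for (const sig of actual) {
    actualById.set(sig.scenarioId, sig);
  }
  const expectedIds = new Map<string, boolean>();
  const diffs: string[] = [];
  for (const exp of expected) {
    expectedIds.set(exp.scenarioId, true);
    const got = actualById.get(exp.scenarioId);
    if (got === undefined) {
      diffs.push(`${exp.scenarioId}: missing (expected ${renderSignature(exp)})`);
    } else if (renderSignature(got) !== renderSignature(exp)) {
      diffs.push(`${exp.scenarioId}: expected ${renderSignature(exp)}, got ${renderSignature(got)}`);
    }
  }
  for (const sig of actual) {
    if (!expectedIds.has(sig.scenarioId)) {
      diffs.push(`${sig.scenarioId}: unexpected ${renderSignature(sig)}`);
    }
  }
  return diffs.sort();
}
```

Harmless metadata changes never reach the comparison because they are not part of the signature, which addresses the full-payload risk listed for this phase.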
+
+## Test strategy
+
+Use fixture-backed replay and compact golden signatures first. Add a small number of representative scenarios rather than broad generated dumps. If service-backed tests are added, mark them optional and document their dependencies.
+
+## Risks / design traps
+
+- Golden files that are too large will become rubber-stamped.
+- Full payload comparisons may break on harmless metadata changes.
+- Optional infra tests can accidentally become required in CI.
+- Replay that starts from derived events instead of raw fixtures will miss pipeline regressions.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/04-replay-evaluation-golden-tests.md for Beads issue islandflow-zxh.4. Build deterministic replay/golden validation from raw synthetic fixtures. Keep default tests infra-free, compare stable signatures, and do not add UI explainability or historical calibration.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.4` - Smart-flow phase 04: replay evaluation and golden tests
--- a/docs/implementation/smart-money/05-api-ui-explainability.md
+++ b/docs/implementation/smart-money/05-api-ui-explainability.md
@@ -0,0 +1,72 @@
+# Smart-Flow Phase 05: API/UI Explainability
+
+## Purpose
+
+Expose evidence-backed smart-flow outputs through API, websocket, and UI surfaces that make evidence quality, confidence, conviction, alternatives, and abstention understandable.
+
+## Why this phase comes now
+
+The presentation layer should wait until contracts, evidence, scoring, and replay evaluation are stable. Otherwise the UI will harden old overconfident language or teach users to trust unvalidated outputs.
+
+## Dependencies on earlier phases
+
+- `islandflow-zxh.1` - Smart-flow contracts and vocabulary
+- `islandflow-zxh.2` - Evidence clustering and features
+- `islandflow-zxh.3` - Hypothesis scoring and abstention
+- `islandflow-zxh.4` - Replay evaluation and golden tests
+- `islandflow-259.5` - Synthetic demo and load profiles
+
+## Likely files/modules touched
+
+- `services/api/src/`
+- Websocket payload types and channel names
+- `apps/web/`
+- Shared UI/domain types in `packages/types/`
+- API and UI tests
+
+## In-scope work
+
+- Add or alias API/WS surfaces for evidence, hypotheses, insights, alternatives, and abstention.
+- Keep legacy smart-money endpoints as aliases where needed, not canonical contracts.
+- Rework UI surfaces around evidence quality, confidence versus conviction, alternatives, abstention, and why-not context.
+- Ensure named deterministic demos can display stable explainability examples.
+- Update replay/golden validation whenever an API, WS, or UI projection changes.
+
+## Explicitly out-of-scope work
+
+- Rewriting scoring policy.
+- Adding new synthetic foundations.
+- Historical calibration.
+- Claiming participant identity.
+- UI copy that implies certainty where the model only has evidence-backed hypotheses.
+
+## Acceptance criteria
+
+- API/WS payloads expose evidence refs, hypotheses, insights, alternatives, abstention reasons, and version fields.
+- UI distinguishes evidence quality, confidence, conviction, and why-not signals.
+- Legacy smart-money surfaces remain compatibility aliases where required.
+- Replay/golden checks are updated to cover any changed projection behavior.
+- Explainability copy avoids overconfident certainty claims.
+
+## Test strategy
+
+Use API contract tests, websocket payload tests, and focused UI tests for evidence/abstention rendering. Validate with deterministic demo runs from synthetic phase 05. Manual visual review should supplement, not replace, replay/golden validation.
+
+## Risks / design traps
+
+- UI can accidentally reintroduce "smart money" certainty.
+- API aliases can become de facto canonical if not documented.
+- Too many fields without hierarchy will make explainability harder to scan.
+- Building UI before replay validation can make demos persuasive but untrustworthy.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/05-api-ui-explainability.md for Beads issue islandflow-zxh.5. Use split issues islandflow-zxh.5.1 and islandflow-zxh.5.2 for PR-sized work. Expose evidence-backed API/WS/UI explainability after replay/golden validation. Do not change core scoring or add calibration.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.5` - Smart-flow phase 05: API/UI explainability
+- PR split: `islandflow-zxh.5.1` - Split smart-flow phase 05a: evidence API and websocket surfaces
+- PR split: `islandflow-zxh.5.2` - Split smart-flow phase 05b: UI explainability surfaces
--- a/docs/implementation/smart-money/99-future-calibration.md
+++ b/docs/implementation/smart-money/99-future-calibration.md
@@ -0,0 +1,65 @@
+# Smart-Flow Phase 99: Future Calibration
+
+## Purpose
+
+Plan future calibration of smart-flow confidence, policy thresholds, penalties, and abstention behavior after the MVP evidence/hypothesis pipeline is working and replay-validated.
+
+## Why this phase comes now
+
+The architecture should leave room for calibration, but calibration should not block the MVP. The system first needs clean facts, evidence, hypotheses, and replayable evaluation before tuning can be meaningful.
+
+## Dependencies on earlier phases
+
+- `islandflow-zxh.5` - Smart-flow API/UI explainability
+- `islandflow-259.6` - Future synthetic historical calibration
+
+## Likely files/modules touched
+
+- Future calibration tooling in `services/compute/` or a research package
+- Policy/model version registry
+- Evaluation reports or benchmark datasets
+- Storage/query helpers for historical derived outputs
+- Documentation for metrics and calibration governance
+
+## In-scope work
+
+- Define calibration datasets and evaluation metrics.
+- Specify how confidence, conviction, penalties, abstention, and alternatives are tuned.
+- Preserve policy/model versioning and replayability.
+- Document what makes a calibration dataset acceptable.
+- Keep user-facing confidence semantics auditable.
+
+## Explicitly out-of-scope work
+
+- MVP contracts and scoring foundations.
+- API/UI explainability for the initial pipeline.
+- Treating historical calibration as proof of participant identity.
+- Using private or licensed data in committed fixtures without approval.
+
+## Acceptance criteria
+
+- Calibration remains outside the MVP blocker chain.
+- Dataset provenance, metrics, and policy versioning are documented before implementation.
+- Confidence and abstention semantics remain explainable after tuning.
+- Replay can compare calibrated policy versions without losing auditability.
+
+## Test strategy
+
+When implemented, use replayed benchmark datasets with versioned policy outputs. Track false positives, abstentions, precision-like metrics, and scenario-specific regressions. Keep calibration tests separate from the early deterministic fixture tests.
+
+## Risks / design traps
+
+- Treating calibrated confidence as objective truth.
+- Tuning to demos instead of representative market regimes.
+- Losing policy version lineage.
+- Committing restricted data or large generated benchmark artifacts.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/99-future-calibration.md for Beads issue islandflow-zxh.6 only after the MVP smart-flow phases are complete. Define calibration datasets, metrics, policy versioning, and replay comparison. Do not make calibration a prerequisite for earlier evidence, scoring, or UI work.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.6` - Future smart-flow phase 99: calibration
--- a/docs/implementation/synthetic-market-data/00-roadmap.md
+++ b/docs/implementation/synthetic-market-data/00-roadmap.md
@@ -0,0 +1,36 @@
+# Synthetic Market-Data Roadmap
+
+This roadmap breaks `docs/plans/synthetic-market-data-architecture-review.md` into implementation-sized phases. The recommended direction is still Option B: extract deterministic synthetic generation into a first-class reusable engine while keeping the useful NATS, ClickHouse, compute, API, replay, and web stack.
+
+## Core Constraints
+
+- Emit canonical market event types: `OptionPrint`, `OptionNBBO`, `EquityPrint`, and `EquityQuote`.
+- Do not create synthetic-only market event types for the main pipeline.
+- Keep hidden ground-truth labels separate from emitted market events.
+- Keep early quality gates infra-free: `bun test` should not require Docker, ClickHouse, NATS, or Redis.
+- Build deterministic foundations before demos, UI controls, or live synthetic service behavior.
+- Treat historical calibration as future work, not as a dependency for the MVP synthetic generator.
+
+## Phase Sequence
+
+| Phase | Beads issue | Depends on | Purpose |
+| --- | --- | --- | --- |
+| 01 - Deterministic spine | `islandflow-259.1` | None | Create the seeded generation foundation and canonical event output contract. |
+| 02 - Manifests, fixtures, CLI | `islandflow-259.2` | `islandflow-zxh.1` | Turn deterministic generation into durable fixtures and manifests. |
+| 03 - Scenarios, labels, expected outputs | `islandflow-259.3` | `islandflow-zxh.2` | Author named scenarios, separate labels, and expected derived outputs. |
+| 04 - Replay integration | `islandflow-259.4` | `islandflow-zxh.3` | Make replay consume synthetic runs with stable ordering and output comparison. |
+| 05 - Demo and load profiles | `islandflow-259.5` | `islandflow-zxh.4` | Expose named deterministic demo/load profiles after replay validation. |
+| 99 - Future historical calibration | `islandflow-259.6` | `islandflow-259.5` | Calibrate parameters from historical data later, after the MVP is stable. |
+
+## PR Split Notes
+
+Most phases are intended to fit in one focused PR. Phase 03 is already split into PR-sized Beads children because scenario authoring and expected-output comparison can grow quickly:
+
+- `islandflow-259.3.1` - Split synthetic phase 03a: scenario catalog and labels
+- `islandflow-259.3.2` - Split synthetic phase 03b: expected-output manifests
+
+If any other phase starts touching unrelated service, API, UI, and storage behavior in one PR, split it before implementation continues.
+
+## Matching Beads Epic
+
+- `islandflow-259` - Plan synthetic market-data implementation phases
--- a/docs/implementation/synthetic-market-data/01-deterministic-spine.md
+++ b/docs/implementation/synthetic-market-data/01-deterministic-spine.md
@@ -0,0 +1,68 @@
+# Synthetic Market-Data Phase 01: Deterministic Spine
+
+## Purpose
+
+Create the reusable deterministic foundation for synthetic market data. This phase should define the package/API shape for seeded generation, stable run identity, profile inputs, canonical event outputs, and provenance metadata.
+
+## Why this phase comes now
+
+Everything else depends on reproducible raw events. Manifests, labels, replay, demos, and smart-flow tests are only trustworthy if the same seed/profile bundle produces the same canonical market event stream every time.
+
+## Dependencies on earlier phases
+
+None. This is the first synthetic phase.
+
+## Likely files/modules touched
+
+- Future `packages/synthetic-market/` workspace or equivalent package boundary
+- `packages/types/src/events.ts`
+- Synthetic logic currently embedded in `services/ingest-options/` and `services/ingest-equities/`
+- Shared package manifests such as `package.json`, `bunfig.toml`, or workspace config if a new package is added
+- Infra-free unit tests under the new package or nearby package test folders
+
+## In-scope work
+
+- Define `SyntheticRun`, `SeedBundle`, `ParameterSnapshot`, `SymbolProfile`, `LiquidityProfile`, `VolatilityRegime`, `OptionChainProfile`, and `GeneratedEventBatch` shapes.
+- Pick and wrap a deterministic PRNG so fixed inputs produce stable output.
+- Emit canonical `OptionPrint`, `OptionNBBO`, `EquityPrint`, and `EquityQuote` events.
+- Attach provenance such as `source_kind`, `run_id`, `parameter_snapshot_hash`, and optional `scenario_id`.
+- Preserve compatibility with the existing pipeline's raw market event contracts.
+- Add fast deterministic tests that run in plain `bun test`.
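
Reviewer sketch (not part of the commit): the plan does not name a PRNG, so the example below uses the small public-domain mulberry32 generator purely as a stand-in. `generatePrices` and its one-cent random walk are hypothetical and far simpler than the profiles the plan describes.

```typescript
// mulberry32: a 32-bit seeded generator returning floats in [0, 1).
// Chosen here only as an example; the plan leaves the PRNG choice open.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical generator: a random walk in whole cents driven only by the seed.
// No Date.now() or Math.random() calls, so the output depends on inputs alone.
function generatePrices(seed: number, count: number, startCents: number): number[] {
  const next = mulberry32(seed);
  const prices: number[] = [];
  let price = startCents;
  for (let i = 0; i < count; i++) {
    const step = Math.floor(next() * 3) - 1; // -1, 0, or +1 cent
    price = Math.max(1, price + step);       // keep prices strictly positive
    prices.push(price);
  }
  return prices;
}
```

Calling `generatePrices` twice with the same arguments returns identical arrays, which is the property the determinism tests in this phase would assert.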
+
+## Explicitly out-of-scope work
+
+- Scenario catalogs and ground-truth label records.
+- Manifest generation and CLI workflows.
+- Replay service integration.
+- Hosted demo controls or live synthetic emitters.
+- Historical calibration from real market data.
+- Docker, ClickHouse, NATS, or Redis integration tests.
+
+## Acceptance criteria
+
+- A fixed seed/profile bundle produces byte-stable or hash-stable event output.
+- Generated events use canonical market event contracts, not synthetic-only pipeline event types.
+- Hidden labels are not embedded in emitted market events.
+- Provenance metadata is available for downstream filtering and auditing.
+- Tests cover determinism, tick validity, quote/trade invariants, and basic profile normalization without requiring infrastructure.
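+
+One way to check the hash-stable criterion above is to serialize events canonically and hash the result. The sketch below is an assumption-heavy illustration: `canonicalJson`, `fnv1aHex`, and `hashEvents` are hypothetical helpers, and the FNV-1a-style hash is used only to keep the example dependency-free. A real implementation would more likely use a cryptographic hash such as SHA-256.
+
+```typescript
+// Hypothetical sketch: helper names and the hash choice are illustrative only.
+type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };
+
+// Serialize with sorted object keys so property insertion order cannot change the hash.
+function canonicalJson(value: JsonValue): string {
+  if (value === null || typeof value !== "object") return JSON.stringify(value);
+  if (Array.isArray(value)) return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
+  const keys = Object.keys(value).sort();
+  const body = keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k] as JsonValue)}`);
+  return `{${body.join(",")}}`;
+}
+
+// 32-bit FNV-1a-style hash over UTF-16 code units, returned as 8 hex characters.
+function fnv1aHex(text: string): string {
+  let hash = 0x811c9dc5;
+  for (let i = 0; i < text.length; i++) {
+    hash ^= text.charCodeAt(i);
+    hash = Math.imul(hash, 0x01000193);
+  }
+  return (hash >>> 0).toString(16).padStart(8, "0");
+}
+
+function hashEvents(events: JsonValue[]): string {
+  return fnv1aHex(canonicalJson(events));
+}
+```
+
+Because keys are sorted before hashing, two events that differ only in property order hash identically, while array order (the event sequence) still affects the result.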
+
+## Test strategy
+
+Use infra-free Bun tests. Cover PRNG repeatability, profile parsing, event ordering within generated batches, option quote/print validity, equity quote/print validity, and provenance field stability. Avoid any test that needs Docker, ClickHouse, NATS, or Redis.
+
+## Risks / design traps
+
+- Wall-clock reads (such as `Date.now()`) or unseeded random calls (such as `Math.random()`) hidden inside the generator will break determinism.
+- Creating synthetic-only market event types will fork the pipeline contract.
+- Embedding labels directly on market events will leak ground truth into production-like paths.
+- Over-designing a full market simulator now will slow down the MVP.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/synthetic-market-data/01-deterministic-spine.md for Beads issue islandflow-259.1. Stay inside the deterministic synthetic market-data foundation only. Do not add scenario labels, manifests, replay integration, demos, or historical calibration. Emit canonical market event types and keep early tests infra-free.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-259.1` - Synthetic market-data phase 01: deterministic spine
--- a/docs/implementation/synthetic-market-data/02-manifests-fixtures-cli.md
+++ b/docs/implementation/synthetic-market-data/02-manifests-fixtures-cli.md
@ -0,0 +1,68 @@
+# Synthetic Market-Data Phase 02: Manifests, Fixtures, and CLI
+
+## Purpose
+
+Turn the deterministic generator into reusable artifacts: fixture files, run manifests, and a CLI that can produce repeatable synthetic runs for tests, replay, demos, and later evaluation.
+
+## Why this phase comes now
+
+The deterministic spine gives the repo stable raw events. The next step is to make those events durable and addressable so downstream phases can reference exact generated runs instead of recreating ad hoc local randomness.
+
+## Dependencies on earlier phases
+
+- `islandflow-259.1` - Synthetic deterministic spine
+- `islandflow-zxh.1` - Smart-flow contracts and vocabulary, so manifest expectations can align with the emerging evidence/hypothesis language
+
+## Likely files/modules touched
+
+- Future `packages/synthetic-market/` CLI entrypoints
+- Fixture directories under a package or service test area
+- Manifest schemas, likely JSON or YAML
+- `package.json` scripts if a repo command is added
+- Tests for manifest parsing and fixture generation
+
+## In-scope work
+
+- Define `ExpectedOutputManifest`, `ReplayPlan`, and generated fixture artifact layout.
+- Add a CLI command that accepts seed bundle, profile, scenario/run name, output directory, and deterministic generation options.
+- Write manifests that pin generator version, seed bundle, parameter snapshot hash, generated event hashes, replay ordering, and run metadata.
+- Add fixture helpers for tests to load generated batches without infrastructure.
+- Keep labels as separate records or future manifest sections, not market-event fields.
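+
+The manifest fields listed above could be pinned and compared along these lines. This is a sketch under stated assumptions: `RunManifest`, `buildManifest`, and `diffManifests` are hypothetical names, and the field layout is illustrative rather than the final `ExpectedOutputManifest` schema.
+
+```typescript
+// Hypothetical shapes: field names are illustrative, not the final manifest schema.
+interface RunManifest {
+  generatorVersion: string;
+  runId: string;
+  seedBundle: Record<string, number>;
+  parameterSnapshotHash: string;
+  eventHashes: Record<string, string>; // event ID -> content hash
+  replayOrder: string[]; // event IDs in replay order
+}
+
+interface HashedEvent {
+  eventId: string;
+  hash: string;
+}
+
+type ManifestHeader = Omit<RunManifest, "eventHashes" | "replayOrder">;
+
+function buildManifest(header: ManifestHeader, events: readonly HashedEvent[]): RunManifest {
+  const seen = new Set<string>();
+  const eventHashes: Record<string, string> = {};
+  for (const e of events) {
+    if (seen.has(e.eventId)) throw new Error(`duplicate event ID: ${e.eventId}`);
+    seen.add(e.eventId);
+    eventHashes[e.eventId] = e.hash;
+  }
+  return { ...header, eventHashes, replayOrder: events.map((e) => e.eventId) };
+}
+
+// Returns human-readable differences; an empty array means the pinned fields match.
+function diffManifests(expected: RunManifest, actual: RunManifest): string[] {
+  const diffs: string[] = [];
+  if (expected.generatorVersion !== actual.generatorVersion) diffs.push("generatorVersion changed");
+  if (expected.parameterSnapshotHash !== actual.parameterSnapshotHash) diffs.push("parameterSnapshotHash changed");
+  if (expected.replayOrder.join("\n") !== actual.replayOrder.join("\n")) diffs.push("replayOrder changed");
+  for (const id of expected.replayOrder) {
+    if (expected.eventHashes[id] !== actual.eventHashes[id]) diffs.push(`event ${id} hash changed`);
+  }
+  return diffs;
+}
+```
+
+A golden-file test could load a committed manifest, regenerate the run, and assert that `diffManifests` returns an empty array.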
+
+## Explicitly out-of-scope work
+
+- Full scenario catalog authoring.
+- Smart-flow expected output comparisons.
+- Replay service source selection.
+- ClickHouse fixture materialization.
+- UI demo selection.
+- Historical calibration.
+
+## Acceptance criteria
+
+- A CLI can generate repeatable fixtures and manifests from fixed inputs.
+- Manifests include generator version, seed/profile identity, parameter hash, event hashes, and replay ordering.
+- Fixture helpers can load generated event batches in infra-free tests.
+- Generated artifacts do not embed hidden labels into canonical market events.
+- Re-running generation with the same inputs produces identical manifests; any diff must trace back to an intentional, reviewed change such as a generator version bump.
+
+## Test strategy
+
+Use plain Bun tests for CLI argument parsing, manifest schema parsing, deterministic fixture output, and fixture-loader helpers. Golden files should be small and intentionally reviewed. Do not require Docker, ClickHouse, NATS, or Redis.
+
+## Risks / design traps
+
+- Manifests that omit generator version or parameter hashes will become hard to audit.
+- Large generated fixtures can create noisy reviews; keep early fixtures tiny.
+- A CLI that silently uses defaults will make tests look deterministic while hiding input drift.
+- Mixing expected smart-flow outputs too early can couple this phase to unfinished classifier changes.
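+
+To guard against the silent-defaults trap above, the CLI could refuse to run unless every generation input is passed explicitly. The sketch below assumes a plain `--flag value` argument style and a hypothetical `parseGenerateArgs` helper; the flag names used in the usage note are illustrative and do not describe an existing command.
+
+```typescript
+// Hypothetical sketch: no defaults are applied, so missing inputs fail loudly.
+function parseGenerateArgs(argv: readonly string[], required: readonly string[]): Record<string, string> {
+  const parsed: Record<string, string> = {};
+  for (let i = 0; i < argv.length; i += 2) {
+    const flag = argv[i] ?? "";
+    const value = argv[i + 1];
+    if (!flag.startsWith("--")) throw new Error(`expected a --flag, got: ${flag}`);
+    if (value === undefined || value.startsWith("--")) throw new Error(`missing value for ${flag}`);
+    parsed[flag.slice(2)] = value;
+  }
+  const missing = required.filter((name) => !Object.prototype.hasOwnProperty.call(parsed, name));
+  if (missing.length > 0) throw new Error(`missing required options: ${missing.join(", ")}`);
+  return parsed;
+}
+```
+
+For example, `parseGenerateArgs(["--seed", "42", "--profile", "tiny"], ["seed", "profile", "out-dir"])` throws because `out-dir` was never supplied, instead of quietly picking an output directory.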
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/synthetic-market-data/02-manifests-fixtures-cli.md for Beads issue islandflow-259.2. Build manifest, fixture, and CLI support on top of the deterministic spine. Keep tests infra-free and do not implement scenario labels, replay integration, demo profiles, or historical calibration.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-259.2` - Synthetic market-data phase 02: manifests, fixtures, and CLI
--- a/docs/implementation/synthetic-market-data/03-scenarios-labels-expected-outputs.md
+++ b/docs/implementation/synthetic-market-data/03-scenarios-labels-expected-outputs.md
@ -0,0 +1,71 @@
+# Synthetic Market-Data Phase 03: Scenarios, Labels, and Expected Outputs
+
+## Purpose
+
+Author named deterministic scenarios, separate ground-truth labels, and expected-output manifests that downstream smart-flow logic can use for positive, negative, abstention, and false-positive validation.
+
+## Why this phase comes now
+
+The generator and manifest layers should exist before scenario authoring. Smart-flow evidence clustering should also define enough vocabulary for expected outputs to describe evidence requirements without leaking labels into emitted market events.
+
+## Dependencies on earlier phases
+
+- `islandflow-259.1` - Synthetic deterministic spine
+- `islandflow-zxh.1` - Smart-flow contracts and vocabulary
+- `islandflow-259.2` - Manifests, fixtures, and CLI
+- `islandflow-zxh.2` - Evidence clustering and features
+
+## Likely files/modules touched
+
+- Future scenario catalog files under `packages/synthetic-market/`
+- Label schema definitions
+- Manifest expected-output sections
+- Fixture generation tests
+- Smart-flow fixture expectations in compute test areas, once available
+
+## In-scope work
+
+- Define `ScenarioInjection` and `GroundTruthLabel` records.
+- Add named scenario profiles for institutional directional flow, retail-attention flow, event/noise flow, volatility-seller behavior, hedge-reactive flow, arbitrage-like structure, and no-alert negatives.
+- Keep labels keyed by `run_id`, `scenario_id`, event IDs or trace IDs, expected class, expected direction, confidence band, required evidence, forbidden evidence, and false-positive penalties.
+- Extend manifests with expected derived events, alert/no-alert expectations, and evidence requirements.
+- Make generated scenario outputs reviewable and deterministic.
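+
+The separation between labels and emitted events described above can be enforced with a small guard plus an ID-based lookup. This is a minimal sketch: the `GroundTruthLabel` field types, the `LABEL_ONLY_KEYS` list, and both helper functions are assumptions for illustration, since the real label schema is defined by this phase.
+
+```typescript
+// Hypothetical field names and types; the real label schema is still to be defined.
+interface GroundTruthLabel {
+  run_id: string;
+  scenario_id: string;
+  event_ids: string[];
+  expected_class: string;
+  expected_direction: string;
+  confidence_band: [number, number];
+  required_evidence: string[];
+  forbidden_evidence: string[];
+  false_positive_penalty: number;
+}
+
+// Keys that must never appear on an emitted market event (assumed list).
+// `scenario_id` is not listed because phase 01 allows it as optional provenance.
+const LABEL_ONLY_KEYS = ["scenario_label", "expected_class", "expected_direction", "ground_truth"];
+
+// Returns the label-only keys found on an event; an empty array means no leak.
+function findLeakedLabelKeys(event: Record<string, unknown>): string[] {
+  return LABEL_ONLY_KEYS.filter((key) => Object.prototype.hasOwnProperty.call(event, key));
+}
+
+// Labels join to events only through IDs, never through fields on the event itself.
+function labelsForEvent(labels: readonly GroundTruthLabel[], runId: string, eventId: string): GroundTruthLabel[] {
+  return labels.filter((l) => l.run_id === runId && l.event_ids.includes(eventId));
+}
+```
+
+A fixture-generation test could run `findLeakedLabelKeys` over every emitted event and fail if any result is non-empty.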
+
+## Explicitly out-of-scope work
+
+- Emitting labels on market events.
+- Building a live synthetic service.
+- Adding UI scenario controls.
+- Implementing historical calibration.
+- Rewriting smart-flow scoring behavior beyond what is needed to express expected outputs.
+
+## Acceptance criteria
+
+- Scenario fixtures are named, deterministic, and small enough for review.
+- Labels remain separate from emitted market events.
+- Expected-output manifests include positive expectations, no-alert expectations, evidence requirements, forbidden evidence, and false-positive penalties.
+- The phase can test both "should detect" and "should abstain or suppress" cases.
+- Existing issue `islandflow-9dz` is treated as related scenario-tuning context, not as the broad phase tracker.
+
+## Test strategy
+
+Use fixture-generation and manifest-validation tests first. Add focused golden comparisons only where the smart-flow contract is ready. Keep the default test path infra-free. Optional service-backed scenario loading can wait for a later integration phase.
+
+## Risks / design traps
+
+- Labels leaking into canonical event payloads will invalidate evaluation.
+- Only authoring positive scenarios will make the classifier overfit demos.
+- Broad scenario catalogs can become too large for one PR.
+- Expected outputs that name legacy "smart money" certainty can undermine the new evidence/hypothesis model.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/synthetic-market-data/03-scenarios-labels-expected-outputs.md for Beads issue islandflow-259.3. Split the work using islandflow-259.3.1 and islandflow-259.3.2 if needed. Keep labels separate from emitted events, include negative/no-alert expectations, and avoid demos or live service work.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-259.3` - Synthetic market-data phase 03: scenarios, labels, and expected outputs
+- PR split: `islandflow-259.3.1` - Split synthetic phase 03a: scenario catalog and labels
+- PR split: `islandflow-259.3.2` - Split synthetic phase 03b: expected-output manifests
--- a/docs/implementation/synthetic-market-data/04-replay-integration.md
+++ b/docs/implementation/synthetic-market-data/04-replay-integration.md
@ -0,0 +1,69 @@
+# Synthetic Market-Data Phase 04: Replay Integration
+
+## Purpose
+
+Make replay consume synthetic runs deterministically, either directly from generated fixtures or from materialized storage rows, while preserving the same ordering semantics the real replay path uses.
+
+## Why this phase comes now
+
+Replay should not be wired to synthetic data until the generator, manifests, labels, and smart-flow hypothesis pipeline have stable semantics. At this point, replay can become a serious acceptance gate instead of a demo convenience.
+
+## Dependencies on earlier phases
+
+- `islandflow-259.1` - Synthetic deterministic spine
+- `islandflow-259.2` - Manifests, fixtures, and CLI
+- `islandflow-259.3` - Scenarios, labels, and expected outputs
+- `islandflow-zxh.3` - Hypothesis scoring and abstention
+
+## Likely files/modules touched
+
+- `services/replay/src/`
+- API replay routes in `services/api/`
+- Replay-related shared types in `packages/types/`
+- Optional fixture materialization helpers in `packages/storage/`
+- Replay tests or golden comparison helpers
+
+## In-scope work
+
+- Add replay source/run selectors for synthetic runs.
+- Support fixture-backed replay without infrastructure where practical.
+- Preserve ordering by event time, ingest time, sequence, and stable event ID.
+- Compare replayed derived outputs against manifest signatures or expected-output sections.
+- Keep optional ClickHouse/NATS materialized replay tests behind non-default gates.
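+
+The ordering rule above (event time, then ingest time, then sequence, then stable event ID) can be expressed as a single comparator. The field names `event_ts`, `ingest_ts`, `seq`, and `event_id` are assumptions for this sketch and may not match the real contracts in `packages/types/`.
+
+```typescript
+// Hypothetical key shape: field names are assumed, not taken from the real event types.
+interface ReplayKey {
+  event_ts: number;
+  ingest_ts: number;
+  seq: number;
+  event_id: string;
+}
+
+// Total order: event time, ingest time, sequence, then event ID as the final tie-breaker.
+// IDs are compared by code unit rather than localeCompare so the order is locale-independent.
+function compareReplayOrder(a: ReplayKey, b: ReplayKey): number {
+  return (
+    a.event_ts - b.event_ts ||
+    a.ingest_ts - b.ingest_ts ||
+    a.seq - b.seq ||
+    (a.event_id < b.event_id ? -1 : a.event_id > b.event_id ? 1 : 0)
+  );
+}
+
+// Sorts a copy so fixture arrays loaded from disk are never mutated.
+function sortForReplay<T extends ReplayKey>(events: readonly T[]): T[] {
+  return [...events].sort(compareReplayOrder);
+}
+```
+
+Sharing one comparator between the fixture-backed path and the materialized path is one way to avoid the synthetic-only ordering divergence this phase warns about.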
+
+## Explicitly out-of-scope work
+
+- Building new scenario labels.
+- Reworking smart-flow scoring policy.
+- Demo profile controls.
+- Load testing.
+- Historical calibration.
+
+## Acceptance criteria
+
+- Replay can select a synthetic source and `run_id`.
+- Fixture-backed replay respects manifest ordering.
+- Derived output signatures can be compared with expected manifests.
+- Fast replay tests remain infra-free by default.
+- Optional infra-backed tests are clearly named and gated.
+
+## Test strategy
+
+Start with fixture-backed replay ordering tests and manifest-signature comparisons. Add optional service-container or ClickHouse materialization tests only after the fast path is stable, and do not make those tests part of the default `bun test` requirement.
+
+## Risks / design traps
+
+- Creating a synthetic-only replay path with different ordering will hide bugs.
+- Letting optional infra tests become default will slow or destabilize CI.
+- Comparing full raw payloads everywhere may make tests brittle; prefer stable signatures wherever they are sufficient.
+- Replay selectors that are not run-scoped can mix synthetic and live data.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/synthetic-market-data/04-replay-integration.md for Beads issue islandflow-259.4. Add synthetic source/run replay support with stable ordering and manifest comparison. Do not add demo controls, load profiles, or historical calibration, and keep the fast test path infra-free.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-259.4` - Synthetic market-data phase 04: replay integration
--- a/docs/implementation/synthetic-market-data/05-demo-load-profiles.md
+++ b/docs/implementation/synthetic-market-data/05-demo-load-profiles.md
@ -0,0 +1,70 @@
+# Synthetic Market-Data Phase 05: Demo and Load Profiles
+
+## Purpose
+
+Expose deterministic synthetic runs as named demo and load profiles after the generation, manifest, scenario, and replay foundations are in place.
+
+## Why this phase comes now
+
+Demos are useful only after the underlying data can be trusted. This phase deliberately waits until replay and golden evaluation prove the event semantics, so hosted controls do not become a front door to ambient randomness.
+
+## Dependencies on earlier phases
+
+- `islandflow-259.1` - Synthetic deterministic spine
+- `islandflow-259.2` - Manifests, fixtures, and CLI
+- `islandflow-259.3` - Scenarios, labels, and expected outputs
+- `islandflow-259.4` - Replay integration
+- `islandflow-zxh.4` - Smart-flow replay evaluation and golden tests
+
+## Likely files/modules touched
+
+- Thin synthetic emitters in `services/ingest-options/` and `services/ingest-equities/`
+- Demo/run selection API surfaces in `services/api/`
+- Web demo controls in `apps/web/`
+- Load profile definitions in the synthetic package
+- Tests for profile selection and rate scaling
+
+## In-scope work
+
+- Add named `DemoProfile` and `LoadProfile` definitions.
+- Make live/demo emitters thin consumers of deterministic synthetic runs.
+- Let demo controls select named runs/scenarios rather than changing hidden random behavior.
+- Ensure load profiles scale event rates without changing event semantics.
+- Document local demo usage once implemented.
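+
+The requirement that load profiles scale rate without changing semantics could be met by rescaling only the timing of each event. The sketch below assumes events carry a relative `offset_ms` and that a `LoadProfile` reduces to a single `rateMultiplier`; both shapes and the `applyLoadProfile` helper are hypothetical.
+
+```typescript
+// Hypothetical shapes: a real LoadProfile will likely carry more than one field.
+interface TimedEvent {
+  event_id: string;
+  offset_ms: number; // time since run start
+  payload: unknown;
+}
+
+interface LoadProfile {
+  name: string;
+  rateMultiplier: number; // 2 means events arrive twice as fast
+}
+
+// Compresses or stretches timing only; IDs, payloads, and array order are untouched.
+function applyLoadProfile(events: readonly TimedEvent[], profile: LoadProfile): TimedEvent[] {
+  const m = profile.rateMultiplier;
+  if (!(m > 0) || !Number.isFinite(m)) {
+    throw new Error(`rateMultiplier must be a positive finite number, got: ${m}`);
+  }
+  return events.map((e) => ({ ...e, offset_ms: Math.round(e.offset_ms / m) }));
+}
+```
+
+Rounding can collapse distinct offsets into ties at high multipliers, so downstream ordering should keep relying on sequence and event ID rather than on timing alone.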
+
+## Explicitly out-of-scope work
+
+- Foundation generator work.
+- New smart-flow scoring policy.
+- Replacing replay evaluation with UI-only checks.
+- Historical calibration.
+- Production provider configuration decisions.
+
+## Acceptance criteria
+
+- Demo profiles are deterministic and named.
+- Load profiles scale rate or volume without mutating scenario semantics.
+- Hosted or local controls select known runs/scenarios.
+- Live/demo emitters remain thin and do not own generator policy.
+- The UI does not expose synthetic controls before the backing deterministic runs exist.
+
+## Test strategy
+
+Use unit tests for profile parsing, profile selection, and rate-scaling semantics. Add replay-driven smoke checks for named demo runs. Manual UI validation is appropriate only after automated replay/golden checks pass.
+
+## Risks / design traps
+
+- Demo controls can pressure the codebase back into wall-clock randomness.
+- Load profiles may accidentally change business semantics when only the event rate was meant to change.
+- UI-first implementation can hide missing run provenance.
+- Reusing production config for synthetic demos can make operator behavior ambiguous.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/synthetic-market-data/05-demo-load-profiles.md for Beads issue islandflow-259.5. Add named deterministic demo/load profiles and thin emitter/control integration only after replay validation exists. Do not implement historical calibration or change production provider policy.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-259.5` - Synthetic market-data phase 05: demo and load profiles
--- a/docs/implementation/synthetic-market-data/99-future-historical-calibration.md
+++ b/docs/implementation/synthetic-market-data/99-future-historical-calibration.md
@ -0,0 +1,64 @@
+# Synthetic Market-Data Phase 99: Future Historical Calibration
+
+## Purpose
+
+Plan future calibration of synthetic generator parameters from historical market data without making historical data a dependency for the MVP generator.
+
+## Why this phase comes now
+
+Naming this future work now lets early designs keep calibration hooks in mind. The calibration work itself should not start before deterministic generation, manifests, scenarios, replay, or demo profiles.
+
+## Dependencies on earlier phases
+
+- `islandflow-259.5` - Synthetic demo and load profiles
+
+## Likely files/modules touched
+
+- Future calibration tools under the synthetic package
+- Historical data import or sampling utilities
+- Parameter fitting scripts
+- Documentation for data provenance and licensing constraints
+- Optional research notebooks or reports if the repo later adopts them
+
+## In-scope work
+
+- Define calibration datasets and constraints.
+- Specify how historical distributions map to `ParameterSnapshot`, liquidity, volatility, and option-chain profiles.
+- Preserve deterministic replay from calibrated parameters.
+- Document privacy, licensing, and provenance requirements for historical data.
+
+## Explicitly out-of-scope work
+
+- MVP synthetic generator requirements.
+- Early tests and fixture generation.
+- Live synthetic demos.
+- Smart-flow scoring changes.
+- Any assumption that historical data is needed to start implementation.
+
+## Acceptance criteria
+
+- Historical calibration remains outside the MVP blocker chain.
+- Calibration inputs and ownership constraints are documented before implementation.
+- Fitted parameters can still be pinned into deterministic seed/profile bundles.
+- Calibration does not require emitted synthetic events to diverge from canonical market event contracts.
+
+## Test strategy
+
+When this future phase is implemented, use small public or licensed calibration samples with deterministic parameter fitting tests. Add regression checks that calibrated profiles still produce stable manifests. Do not retrofit historical data into earlier infra-free tests.
+
+## Risks / design traps
+
+- Treating calibration as necessary for MVP will delay foundational work.
+- Historical data licensing can constrain what can be committed or shared.
+- Overfitting synthetic profiles to a tiny period can produce misleading demos.
+- Calibration tools can accidentally leak proprietary or sensitive data into fixtures.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/synthetic-market-data/99-future-historical-calibration.md for Beads issue islandflow-259.6 only after MVP synthetic phases are complete. Keep calibration optional, documented, and deterministic. Do not make historical data a dependency for earlier synthetic tests or demos.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-259.6` - Future synthetic market-data phase 99: historical calibration