plan synthetic and smart-flow phases

2026-06-16 13:46:08 -04:00 · 2026-06-16 13:46:08 -04:00 · eaa22de302
commit eaa22de302
parent d1fac6c7ec
19 changed files with 1198 additions and 1 deletions
--- a/docs/implementation/smart-money/00-roadmap.md
+++ b/docs/implementation/smart-money/00-roadmap.md
@@ -0,0 +1,40 @@
+# Smart Money / Smart Flow Roadmap
+
+This roadmap breaks `docs/plans/smart-flow-architecture-review.md` into implementation-sized phases. The recommended direction is Option B: keep the working stack, but rebuild the domain pipeline around observations, evidence clusters, cautious hypotheses, confidence, alternatives, abstention, replay evaluation, and user-facing insight projections.
+
+## Core Constraints
+
+- Do not treat "smart money" as a canonical fact emitted by the system.
+- Distinguish direct facts, evidence, hypotheses, confidence, alternatives, and abstention.
+- Preserve evidence and uncertainty in storage, API, websocket, and UI surfaces.
+- Keep Redis as a hot cache only, not as a hidden source of baseline truth.
+- Make replay evaluation the acceptance gate before expanding UI confidence.
+- Keep historical or research-grade calibration as future work, not an MVP dependency.
+
+## Phase Sequence
+
+| Phase | Beads issue | Depends on | Purpose |
+| --- | --- | --- | --- |
+| 01 - Contracts and vocabulary | `islandflow-zxh.1` | `islandflow-259.1` | Define evidence/hypothesis/insight contracts and retire canonical overconfidence. |
+| 02 - Evidence clustering and features | `islandflow-zxh.2` | `islandflow-259.2` | Extract eligibility, evidence facts, clusters, and traceable features. |
+| 03 - Hypothesis scoring and abstention | `islandflow-zxh.3` | `islandflow-259.3` | Score cautious hypotheses and represent abstention/alternatives. |
+| 04 - Replay evaluation and golden tests | `islandflow-zxh.4` | `islandflow-259.4` | Validate derived outputs through deterministic replay and golden fixtures. |
+| 05 - API/UI explainability | `islandflow-zxh.5` | `islandflow-259.5` | Expose evidence-backed insights and uncertainty to API, WS, and UI. |
+| 99 - Future calibration | `islandflow-zxh.6` | `islandflow-zxh.5`, `islandflow-259.6` | Calibrate confidence and policy behavior later with richer datasets. |
+
+## PR Split Notes
+
+Several phases are broad enough to split before implementation:
+
+- `islandflow-zxh.2.1` - Split smart-flow phase 02a: eligibility and evidence facts
+- `islandflow-zxh.2.2` - Split smart-flow phase 02b: clustering and feature vectors
+- `islandflow-zxh.3.1` - Split smart-flow phase 03a: hypothesis score vectors
+- `islandflow-zxh.3.2` - Split smart-flow phase 03b: abstention and insight projection
+- `islandflow-zxh.5.1` - Split smart-flow phase 05a: evidence API and websocket surfaces
+- `islandflow-zxh.5.2` - Split smart-flow phase 05b: UI explainability surfaces
+
+If an implementation PR crosses contracts, compute, storage, API, and UI in one change, stop and split it.
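
A minimal sketch of how the split rule above could be checked mechanically. The path prefixes come from the "Likely files/modules touched" lists in the phase docs; the layer names, the `layersTouched` and `shouldSplit` helpers, and the `maxLayers` threshold are hypothetical and not part of the repo.

```typescript
// Hypothetical layer map: prefixes mirror the phase docs' "likely files" lists,
// but the layer names and this check are illustrative assumptions.
const LAYER_PREFIXES: Array<[string, string]> = [
  ["packages/types/", "contracts"],
  ["services/compute/", "compute"],
  ["packages/storage/", "storage"],
  ["services/api/", "api"],
  ["apps/web/", "ui"],
];

// Returns the distinct layers a set of changed paths touches, sorted for stable output.
function layersTouched(changedPaths: string[]): string[] {
  const found: string[] = [];
  for (const path of changedPaths) {
    for (const [prefix, layer] of LAYER_PREFIXES) {
      if (path.indexOf(prefix) === 0 && found.indexOf(layer) === -1) {
        found.push(layer);
      }
    }
  }
  return found.sort();
}

// Flags a change that touches more than `maxLayers` layers (the threshold is an assumption).
function shouldSplit(changedPaths: string[], maxLayers: number): boolean {
  return layersTouched(changedPaths).length > maxLayers;
}
```

With `maxLayers` set to 4, only a change that touches all five layers is flagged, which matches the rule as written; a stricter threshold would be a team decision.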
+
+## Matching Beads Epic
+
+- `islandflow-zxh` - Plan smart-money to smart-flow implementation phases
--- a/docs/implementation/smart-money/01-contracts-vocabulary.md
+++ b/docs/implementation/smart-money/01-contracts-vocabulary.md
@@ -0,0 +1,66 @@
+# Smart-Flow Phase 01: Contracts and Vocabulary
+
+## Purpose
+
+Introduce the domain vocabulary and contracts that distinguish observations, evidence clusters, hypotheses, confidence, abstention, and user-facing insight projections.
+
+## Why this phase comes now
+
+The current system has useful infrastructure but overconfident domain names. Before changing classifier behavior, the codebase needs the language to express what is observed, what is inferred, what is uncertain, and when the system should abstain.
+
+## Dependencies on earlier phases
+
+- `islandflow-259.1` - Synthetic deterministic spine, so contract work can align with canonical raw event and provenance assumptions.
+
+## Likely files/modules touched
+
+- `packages/types/src/events.ts`
+- Shared type exports in `packages/types/`
+- Compatibility type aliases where legacy names are still needed
+- Storage schema planning docs or migration notes
+- Tests for schema parsing or event compatibility
+
+## In-scope work
+
+- Define or prepare contracts for `FlowEvidenceCluster`, `FlowCandidate`, `FlowHypothesisEvent`, `SmartFlowInsight`, `EvidenceQuality`, `BaselineSnapshot`, and version fields.
+- Mark legacy "smart money" naming as compatibility or projection language, not canonical truth.
+- Define how facts, evidence, hypotheses, scores, confidence, and abstention differ.
+- Preserve compatibility aliases for existing API/UI paths where necessary.
+- Add concise migration notes for future phases.
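
One way the contract separation above might look, as a sketch only. The contract names `FlowEvidenceCluster` and `FlowHypothesisEvent` come from this doc; every field name, the `FlowAbstention` shape, the `LegacySmartMoneyEvent` alias, and `describeOutput` are hypothetical illustrations, not the contents of `packages/types/src/events.ts`.

```typescript
// Hypothetical field names throughout; only the contract names come from the phase doc.
interface EvidenceRef {
  streamId: string;
  seq: number;
}

// Evidence: grouped observations with raw refs, no interpretation attached.
interface FlowEvidenceCluster {
  kind: "evidence_cluster";
  clusterId: string;
  refs: EvidenceRef[];
  contractVersion: string;
}

// Hypothesis: an inference about a cluster, carrying uncertainty and alternatives.
interface FlowHypothesisEvent {
  kind: "hypothesis";
  clusterId: string;
  hypothesis: string;
  confidence: number; // policy confidence, not a probability
  alternatives: string[];
  policyVersion: string;
}

// Abstention: an explicit output stating why no hypothesis was emitted (assumed shape).
interface FlowAbstention {
  kind: "abstention";
  clusterId: string;
  reasons: string[];
  policyVersion: string;
}

type FlowOutput = FlowEvidenceCluster | FlowHypothesisEvent | FlowAbstention;

/** @deprecated Compatibility alias for legacy consumers; not the canonical name. */
type LegacySmartMoneyEvent = FlowHypothesisEvent;

// The discriminant keeps facts, inferences, and abstentions from sharing one shape.
function describeOutput(o: FlowOutput): string {
  switch (o.kind) {
    case "evidence_cluster":
      return `evidence cluster with ${o.refs.length} raw refs`;
    case "hypothesis":
      return `hypothesis "${o.hypothesis}" at policy confidence ${o.confidence}`;
    case "abstention":
      return `abstained: ${o.reasons.join(", ")}`;
  }
}
```

A discriminated union is used here so that a consumer cannot read `confidence` from an evidence cluster or an abstention without the compiler objecting.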
+
+## Explicitly out-of-scope work
+
+- Rewriting classifier scoring.
+- Moving ingest policy.
+- Adding new API endpoints or UI drawers.
+- Building replay golden suites.
+- Historical calibration or research-grade model fitting.
+
+## Acceptance criteria
+
+- Contracts distinguish observations, evidence, hypotheses, insight projections, confidence, alternatives, and abstention.
+- Legacy naming remains only where compatibility requires it.
+- Version fields are included for policy/model evolution.
+- Future phases can refer to these contracts without redefining the vocabulary.
+- Migration risk and compatibility aliases are documented.
+
+## Test strategy
+
+Use type-level checks and schema/serialization tests where practical. Add compatibility tests only for public contracts that must remain stable. Avoid broad behavior tests until evidence extraction and scoring phases exist.
+
+## Risks / design traps
+
+- Renaming everything without compatibility will break consumers.
+- Keeping "smart money" as canonical language will preserve the old overconfidence.
+- Mixing facts and hypotheses in one event shape will weaken replay evaluation.
+- Adding too many future fields can make contracts noisy before behavior exists.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/01-contracts-vocabulary.md for Beads issue islandflow-zxh.1. Focus on contracts, vocabulary, version fields, and compatibility aliases only. Do not rewrite scoring, API/UI explainability, replay tests, or calibration.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.1` - Smart-flow phase 01: contracts and vocabulary
--- a/docs/implementation/smart-money/02-evidence-clustering-features.md
+++ b/docs/implementation/smart-money/02-evidence-clustering-features.md
@@ -0,0 +1,69 @@
+# Smart-Flow Phase 02: Evidence Clustering and Features
+
+## Purpose
+
+Make evidence extraction, eligibility, quote/context joins, clustering, and feature construction explicit and traceable before hypothesis scoring changes.
+
+## Why this phase comes now
+
+Contracts alone do not change behavior. This phase gives the system a clean evidence layer so later scoring can reason from auditable facts instead of a generic feature bag or overconfident classifier labels.
+
+## Dependencies on earlier phases
+
+- `islandflow-zxh.1` - Smart-flow contracts and vocabulary
+- `islandflow-259.2` - Synthetic manifests, fixtures, and CLI
+
+## Likely files/modules touched
+
+- `services/compute/src/`
+- `packages/types/src/events.ts`
+- `packages/storage/src/` for typed evidence storage planning or implementation
+- Tests under `services/compute/tests/`
+- Fixture helpers from the synthetic package
+
+## In-scope work
+
+- Represent direct observations, quote joins, execution context, and eligibility decisions as evidence facts.
+- Build deterministic evidence clusters with traceable source refs.
+- Compute feature vectors from evidence while preserving whether a value is observed, derived, or inferred.
+- Carry evidence quality, stale quote, wide spread, odd lot, complex spread, and noisy context signals.
+- Move toward ingest-as-normalization, not ingest-as-signal-policy.
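
A sketch of an eligibility decision that records its reasons, under stated assumptions. The signal names (stale quote, wide spread, odd lot) come from the list above; the `PrintWithQuote` and `EligibilityPolicy` shapes, the thresholds, the choice to reject on a stale quote but only down-weight on the others, and the halving rule are all hypothetical.

```typescript
// Hypothetical shapes and thresholds; only the signal names come from the phase doc.
type EligibilityDecision =
  | { action: "accept" }
  | { action: "reject"; reasons: string[] }
  | { action: "down_weight"; reasons: string[]; weight: number };

interface PrintWithQuote {
  size: number;
  quoteAgeMs: number | null; // null means no quote could be joined
  bid: number;
  ask: number;
}

interface EligibilityPolicy {
  maxQuoteAgeMs: number;
  maxSpreadFraction: number; // (ask - bid) / mid
  oddLotBelow: number;
}

function decideEligibility(p: PrintWithQuote, policy: EligibilityPolicy): EligibilityDecision {
  // Hard rejections: without a usable quote, the execution context is unknown.
  if (p.quoteAgeMs === null) return { action: "reject", reasons: ["no_quote"] };
  if (p.quoteAgeMs > policy.maxQuoteAgeMs) return { action: "reject", reasons: ["stale_quote"] };

  // Soft conditions: keep the print but record why it counts for less.
  const reasons: string[] = [];
  const mid = (p.bid + p.ask) / 2;
  if (mid > 0 && (p.ask - p.bid) / mid > policy.maxSpreadFraction) reasons.push("wide_spread");
  if (p.size < policy.oddLotBelow) reasons.push("odd_lot");

  if (reasons.length === 0) return { action: "accept" };
  // Arbitrary illustrative rule: halve the weight once per soft reason.
  return { action: "down_weight", reasons, weight: Math.pow(0.5, reasons.length) };
}
```

The decision carries no claim about intent; it only states which observable conditions applied, which keeps signal policy out of ingest.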
+
+## Explicitly out-of-scope work
+
+- Final hypothesis score policy.
+- API and UI explainability.
+- Historical calibration.
+- Claiming participant identity.
+- Replacing all storage tables in the same PR.
+
+## Acceptance criteria
+
+- Eligibility decisions have explicit accept, reject, or down-weight reasons.
+- Evidence clusters have deterministic keys/windows and preserve raw refs.
+- Feature values trace back to evidence refs.
+- Stale, wide, noisy, or ambiguous conditions can be represented without pretending to know intent.
+- The phase is split into PR-sized children when implementation starts.
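
To illustrate the "deterministic keys/windows" criterion above, here is a minimal sketch that assumes fixed tumbling windows keyed by symbol. The `EvidenceFact` and `EvidenceCluster` shapes, the key format, and the windowing scheme are hypothetical; the real clustering rule may differ.

```typescript
// Hypothetical shapes; fixed tumbling windows are an assumption, not the documented design.
interface EvidenceFact {
  symbol: string;
  tsMs: number;
  ref: string; // pointer back to the raw event
}

interface EvidenceCluster {
  key: string;
  windowStartMs: number;
  refs: string[];
}

function clusterFacts(facts: EvidenceFact[], windowMs: number): EvidenceCluster[] {
  const byKey = new Map<string, EvidenceCluster>();
  for (const f of facts) {
    // The key depends only on the fact's own fields, never on arrival order.
    const windowStartMs = Math.floor(f.tsMs / windowMs) * windowMs;
    const key = `${f.symbol}|${windowStartMs}|${windowMs}`;
    let cluster = byKey.get(key);
    if (cluster === undefined) {
      cluster = { key, windowStartMs, refs: [] };
      byKey.set(key, cluster);
    }
    cluster.refs.push(f.ref);
  }
  // Sort clusters and refs so replaying the same facts in any order yields identical output.
  const clusters = Array.from(byKey.values());
  clusters.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  for (const c of clusters) c.refs.sort();
  return clusters;
}
```

Because the output is sorted and every cluster keeps its raw refs, two replays of the same input can be compared byte for byte.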
+
+## Test strategy
+
+Use deterministic fixtures from synthetic phase 02 where available. Add focused tests for quote joining, eligibility rejection, cluster key stability, feature derivation, and trace refs. Keep tests infra-free unless a later optional storage integration explicitly needs services.
+
+## Risks / design traps
+
+- Recreating the old `FlowPacket` as a renamed generic feature bag.
+- Letting ingest services make signal-policy decisions.
+- Losing evidence refs during aggregation.
+- Treating cluster features as hypotheses before the scoring phase.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/02-evidence-clustering-features.md for Beads issue islandflow-zxh.2. Use split issues islandflow-zxh.2.1 and islandflow-zxh.2.2 for PR-sized work. Focus on evidence facts, eligibility, clustering, and traceable features. Do not implement final scoring, API/UI explainability, or calibration.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.2` - Smart-flow phase 02: evidence clustering and features
+- PR split: `islandflow-zxh.2.1` - Split smart-flow phase 02a: eligibility and evidence facts
+- PR split: `islandflow-zxh.2.2` - Split smart-flow phase 02b: clustering and feature vectors
--- a/docs/implementation/smart-money/03-hypothesis-scoring-abstention.md
+++ b/docs/implementation/smart-money/03-hypothesis-scoring-abstention.md
@@ -0,0 +1,70 @@
+# Smart-Flow Phase 03: Hypothesis Scoring and Abstention
+
+## Purpose
+
+Convert evidence clusters into cautious flow hypotheses with explicit score vectors, alternatives, penalties, confidence, conviction, and abstention reasons.
+
+## Why this phase comes now
+
+Scoring should wait until the system can represent evidence clearly and synthetic scenarios can describe expected positive, negative, and abstention cases. This phase is where the product stops acting like every signal is a confident "smart money" claim.
+
+## Dependencies on earlier phases
+
+- `islandflow-zxh.1` - Smart-flow contracts and vocabulary
+- `islandflow-zxh.2` - Evidence clustering and features
+- `islandflow-259.3` - Synthetic scenarios, labels, and expected outputs
+
+## Likely files/modules touched
+
+- `services/compute/src/`
+- `packages/types/src/events.ts`
+- `packages/storage/src/smart-money-events.ts` or successor storage modules
+- Compute tests and fixture/golden comparison helpers
+- Compatibility projection code for legacy alerts or classifier hits
+
+## In-scope work
+
+- Define score vectors for hypothesis type, direction, evidence strength, confidence, conviction, and penalties.
+- Preserve alternative explanations and negative evidence.
+- Make abstention a first-class output with reasons.
+- Add policy/model version fields.
+- Derive compatibility `SmartFlowInsight` or legacy projections from canonical hypothesis events.
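
A sketch of scoring that treats abstention as a first-class outcome, under stated assumptions. The separation of evidence strength, confidence, conviction, and penalties follows the list above; the `ClusterFeatures` input, the subtract-and-clamp rule, the direction values, the reason codes, and `SKETCH_POLICY_VERSION` are hypothetical placeholders rather than the intended policy.

```typescript
// Hypothetical shapes and scoring rule; only the separated score dimensions come from the doc.
interface Penalty {
  reason: string;
  amount: number;
}

interface ClusterFeatures {
  hypothesisType: string;
  direction: "up" | "down" | "unclear";
  evidenceStrength: number; // 0..1
  conviction: number; // 0..1
  penalties: Penalty[];
  alternatives: string[];
}

interface AbstentionReason {
  code: string; // machine-readable
  message: string; // user-readable
}

type ScoringOutcome =
  | {
      kind: "hypothesis";
      hypothesisType: string;
      direction: "up" | "down" | "unclear";
      evidenceStrength: number;
      confidence: number; // policy confidence, not a probability
      conviction: number;
      penalties: Penalty[];
      alternatives: string[];
      policyVersion: string;
    }
  | { kind: "abstention"; reasons: AbstentionReason[]; alternatives: string[]; policyVersion: string };

const SKETCH_POLICY_VERSION = "sketch-0"; // placeholder version label

function scoreCluster(f: ClusterFeatures, minConfidence: number): ScoringOutcome {
  let totalPenalty = 0;
  for (const p of f.penalties) totalPenalty += p.amount;
  // Illustrative rule: confidence is evidence strength minus penalties, clamped to [0, 1].
  const confidence = Math.max(0, Math.min(1, f.evidenceStrength - totalPenalty));

  if (confidence < minConfidence) {
    const reasons: AbstentionReason[] = f.penalties.map((p) => ({
      code: p.reason,
      message: `Penalty applied: ${p.reason}`,
    }));
    if (reasons.length === 0) {
      reasons.push({ code: "weak_evidence", message: "Evidence strength is below the policy threshold" });
    }
    // Alternatives survive abstention so downstream surfaces can still show them.
    return { kind: "abstention", reasons, alternatives: f.alternatives, policyVersion: SKETCH_POLICY_VERSION };
  }
  return {
    kind: "hypothesis",
    hypothesisType: f.hypothesisType,
    direction: f.direction,
    evidenceStrength: f.evidenceStrength,
    confidence,
    conviction: f.conviction,
    penalties: f.penalties,
    alternatives: f.alternatives,
    policyVersion: SKETCH_POLICY_VERSION,
  };
}
```

The abstention branch returns an event rather than logging and returning nothing, so replay can assert on it like any other output.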
+
+## Explicitly out-of-scope work
+
+- UI presentation overhaul.
+- API endpoint expansion.
+- Historical calibration.
+- Participant identity claims.
+- Tuning all thresholds against live historical data.
+
+## Acceptance criteria
+
+- Hypothesis scores separate evidence strength, confidence, conviction, and penalties.
+- Abstention outputs include machine-readable and user-readable reasons.
+- Alternative explanations are preserved.
+- Compatibility projections do not become the canonical domain model.
+- Score policy changes are deterministic against synthetic fixtures.
+
+## Test strategy
+
+Use synthetic scenario fixtures and expected-output manifests. Cover positive hypotheses, abstentions, false-positive suppressions, alternative explanations, and noisy scenarios. Keep output comparisons stable and focused on score signatures rather than brittle full payload dumps.
+
+## Risks / design traps
+
+- Rebranding old classifier hits as hypotheses without changing semantics.
+- Treating confidence as a probability when it is only policy confidence.
+- Hiding abstention in logs instead of output events.
+- Letting compatibility alert projections dictate canonical scoring design.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/03-hypothesis-scoring-abstention.md for Beads issue islandflow-zxh.3. Use split issues islandflow-zxh.3.1 and islandflow-zxh.3.2 for PR-sized work. Build cautious hypothesis scoring, alternatives, and abstention from evidence clusters. Do not add API/UI explainability or historical calibration.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.3` - Smart-flow phase 03: hypothesis scoring and abstention
+- PR split: `islandflow-zxh.3.1` - Split smart-flow phase 03a: hypothesis score vectors
+- PR split: `islandflow-zxh.3.2` - Split smart-flow phase 03b: abstention and insight projection
--- a/docs/implementation/smart-money/04-replay-evaluation-golden-tests.md
+++ b/docs/implementation/smart-money/04-replay-evaluation-golden-tests.md
@@ -0,0 +1,69 @@
+# Smart-Flow Phase 04: Replay Evaluation and Golden Tests
+
+## Purpose
+
+Make deterministic replay and golden output comparison the acceptance gate for smart-flow behavior changes.
+
+## Why this phase comes now
+
+Replay evaluation should come after synthetic replay can select stable runs and after hypothesis scoring has outputs worth validating. This phase turns architecture discipline into a repeatable test path.
+
+## Dependencies on earlier phases
+
+- `islandflow-zxh.1` - Smart-flow contracts and vocabulary
+- `islandflow-zxh.2` - Evidence clustering and features
+- `islandflow-zxh.3` - Hypothesis scoring and abstention
+- `islandflow-259.4` - Synthetic replay integration
+
+## Likely files/modules touched
+
+- `services/replay/src/`
+- `services/compute/tests/`
+- Synthetic fixture and manifest comparison helpers
+- Golden fixture directories
+- Optional service-container integration config if added later
+
+## In-scope work
+
+- Recompute derived evidence/hypothesis outputs from raw synthetic streams.
+- Compare stable output signatures with expected manifests.
+- Include positive, abstention, false-positive, and noisy scenarios.
+- Make replay/golden tests deterministic and infra-free by default.
+- Gate optional ClickHouse/NATS/Redis tests outside the default path.
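
A sketch of comparing compact signatures against an expected manifest instead of full payloads. The `DerivedOutput` shape, the signature format, and the two-decimal rounding are hypothetical; the point is only that volatile metadata such as an emit timestamp stays out of the comparison.

```typescript
// Hypothetical output shape and signature format, for illustration only.
interface DerivedOutput {
  clusterKey: string;
  kind: "hypothesis" | "abstention";
  label: string; // hypothesis type or primary abstention reason
  confidence?: number;
  emittedAt?: string; // volatile metadata, deliberately excluded from the signature
}

// Builds a stable one-line signature from the fields that define behavior.
function signature(o: DerivedOutput): string {
  const conf = o.confidence === undefined ? "-" : o.confidence.toFixed(2);
  return [o.clusterKey, o.kind, o.label, conf].join("|");
}

// Reports which expected signatures are missing and which actual ones were unexpected.
function diffSignatures(
  actual: DerivedOutput[],
  expected: string[]
): { missing: string[]; unexpected: string[] } {
  const got = actual.map(signature).sort();
  const want = expected.slice().sort();
  return {
    missing: want.filter((s) => got.indexOf(s) === -1),
    unexpected: got.filter((s) => want.indexOf(s) === -1),
  };
}
```

A failing run then shows a short list of missing and unexpected signature lines, which is easier to review than a payload dump.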
+
+## Explicitly out-of-scope work
+
+- New scoring policy beyond fixes needed for deterministic evaluation.
+- UI explainability.
+- Historical calibration.
+- Large generated fixture dumps.
+- Making Docker-backed tests mandatory.
+
+## Acceptance criteria
+
+- Replay recomputes derived smart-flow outputs from raw fixtures.
+- Golden signatures cover positive, abstain, false-positive, and noisy scenarios.
+- Default tests are deterministic and infra-free.
+- Optional service-backed tests are clearly gated.
+- Failures show concise, reviewable diffs or signature mismatches.
+
+## Test strategy
+
+Use fixture-backed replay and compact golden signatures first. Add a small number of representative scenarios rather than broad generated dumps. If service-backed tests are added, mark them optional and document their dependencies.
+
+## Risks / design traps
+
+- Golden files that are too large will become rubber-stamped.
+- Full payload comparisons may break on harmless metadata changes.
+- Optional infra tests can accidentally become required in CI.
+- Replay that starts from derived events instead of raw fixtures will miss pipeline regressions.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/04-replay-evaluation-golden-tests.md for Beads issue islandflow-zxh.4. Build deterministic replay/golden validation from raw synthetic fixtures. Keep default tests infra-free, compare stable signatures, and do not add UI explainability or historical calibration.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.4` - Smart-flow phase 04: replay evaluation and golden tests
--- a/docs/implementation/smart-money/05-api-ui-explainability.md
+++ b/docs/implementation/smart-money/05-api-ui-explainability.md
@@ -0,0 +1,72 @@
+# Smart-Flow Phase 05: API/UI Explainability
+
+## Purpose
+
+Expose evidence-backed smart-flow outputs through API, websocket, and UI surfaces that make evidence quality, confidence, conviction, alternatives, and abstention understandable.
+
+## Why this phase comes now
+
+The presentation layer should wait until contracts, evidence, scoring, and replay evaluation are stable. Otherwise the UI will harden old overconfident language or teach users to trust unvalidated outputs.
+
+## Dependencies on earlier phases
+
+- `islandflow-zxh.1` - Smart-flow contracts and vocabulary
+- `islandflow-zxh.2` - Evidence clustering and features
+- `islandflow-zxh.3` - Hypothesis scoring and abstention
+- `islandflow-zxh.4` - Replay evaluation and golden tests
+- `islandflow-259.5` - Synthetic demo and load profiles
+
+## Likely files/modules touched
+
+- `services/api/src/`
+- Websocket payload types and channel names
+- `apps/web/`
+- Shared UI/domain types in `packages/types/`
+- API and UI tests
+
+## In-scope work
+
+- Add or alias API/WS surfaces for evidence, hypotheses, insights, alternatives, and abstention.
+- Keep legacy smart-money endpoints as aliases where needed, not canonical contracts.
+- Rework UI surfaces around evidence quality, confidence versus conviction, alternatives, abstention, and why-not context.
+- Ensure named deterministic demos can display stable explainability examples.
+- Keep replay/golden validation tied to changed projections.
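
A sketch of projecting canonical payloads into a user-facing view, assuming hypothetical shapes. `CanonicalPayload`, `InsightView`, `toInsightView`, and the headline wording are illustrations only; the idea shown is that the view is derived from the canonical event and its copy stays hedged.

```typescript
// Hypothetical payload and view shapes; the headline copy is illustrative.
interface HypothesisPayload {
  kind: "hypothesis";
  hypothesisType: string;
  confidence: number; // policy confidence, not a probability
  conviction: number;
  evidenceRefs: string[];
  alternatives: string[];
  policyVersion: string;
}

interface AbstentionPayload {
  kind: "abstention";
  reasons: string[];
  evidenceRefs: string[];
  policyVersion: string;
}

type CanonicalPayload = HypothesisPayload | AbstentionPayload;

interface InsightView {
  headline: string;
  evidenceCount: number;
  whyNot: string[]; // alternatives for a hypothesis, reasons for an abstention
  policyVersion: string;
}

// Derives the display projection; the canonical payload stays the source of truth.
function toInsightView(p: CanonicalPayload): InsightView {
  if (p.kind === "abstention") {
    return {
      headline: "No hypothesis: evidence was insufficient or ambiguous",
      evidenceCount: p.evidenceRefs.length,
      whyNot: p.reasons,
      policyVersion: p.policyVersion,
    };
  }
  return {
    // Hedged wording: "possible", plus an explicit label that this is policy confidence.
    headline: `Possible ${p.hypothesisType} (policy confidence ${p.confidence.toFixed(2)})`,
    evidenceCount: p.evidenceRefs.length,
    whyNot: p.alternatives,
    policyVersion: p.policyVersion,
  };
}
```

A legacy smart-money endpoint could return a view like this as an alias, so the older surface never becomes a second canonical contract.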
+
+## Explicitly out-of-scope work
+
+- Rewriting scoring policy.
+- Adding new synthetic foundations.
+- Historical calibration.
+- Claiming participant identity.
+- UI copy that implies certainty where the model only has evidence-backed hypotheses.
+
+## Acceptance criteria
+
+- API/WS payloads expose evidence refs, hypotheses, insights, alternatives, abstention reasons, and version fields.
+- UI distinguishes evidence quality, confidence, conviction, and why-not signals.
+- Legacy smart-money surfaces remain compatibility aliases where required.
+- Replay/golden checks cover any changed projection behavior.
+- Explainability copy avoids overconfident certainty claims.
+
+## Test strategy
+
+Use API contract tests, websocket payload tests, and focused UI tests for evidence/abstention rendering. Validate with deterministic demo runs from synthetic phase 05. Manual visual review should supplement, not replace, replay/golden validation.
+
+## Risks / design traps
+
+- UI can accidentally reintroduce "smart money" certainty.
+- API aliases can become de facto canonical if not documented.
+- Too many fields without hierarchy will make explainability harder to scan.
+- Building UI before replay validation can make demos persuasive but untrustworthy.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/05-api-ui-explainability.md for Beads issue islandflow-zxh.5. Use split issues islandflow-zxh.5.1 and islandflow-zxh.5.2 for PR-sized work. Expose evidence-backed API/WS/UI explainability after replay/golden validation. Do not change core scoring or add calibration.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.5` - Smart-flow phase 05: API/UI explainability
+- PR split: `islandflow-zxh.5.1` - Split smart-flow phase 05a: evidence API and websocket surfaces
+- PR split: `islandflow-zxh.5.2` - Split smart-flow phase 05b: UI explainability surfaces
--- a/docs/implementation/smart-money/99-future-calibration.md
+++ b/docs/implementation/smart-money/99-future-calibration.md
@@ -0,0 +1,65 @@
+# Smart-Flow Phase 99: Future Calibration
+
+## Purpose
+
+Plan future calibration of smart-flow confidence, policy thresholds, penalties, and abstention behavior after the MVP evidence/hypothesis pipeline is working and replay-validated.
+
+## Why this phase comes now
+
+The architecture should leave room for calibration, but calibration should not block the MVP. The system first needs clean facts, evidence, hypotheses, and replayable evaluation before tuning can be meaningful.
+
+## Dependencies on earlier phases
+
+- `islandflow-zxh.5` - Smart-flow API/UI explainability
+- `islandflow-259.6` - Future synthetic historical calibration
+
+## Likely files/modules touched
+
+- Future calibration tooling in `services/compute/` or a research package
+- Policy/model version registry
+- Evaluation reports or benchmark datasets
+- Storage/query helpers for historical derived outputs
+- Documentation for metrics and calibration governance
+
+## In-scope work
+
+- Define calibration datasets and evaluation metrics.
+- Specify how confidence, conviction, penalties, abstention, and alternatives are tuned.
+- Preserve policy/model versioning and replayability.
+- Document what makes a calibration dataset acceptable.
+- Keep user-facing confidence semantics auditable.
+
+## Explicitly out-of-scope work
+
+- MVP contracts and scoring foundations.
+- API/UI explainability for the initial pipeline.
+- Treating historical calibration as proof of participant identity.
+- Using private or licensed data in committed fixtures without approval.
+
+## Acceptance criteria
+
+- Calibration remains outside the MVP blocker chain.
+- Dataset provenance, metrics, and policy versioning are documented before implementation.
+- Confidence and abstention semantics remain explainable after tuning.
+- Replay can compare calibrated policy versions without losing auditability.
+
+## Test strategy
+
+When implemented, use replayed benchmark datasets with versioned policy outputs. Track false positives, abstentions, precision-like metrics, and scenario-specific regressions. Keep calibration tests separate from the early deterministic fixture tests.
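
A sketch of the per-version tracking described above, assuming each replayed scenario carries a label. The `ReplayResult` and `VersionMetrics` shapes and the `metricsByVersion` helper are hypothetical; "precision" here is only the share of emitted hypotheses that matched a positive label, not a calibrated probability.

```typescript
// Hypothetical shapes; the metrics are simple counts over labeled replay results.
interface ReplayResult {
  policyVersion: string;
  scenarioId: string;
  emittedHypothesis: boolean; // false means the policy abstained
  labeledPositive: boolean; // scenario label from the benchmark dataset
}

interface VersionMetrics {
  truePositives: number;
  falsePositives: number;
  abstentions: number;
  precision: number | null; // null when the version emitted nothing
}

function metricsByVersion(results: ReplayResult[]): Map<string, VersionMetrics> {
  const out = new Map<string, VersionMetrics>();
  for (const r of results) {
    let m = out.get(r.policyVersion);
    if (m === undefined) {
      m = { truePositives: 0, falsePositives: 0, abstentions: 0, precision: null };
      out.set(r.policyVersion, m);
    }
    if (!r.emittedHypothesis) m.abstentions += 1;
    else if (r.labeledPositive) m.truePositives += 1;
    else m.falsePositives += 1;

    // Recompute the precision-like ratio after each result for this version.
    const emitted = m.truePositives + m.falsePositives;
    m.precision = emitted === 0 ? null : m.truePositives / emitted;
  }
  return out;
}
```

Keying the metrics by policy version preserves lineage, so two calibrated versions can be compared over the same replayed scenarios.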
+
+## Risks / design traps
+
+- Treating calibrated confidence as objective truth.
+- Tuning to demos instead of representative market regimes.
+- Losing policy version lineage.
+- Committing restricted data or large generated benchmark artifacts.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/99-future-calibration.md for Beads issue islandflow-zxh.6 only after the MVP smart-flow phases are complete. Define calibration datasets, metrics, policy versioning, and replay comparison. Do not make calibration a prerequisite for earlier evidence, scoring, or UI work.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.6` - Future smart-flow phase 99: calibration