plan synthetic and smart-flow phases

2026-06-16 13:46:08 -04:00 · eaa22de302
commit eaa22de302
parent d1fac6c7ec
19 changed files with 1198 additions and 1 deletions
--- a/docs/implementation/smart-money/99-future-calibration.md
+++ b/docs/implementation/smart-money/99-future-calibration.md
@@ -0,0 +1,65 @@
+# Smart-Flow Phase 99: Future Calibration
+
+## Purpose
+
+Plan future calibration of smart-flow confidence, policy thresholds, penalties, and abstention behavior after the MVP evidence/hypothesis pipeline is working and replay-validated.
+
+## Why this phase comes now
+
+The architecture should leave room for calibration, but calibration should not block the MVP. The system first needs clean facts, evidence, hypotheses, and replayable evaluation before tuning can be meaningful.
+
+## Dependencies on earlier phases
+
+- `islandflow-zxh.5` - Smart-flow API/UI explainability
+- `islandflow-259.6` - Future synthetic historical calibration
+
+## Likely files/modules touched
+
+- Future calibration tooling in `services/compute/` or a research package
+- Policy/model version registry
+- Evaluation reports or benchmark datasets
+- Storage/query helpers for historical derived outputs
+- Documentation for metrics and calibration governance
+
+## In-scope work
+
+- Define calibration datasets and evaluation metrics.
+- Specify how confidence, conviction, penalties, abstention, and alternatives are tuned.
+- Preserve policy/model versioning and replayability.
+- Document what makes a calibration dataset acceptable.
+- Keep user-facing confidence semantics auditable.
+
+## Explicitly out-of-scope work
+
+- MVP contracts and scoring foundations.
+- API/UI explainability for the initial pipeline.
+- Treating historical calibration as proof of participant identity.
+- Using private or licensed data in committed fixtures without approval.
+
+## Acceptance criteria
+
+- Calibration remains outside the MVP blocker chain.
+- Dataset provenance, metrics, and policy versioning are documented before implementation.
+- Confidence and abstention semantics remain explainable after tuning.
+- Replay can compare calibrated policy versions without losing auditability.
+
+## Test strategy
+
+When implemented, use replayed benchmark datasets with versioned policy outputs. Track false positives, abstentions, precision-like metrics, and scenario-specific regressions. Keep calibration tests separate from the early deterministic fixture tests.
+
+## Risks / design traps
+
+- Treating calibrated confidence as objective truth.
+- Tuning to demos instead of representative market regimes.
+- Losing policy version lineage.
+- Committing restricted data or large generated benchmark artifacts.
+
+## Suggested future Codex implementation prompt
+
+```text
+Implement docs/implementation/smart-money/99-future-calibration.md for Beads issue islandflow-zxh.6 only after the MVP smart-flow phases are complete. Define calibration datasets, metrics, policy versioning, and replay comparison. Do not make calibration a prerequisite for earlier evidence, scoring, or UI work.
+```
+
+## Matching Beads issue title/id
+
+- `islandflow-zxh.6` - Future smart-flow phase 99: calibration
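
The test strategy in the added document names false positives, abstentions, precision-like metrics, and scenario-specific regressions across versioned policy outputs. A minimal sketch of how such a replay comparison might be tallied follows. Everything in it is an assumption made for illustration and is not taken from the commit: the `ReplayOutcome` schema, the `flag` / `pass` / `abstain` decision vocabulary, and the function names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayOutcome:
    """One replayed benchmark case scored by one policy version (hypothetical schema)."""
    policy_version: str
    scenario: str
    decision: str         # assumed vocabulary: "flag", "pass", or "abstain"
    label_positive: bool  # benchmark label for the case


def summarize(outcomes):
    """Tally per-policy-version counts plus precision and abstention rate."""
    buckets = defaultdict(
        lambda: {"total": 0, "flagged": 0, "true_pos": 0, "false_pos": 0, "abstained": 0}
    )
    for o in outcomes:
        b = buckets[o.policy_version]
        b["total"] += 1
        if o.decision == "abstain":
            b["abstained"] += 1
        elif o.decision == "flag":
            b["flagged"] += 1
            if o.label_positive:
                b["true_pos"] += 1
            else:
                b["false_pos"] += 1
    report = {}
    for version, b in buckets.items():
        report[version] = {
            **b,
            # Precision is undefined when a version never flags anything.
            "precision": b["true_pos"] / b["flagged"] if b["flagged"] else None,
            "abstention_rate": b["abstained"] / b["total"],
        }
    return report


def false_positive_regressions(outcomes, baseline, candidate):
    """List scenarios where the candidate version has more false positives than the baseline."""
    fp = defaultdict(lambda: defaultdict(int))
    for o in outcomes:
        if o.decision == "flag" and not o.label_positive:
            fp[o.scenario][o.policy_version] += 1
    return sorted(
        scenario
        for scenario, by_version in fp.items()
        if by_version[candidate] > by_version[baseline]
    )
```

Keying every tally by `policy_version` keeps the comparison replayable: the same benchmark outcomes can be re-summarized for any pair of versions without losing which policy produced which decision.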