# Smart-Flow Phase 99: Future Calibration

## Purpose

Plan future calibration of smart-flow confidence, policy thresholds, penalties, and abstention behavior after the MVP evidence/hypothesis pipeline is working and replay-validated.
## Why this phase comes now

The architecture should leave room for calibration, but calibration should not block the MVP. The system first needs clean facts, evidence, hypotheses, and replayable evaluation before tuning can be meaningful.
## Source documents

- Architecture plan: [`docs/plans/smart-flow-architecture-review.md`](../../plans/smart-flow-architecture-review.md)
- Research report: [`docs/research-docs/smart-flow-market-mechanics.md`](../../research-docs/smart-flow-market-mechanics.md)

These documents provide rationale; they do not add scope. This future phase is the place to turn research ideas into scoped calibration work after the MVP.
## Research basis

- Historical validation should be time-of-day aware and avoid lookahead bias.
- Baselines for "unusual" should account for ticker, tenor bucket, regime, and event-day exclusions.
- Confidence, penalties, abstention, and alternatives need versioned policy outputs so calibration stays auditable.
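
The baseline and lookahead points above can be illustrated with a minimal Python sketch. It assumes a hypothetical `Observation` record, a 30-minute time-of-day bucket, and a z-score as the measure of "unusual"; none of these names or choices come from the existing codebase. History is limited to strictly earlier observations that share the ticker, tenor bucket, regime, and time-of-day bucket, and event days are excluded.

```python
from dataclasses import dataclass
from datetime import date, datetime
from statistics import mean, pstdev
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Observation:
    """Hypothetical record; field names are illustrative assumptions."""
    ticker: str
    tenor_bucket: str  # e.g. "0-7d"
    regime: str        # e.g. "calm"
    ts: datetime       # event time of the observation
    value: float       # e.g. traded premium


def time_of_day_bucket(ts: datetime, minutes: int = 30) -> int:
    """Index of the intraday bucket that contains ts (assumed 30-minute width)."""
    return (ts.hour * 60 + ts.minute) // minutes


def baseline_zscore(
    history: Iterable[Observation],
    current: Observation,
    event_days: Set[date],
    min_samples: int = 5,
) -> Optional[float]:
    """Z-score of current against comparable, strictly earlier observations.

    Returns None when the baseline is too thin or has no spread, so the
    caller can abstain instead of scoring against a weak baseline.
    """
    peers = [
        o.value
        for o in history
        if o.ts < current.ts  # strictly earlier: no lookahead
        and o.ticker == current.ticker
        and o.tenor_bucket == current.tenor_bucket
        and o.regime == current.regime
        and time_of_day_bucket(o.ts) == time_of_day_bucket(current.ts)
        and o.ts.date() not in event_days  # event-day exclusion
    ]
    if len(peers) < min_samples:
        return None
    spread = pstdev(peers)
    if spread == 0:
        return None
    return (current.value - mean(peers)) / spread
```

Returning `None` rather than a score is one possible way to connect thin baselines to abstention; the bucket width and minimum sample count are placeholders that calibration would need to justify.
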
## Deferred research ideas

- ML scoring, learned calibration, richer catalyst feeds, and large historical benchmark suites require separate future Beads scope.
## Dependencies on earlier phases

- `islandflow-zxh.5` - Smart-flow API/UI explainability
- `islandflow-259.6` - Future synthetic historical calibration
## Likely files/modules touched

- Future calibration tooling in `services/compute/` or a research package
- Policy/model version registry
- Evaluation reports or benchmark datasets
- Storage/query helpers for historical derived outputs
- Documentation for metrics and calibration governance
## In-scope work

- Define calibration datasets and evaluation metrics.
- Specify how confidence, conviction, penalties, abstention, and alternatives are tuned.
- Preserve policy/model versioning and replayability.
- Document what makes a calibration dataset acceptable.
- Keep user-facing confidence semantics auditable.
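
One way to keep tuned policies auditable is to treat each policy as an immutable, fingerprinted record and to stamp every output with it. The sketch below is illustrative only: `PolicyVersion`, `apply_policy`, the subtractive penalty rule, and all field names are hypothetical and are not taken from the existing contracts.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class PolicyVersion:
    """Hypothetical immutable policy record."""
    version: str
    min_confidence: float                     # below this, the policy abstains
    penalties: Tuple[Tuple[str, float], ...]  # (penalty name, weight) pairs

    def fingerprint(self) -> str:
        """Short stable hash of the policy contents, for lineage checks."""
        payload = json.dumps(
            {
                "version": self.version,
                "min_confidence": self.min_confidence,
                "penalties": sorted(self.penalties),
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def apply_policy(
    raw_confidence: float, triggered: Set[str], policy: PolicyVersion
) -> Dict[str, object]:
    """Apply penalties and the abstention threshold, stamping the policy used."""
    penalty = sum(weight for name, weight in policy.penalties if name in triggered)
    adjusted = max(0.0, raw_confidence - penalty)
    decision = "emit" if adjusted >= policy.min_confidence else "abstain"
    return {
        "decision": decision,
        "confidence": adjusted,
        "policy_version": policy.version,
        "policy_fingerprint": policy.fingerprint(),
    }
```

Because every output carries both the version label and a content fingerprint, a replay could detect a policy whose parameters changed without a version bump. Whether that check belongs in a registry or in the replay tooling is left open here.
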
## Explicitly out-of-scope work

- MVP contracts and scoring foundations.
- API/UI explainability for the initial pipeline.
- Treating historical calibration as proof of participant identity.
- Using private or licensed data in committed fixtures without approval.
## Acceptance criteria

- Calibration remains outside the MVP blocker chain.
- Dataset provenance, metrics, and policy versioning are documented before implementation.
- Confidence and abstention semantics remain explainable after tuning.
- Replay can compare calibrated policy versions without losing auditability.
## Test strategy

When implemented, use replayed benchmark datasets with versioned policy outputs. Track false positives, abstentions, precision-like metrics, and scenario-specific regressions. Keep calibration tests separate from the early deterministic fixture tests.
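
As a rough illustration of the tracking described above, the following Python sketch summarizes replay results per policy version and lists scenarios where a newer version introduces a false positive. The `ReplayResult` shape, the `"emit"`/`"abstain"` decision labels, and the boolean benchmark label are assumptions made for this example, not an existing schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ReplayResult:
    """Hypothetical row produced by replaying one scenario under one policy."""
    scenario: str
    policy_version: str
    decision: str  # "emit" or "abstain"
    label: bool    # benchmark judgment: True if the hypothesis is considered correct


def summarize(results: Iterable[ReplayResult]) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-version counts of emits, abstentions, and false positives, plus precision."""
    summary: Dict[str, Dict[str, Optional[float]]] = {}
    for r in results:
        stats = summary.setdefault(
            r.policy_version, {"emitted": 0, "abstained": 0, "false_positives": 0}
        )
        if r.decision == "abstain":
            stats["abstained"] += 1
        else:
            stats["emitted"] += 1
            if not r.label:
                stats["false_positives"] += 1
    for stats in summary.values():
        emitted = stats["emitted"]
        # Precision-like metric: share of emitted hypotheses judged correct.
        stats["precision"] = (
            (emitted - stats["false_positives"]) / emitted if emitted else None
        )
    return summary


def new_false_positives(
    results: Iterable[ReplayResult], old_version: str, new_version: str
) -> List[str]:
    """Scenarios where new_version emits a false positive and old_version does not."""
    rows = list(results)

    def false_positive_scenarios(version: str) -> set:
        return {
            r.scenario
            for r in rows
            if r.policy_version == version and r.decision == "emit" and not r.label
        }

    return sorted(
        false_positive_scenarios(new_version) - false_positive_scenarios(old_version)
    )
```

Two versions can show identical aggregate precision while differing on individual scenarios, which is why the per-scenario comparison is kept separate from the summary.
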
## Risks / design traps

- Treating calibrated confidence as objective truth.
- Tuning to demos instead of representative market regimes.
- Losing policy version lineage.
- Committing restricted data or large generated benchmark artifacts.
## Suggested future Codex implementation prompt

```text
Implement docs/implementation/smart-money/99-future-calibration.md for Beads issue islandflow-zxh.6 only after the MVP smart-flow phases are complete. Define calibration datasets, metrics, policy versioning, and replay comparison. Do not make calibration a prerequisite for earlier evidence, scoring, or UI work.
```
## Matching Beads issue title/id

- `islandflow-zxh.6` - Future smart-flow phase 99: calibration