islandflow/docs/implementation/smart-money/02-evidence-clustering-features.md

69 lines
3.1 KiB
Markdown

# Smart-Flow Phase 02: Evidence Clustering and Features
## Purpose
Make evidence extraction, eligibility, quote/context joins, clustering, and feature construction explicit and traceable before hypothesis scoring changes.
## Why this phase comes now
Contracts alone do not change behavior. This phase gives the system a clean evidence layer so later scoring can reason from auditable facts instead of a generic feature bag or overconfident classifier labels.
## Dependencies on earlier phases
- `islandflow-zxh.1` - Smart-flow contracts and vocabulary
- `islandflow-259.2` - Synthetic manifests, fixtures, and CLI
## Likely files/modules touched
- `services/compute/src/`
- `packages/types/src/events.ts`
- `packages/storage/src/` for typed evidence storage planning or implementation
- Tests under `services/compute/tests/`
- Fixture helpers from the synthetic package
## In-scope work
- Represent direct observations, quote joins, execution context, and eligibility decisions as evidence facts.
- Build deterministic evidence clusters with traceable source refs.
- Compute feature vectors from evidence while preserving whether a value is observed, derived, or inferred.
- Carry evidence quality, stale quote, wide spread, odd lot, complex spread, and noisy context signals.
- Move toward ingest-as-normalization, not ingest-as-signal-policy.
## Explicitly out-of-scope work
- Final hypothesis score policy.
- API and UI explainability.
- Historical calibration.
- Claiming participant identity.
- Replacing all storage tables in the same PR.
## Acceptance criteria
- Eligibility decisions have explicit accept, reject, or down-weight reasons.
- Evidence clusters have deterministic keys/windows and preserve raw refs.
- Feature values trace back to evidence refs.
- Stale, wide, noisy, or ambiguous conditions can be represented without pretending to know intent.
- The phase is split into PR-sized children when implementation starts.
## Test strategy
Use deterministic fixtures from synthetic phase 02 where available. Add focused tests for quote joining, eligibility rejection, cluster key stability, feature derivation, and trace refs. Keep tests infra-free unless a later optional storage integration explicitly needs services.
## Risks / design traps
- Recreating the old `FlowPacket` as a renamed generic feature bag.
- Letting ingest services make signal-policy decisions.
- Losing evidence refs during aggregation.
- Treating cluster features as hypotheses before the scoring phase.
## Suggested future Codex implementation prompt
```text
Implement docs/implementation/smart-money/02-evidence-clustering-features.md for Beads issue islandflow-zxh.2. Use split issues islandflow-zxh.2.1 and islandflow-zxh.2.2 for PR-sized work. Focus on evidence facts, eligibility, clustering, and traceable features. Do not implement final scoring, API/UI explainability, or calibration.
```
## Matching Beads issue title/id
- `islandflow-zxh.2` - Smart-flow phase 02: evidence clustering and features
- PR split: `islandflow-zxh.2.1` - Split smart-flow phase 02a: eligibility and evidence facts
- PR split: `islandflow-zxh.2.2` - Split smart-flow phase 02b: clustering and feature vectors