# Smart-Flow Phase 02: Evidence Clustering and Features

## Purpose

Make evidence extraction, eligibility, quote/context joins, clustering, and feature construction explicit and traceable before hypothesis scoring changes.

## Why this phase comes now

Contracts alone do not change behavior. This phase gives the system a clean evidence layer so later scoring can reason from auditable facts instead of a generic feature bag or overconfident classifier labels.

## Dependencies on earlier phases

- `islandflow-zxh.1` - Smart-flow contracts and vocabulary
- `islandflow-259.2` - Synthetic manifests, fixtures, and CLI

## Likely files/modules touched

- `services/compute/src/`
- `packages/types/src/events.ts`
- `packages/storage/src/` for typed evidence storage planning or implementation
- Tests under `services/compute/tests/`
- Fixture helpers from the synthetic package

## In-scope work

- Represent direct observations, quote joins, execution context, and eligibility decisions as evidence facts.
- Build deterministic evidence clusters with traceable source refs.
- Compute feature vectors from evidence while recording whether each value is observed, derived, or inferred.
- Carry evidence-quality, stale-quote, wide-spread, odd-lot, complex-spread, and noisy-context signals.
- Move ingest toward pure normalization (ingest-as-normalization) and away from making signal-policy decisions (ingest-as-signal-policy).
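
As a rough illustration of the provenance requirement, here is a minimal sketch. Evidence facts and feature values both carry a provenance tag and the source refs they came from. All names are hypothetical and are not taken from `packages/types/src/events.ts`. That includes `EvidenceFact`, `FeatureValue`, `SourceRef`, `mergeRefs`, `derivePremiumFeature`, and the stream names. The multiplier of 100 is an assumption about standard US equity option contracts.

```typescript
// Hypothetical shapes for illustration; none of these names come from the codebase.

// How a value was obtained: read directly from a raw record, computed from
// other values, or inferred under an assumption.
type Provenance = "observed" | "derived" | "inferred";

// Pointer back to one raw record, e.g. an option print or a quote snapshot.
interface SourceRef {
  stream: string; // hypothetical stream name, e.g. "option_prints"
  id: string; // id of the raw record within that stream
}

// A single evidence fact: an observation, a join result, or a context reading.
interface EvidenceFact {
  kind: string; // hypothetical, e.g. "print_price", "quote_join"
  provenance: Provenance;
  value: number;
  refs: SourceRef[];
}

// A feature value keeps its provenance and the refs it was built from.
interface FeatureValue {
  name: string;
  value: number;
  provenance: Provenance;
  refs: SourceRef[];
}

// Merge ref lists, collapsing duplicates by stream + id, in a stable sorted order.
function mergeRefs(...groups: SourceRef[][]): SourceRef[] {
  const byKey: Record<string, SourceRef> = {};
  for (const group of groups) {
    for (const r of group) byKey[`${r.stream}:${r.id}`] = r;
  }
  return Object.keys(byKey).sort().map((k) => byKey[k]);
}

// Assumed contract multiplier for standard US equity options.
const CONTRACT_MULTIPLIER = 100;

// Derived feature: premium = price * size * multiplier.
function derivePremiumFeature(price: EvidenceFact, size: EvidenceFact): FeatureValue {
  // Anything computed from an inferred input stays "inferred"; otherwise "derived".
  const anyInferred = price.provenance === "inferred" || size.provenance === "inferred";
  return {
    name: "premium",
    value: price.value * size.value * CONTRACT_MULTIPLIER,
    provenance: anyInferred ? "inferred" : "derived",
    refs: mergeRefs(price.refs, size.refs),
  };
}
```

Deduplicating refs by stream and id matters when two fields come from the same print. The resulting feature then points at that print once instead of listing it twice.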

## Explicitly out-of-scope work

- Final hypothesis score policy.
- API and UI explainability.
- Historical calibration.
- Claiming participant identity.
- Replacing all storage tables in the same PR.

## Acceptance criteria

- Eligibility decisions record an explicit accept, reject, or down-weight verdict together with the reasons for it.
- Evidence clusters have deterministic keys and windows and preserve raw refs.
- Feature values trace back to evidence refs.
- Stale, wide, noisy, or ambiguous conditions can be represented without pretending to know intent.
- The phase is split into PR-sized child issues when implementation starts.
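
The eligibility criterion could be met by a check that always returns three things: a verdict, the reasons behind it, and a weight. It never returns a bare boolean. This is a sketch only, and several parts are illustrative assumptions:

- the reason codes;
- the 2-second staleness cutoff;
- the 25% relative-spread limit;
- the halve-per-reason weighting.

The `isOddLot` and `isSpreadLeg` flags are assumed to be set earlier, during normalization. `PrintContext` and `checkEligibility` are hypothetical names.

```typescript
// Hypothetical eligibility sketch: thresholds, weights, and reason codes are assumptions.

type Verdict = "accept" | "reject" | "down_weight";

type Reason =
  | "no_usable_quote"
  | "stale_quote"
  | "wide_spread"
  | "odd_lot"
  | "complex_spread_leg";

interface PrintContext {
  isOddLot: boolean; // assumed to be set during normalization
  isSpreadLeg: boolean; // print reported as one leg of a multi-leg trade
  quote?: { bid: number; ask: number; ageMs: number }; // joined quote, if any
}

interface EligibilityDecision {
  verdict: Verdict;
  reasons: Reason[]; // empty only when the verdict is "accept"
  weight: number; // 1 = accept, 0 = reject, in between = down-weight
}

const MAX_QUOTE_AGE_MS = 2000; // assumed staleness cutoff
const MAX_RELATIVE_SPREAD = 0.25; // assumed limit on (ask - bid) / mid

function checkEligibility(p: PrintContext): EligibilityDecision {
  const q = p.quote;
  // Without a usable (present, non-crossed) quote there is no reference price.
  if (!q || q.ask < q.bid) {
    return { verdict: "reject", reasons: ["no_usable_quote"], weight: 0 };
  }
  const reasons: Reason[] = [];
  const mid = (q.bid + q.ask) / 2;
  if (q.ageMs > MAX_QUOTE_AGE_MS) reasons.push("stale_quote");
  if (mid > 0 && (q.ask - q.bid) / mid > MAX_RELATIVE_SPREAD) reasons.push("wide_spread");
  if (p.isOddLot) reasons.push("odd_lot");
  if (p.isSpreadLeg) reasons.push("complex_spread_leg");
  if (reasons.length === 0) {
    return { verdict: "accept", reasons, weight: 1 };
  }
  // Each caveat halves the weight (assumed policy); the print is kept, not dropped.
  return { verdict: "down_weight", reasons, weight: Math.pow(0.5, reasons.length) };
}
```

Keeping the reasons as data means a down-weighted print stays in the evidence set with its caveats attached. It is neither silently dropped nor silently trusted.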

## Test strategy

Use deterministic fixtures from synthetic phase 02 where available. Add focused tests for quote joining, eligibility rejection, cluster key stability, feature derivation, and trace refs. Keep tests infra-free unless a later optional storage integration explicitly needs services.
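
A cluster-key stability test might look like the following sketch. It assumes a tumbling (fixed, non-overlapping) window keyed on contract identity and event time. The key therefore depends only on the print itself, and shuffling the input cannot move a print between clusters. The following names are all hypothetical:

- the `Print` shape;
- `WINDOW_MS`;
- `clusterKey`, `buildClusters`, and `sameClusters`.

The check uses plain functions rather than a test framework to stay infra-free. An inline fixture stands in for the synthetic phase 02 fixtures.

```typescript
// Hypothetical normalized print shape.
interface Print {
  id: string;
  underlying: string;
  expiry: string; // ISO date, e.g. "2025-01-17"
  strike: number;
  right: "C" | "P";
  tsMs: number; // event time in epoch milliseconds (not ingest time)
}

const WINDOW_MS = 1000; // assumed tumbling window size

// Pure function of contract identity + time bucket: no counters, no wall clock.
function clusterKey(p: Print): string {
  const bucket = Math.floor(p.tsMs / WINDOW_MS);
  return `${p.underlying}|${p.expiry}|${p.strike}|${p.right}|${bucket}`;
}

// Group print ids by key; members are sorted so output is order-independent.
function buildClusters(prints: Print[]): Record<string, string[]> {
  const clusters: Record<string, string[]> = {};
  for (const p of prints) {
    const key = clusterKey(p);
    if (!clusters[key]) clusters[key] = [];
    clusters[key].push(p.id);
  }
  for (const key of Object.keys(clusters)) clusters[key].sort();
  return clusters;
}

// True when both groupings have the same keys and the same members per key.
function sameClusters(a: Record<string, string[]>, b: Record<string, string[]>): boolean {
  const keysA = Object.keys(a).sort();
  const keysB = Object.keys(b).sort();
  if (keysA.join("\n") !== keysB.join("\n")) return false;
  return keysA.every((k) => a[k].join(",") === b[k].join(","));
}
```

A tumbling window is the simplest deterministic choice, but it splits bursts that straddle a bucket boundary. A session-style window would avoid that, at the cost of keys that depend on neighbouring prints.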

## Risks / design traps

- Recreating the old `FlowPacket` as a renamed generic feature bag.
- Letting ingest services make signal-policy decisions.
- Losing evidence refs during aggregation.
- Treating cluster features as hypotheses before the scoring phase.
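
One way to guard against the feature-bag, lost-refs, and premature-hypothesis traps is to constrain the aggregation step. It can return named, typed features that each keep the union of their inputs' refs, and deliberately carry no intent or hypothesis label. This is a hypothetical sketch. `Member`, `TracedNumber`, `ClusterFeatures`, `aggregateCluster`, and the feature names are invented for illustration, and the per-member eligibility weight is assumed to come from an earlier eligibility step.

```typescript
interface SourceRef {
  stream: string;
  id: string;
}

// One cluster member after eligibility; weight is assumed to be in [0, 1].
interface Member {
  premium: number;
  weight: number;
  refs: SourceRef[];
}

// A number that cannot be separated from the evidence behind it.
interface TracedNumber {
  value: number;
  refs: SourceRef[];
}

// Explicit, named fields instead of an open-ended Record<string, number> bag.
// Deliberately no intent, label, or score field: that belongs to scoring.
interface ClusterFeatures {
  clusterKey: string;
  totalPremium: TracedNumber;
  weightedPremium: TracedNumber;
  memberCount: TracedNumber;
}

// Union of all member refs, deduplicated by stream + id, in stable order.
function unionRefs(members: Member[]): SourceRef[] {
  const byKey: Record<string, SourceRef> = {};
  for (const m of members) {
    for (const r of m.refs) byKey[`${r.stream}:${r.id}`] = r;
  }
  return Object.keys(byKey).sort().map((k) => byKey[k]);
}

function aggregateCluster(clusterKey: string, members: Member[]): ClusterFeatures {
  const refs = unionRefs(members);
  let total = 0;
  let weighted = 0;
  for (const m of members) {
    total += m.premium;
    weighted += m.premium * m.weight;
  }
  // Each feature gets its own copy of the refs so later edits cannot alias.
  return {
    clusterKey,
    totalPremium: { value: total, refs: refs.slice() },
    weightedPremium: { value: weighted, refs: refs.slice() },
    memberCount: { value: members.length, refs: refs.slice() },
  };
}
```

`ClusterFeatures` has no open-ended map of values and no intent or score field. Adding a feature therefore becomes a visible type change in review, and hypothesis labels are harder to slip in before the scoring phase.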

## Suggested future Codex implementation prompt

```text
Implement docs/implementation/smart-money/02-evidence-clustering-features.md for Beads issue islandflow-zxh.2. Use split issues islandflow-zxh.2.1 and islandflow-zxh.2.2 for PR-sized work. Focus on evidence facts, eligibility, clustering, and traceable features. Do not implement final scoring, API/UI explainability, or calibration.
```

## Matching Beads issue title/id

- `islandflow-zxh.2` - Smart-flow phase 02: evidence clustering and features
- PR split: `islandflow-zxh.2.1` - Split smart-flow phase 02a: eligibility and evidence facts
- PR split: `islandflow-zxh.2.2` - Split smart-flow phase 02b: clustering and feature vectors