dirtydishes 412c8b8af9 document research basis for phase plans

2026-06-16 13:53:54 -04:00

11 KiB

Architecture Review: Evidence-Backed Smart-Flow Detection

Summary

No source code was modified. The current architecture is not suitable as-is, but it is close enough to refactor, not rewrite. The stack is right; the domain language and pipeline shape are not.

Research direction: direct observation → inference → hypothesis, with preserved evidence and visible uncertainty.

Key code evidence: FlowPacket is a generic feature bag (events.ts); SmartMoneyEvent already has useful score and abstention fields (events.ts); compute emits smart-money events, then compatibility hits and alerts (index.ts); storage keeps core hypothesis detail as JSON (smart-money-events.ts); and replay re-runs raw market streams rather than validating the whole derived pipeline (replay/index.ts).

Source Documents

Research report: docs/research-docs/smart-flow-market-mechanics.md
Research architecture review copy: docs/research-docs/smart-flow-architecture-review.md

These research documents explain the rationale. They are background, not implementation scope; execution scope lives in the Beads issue and the relevant phase document.

Area Classification

Area	Call	Architecture Review
Domain model	refactor	Good bones, wrong center. Make evidence, hypotheses, scores, and alternatives first-class.
Event taxonomy	refactor	Raw/derived split is good; `smart_money`, `dark.inferred`, and `classifier_hits` leak overconfident product language.
Service boundaries	refactor	Ingest does too much signal policy; compute is too broad. Split pipeline stages before adding more intelligence.
`FlowPacket`	refactor	Keep concept, rename/reframe as `FlowEvidenceCluster` or `FlowCandidate`. Not a product domain object.
`SmartMoneyEvent`	redesign	Replace canonical object with `FlowHypothesisEvent`; use `SmartFlowInsight` only as UI/API projection.
Classifier pipeline	redesign	Current rules mix evidence extraction, hypothesis scoring, narrative labels, and alerting. Needs staged outputs.
ClickHouse/storage	refactor	Right datastore; raw tables are decent, derived evidence/hypotheses need typed/queryable columns plus JSON sidecars.
Redis baselines/cache	refactor	Right hot-state role; wrong as hidden baseline truth. Baselines need replayable snapshots/versioning.
NATS/JetStream subjects	refactor	Right bus; subjects should express stage/version: observations, evidence, hypotheses, insights.
Replay determinism	redesign	Present but not central enough. Replay must be the acceptance gate for derived outputs.
API/WebSocket	refactor	Mechanics are good; public surface should expose evidence bundles and hypotheses, not internal legacy names.
UI evidence model	refactor	Directionally good, but still foregrounds “profile/probability” over evidence quality, alternatives, and uncertainty.
Test strategy	redesign	Unit tests are solid scaffolding; needs fixture replay, false-positive suites, calibration, and end-to-end determinism.
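
To make the NATS/JetStream row concrete, here is a minimal sketch of a subject builder that encodes pipeline stage and contract version. The `flow.<stage>.v<version>.<symbol>` layout and the `flowSubject` helper are assumptions for illustration, not names from the repository; the only NATS behavior relied on is that subjects are dot-separated tokens and that `*` and `>` are wildcards.

```typescript
// Hypothetical subject scheme (an assumption, not the repository's layout):
//   flow.<stage>.v<version>.<symbol>
type Stage = "observations" | "evidence" | "hypotheses" | "insights";

function flowSubject(stage: Stage, version: number, symbol: string): string {
  if (!Number.isInteger(version) || version < 1) {
    throw new Error(`version must be a positive integer, got ${version}`);
  }
  // A subject token must not contain '.', '*', '>' or whitespace,
  // so the symbol is restricted to a conservative character set here.
  if (!/^[A-Za-z0-9_-]+$/.test(symbol)) {
    throw new Error(`symbol is not a valid subject token: ${symbol}`);
  }
  return `flow.${stage}.v${version}.${symbol}`;
}
```

With this layout a consumer could subscribe to one stage and version across all symbols using a single-token wildcard, for example `flow.hypotheses.v2.*`.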

Direct Answers

Current suitability: no. Useful infrastructure, but not yet an evidence-backed smart-flow architecture.
SmartMoneyEvent: not a good canonical domain object. Use FlowHypothesisEvent. ParticipantHypothesisEvent implies participant identity too strongly. SmartFlowInsight should be a user-facing projection.
FlowPacket: not as named. Keep the abstraction as an internal evidence cluster, but rename it to FlowEvidenceCluster or FlowCandidate.
Service boundaries: not right. Ingest should normalize only; evidence quality, eligibility, clustering, hypothesis scoring, and insight projection should be separate stages.
ClickHouse/Redis/NATS roles: yes, broadly. ClickHouse = authoritative event/audit store. Redis = hot cache only. NATS = transport, not truth. All three need cleaner contracts.
Replay central enough: no. It should be how every detection change proves itself.
UI uncertainty: partially. It shows evidence refs, profile ladders, abstention, and suppression, but needs confidence vs conviction, alternative explanations, evidence quality, and “why not” signals.
First-class domain objects: raw observations, execution context, quote join, eligibility decision, evidence cluster, structure hypothesis, evidence quality score, baseline snapshot, hypothesis score vector, false-positive penalty, catalyst context, flow hypothesis event, smart-flow insight, replay run.
Implementation details (not domain objects): Redis list layout, durable consumer names, current classifier thresholds, ClickHouse batch writer, adapter internals, legacy ClassifierHitEvent, alert severity math, UI cache mechanics.
Delete/defer: canonical “smart money” naming, real-time dark-pool certainty, standalone whale-premium alerts, trade-level open/close claims, participant identity claims, simplistic premium alert score, ingest-time signal filtering, retail_whale as a canonical profile unless reframed as attention/lottery flow.
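
The stage separation described under "Service boundaries" can be sketched as a chain of small pure functions. Every type, field, threshold, and label below is hypothetical and chosen only to show the shape; the scoring is a toy stand-in, not the project's classifier logic.

```typescript
// All names here are illustrative assumptions, not taken from the repository.
interface Observation { id: string; symbol: string; premium: number; quoteAgeMs: number; }
interface EligibilityDecision { observation: Observation; eligible: boolean; reason: string; }
interface EvidenceCluster { symbol: string; observationIds: string[]; totalPremium: number; }
interface Hypothesis { symbol: string; label: string; score: number; evidenceIds: string[]; }

// Stage 1: eligibility is decided after ingest, and every rejection keeps a reason.
function decideEligibility(obs: Observation, maxQuoteAgeMs: number): EligibilityDecision {
  const eligible = obs.quoteAgeMs <= maxQuoteAgeMs;
  return { observation: obs, eligible, reason: eligible ? "ok" : "stale_quote" };
}

// Stage 2: group eligible observations into one evidence cluster per symbol.
function clusterBySymbol(decisions: EligibilityDecision[]): EvidenceCluster[] {
  const bySymbol = new Map<string, EvidenceCluster>();
  for (const decision of decisions) {
    if (!decision.eligible) continue;
    const { id, symbol, premium } = decision.observation;
    const cluster: EvidenceCluster =
      bySymbol.get(symbol) ?? { symbol, observationIds: [], totalPremium: 0 };
    cluster.observationIds.push(id);
    cluster.totalPremium += premium;
    bySymbol.set(symbol, cluster);
  }
  return Array.from(bySymbol.values());
}

// Stage 3: toy scoring that maps cluster premium onto [0, 1].
function scoreCluster(cluster: EvidenceCluster, premiumScale: number): Hypothesis {
  return {
    symbol: cluster.symbol,
    label: "unusual_flow_candidate",
    score: Math.min(1, cluster.totalPremium / premiumScale),
    evidenceIds: cluster.observationIds,
  };
}

// The pipeline is just the composition of the stages, so each one can be
// tested and replayed on its own.
function runPipeline(
  observations: Observation[],
  maxQuoteAgeMs: number,
  premiumScale: number,
): Hypothesis[] {
  const decisions = observations.map((obs) => decideEligibility(obs, maxQuoteAgeMs));
  return clusterBySymbol(decisions).map((cluster) => scoreCluster(cluster, premiumScale));
}
```

Because no stage reads hidden state, the same observations and parameters always produce the same hypotheses, which is the property the replay gate depends on.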

Option A — Conservative

Summary: keep current objects and services; add evidence-quality fields, UI copy fixes, and replay tests.

Pros: fastest, lowest migration risk, preserves current endpoints and UI.

Cons: leaves misleading canonical names; makes future research harder; keeps inference tangled inside current compute flow.

Complexity: low. Migration risk: low.

Better: less overconfidence, more visible suppression, quicker validation.

Worse: domain debt remains; SmartMoneyEvent becomes harder to undo later.

Likely kept: most code in services/compute, packages/types, packages/storage, API routes, UI panes.

Likely rewritten: alert scoring, UI labels, some profile fields.

Likely deleted: almost nothing.

PR sequence:

1. Rename UI copy from “Smart money” to “Smart flow candidate.”
2. Add evidence-quality and alternative-explanation fields to existing event.
3. Add replay consistency tests around current outputs.
4. Add typed ClickHouse columns for high-value JSON fields.
5. Deprecate, but do not remove, legacy classifier hit display.

Option B — Refactor

Summary: keep Bun/TS, NATS, ClickHouse, Redis, API/WS, and the terminal UI, but rebuild the domain pipeline around evidence clusters and hypothesis events.

Pros: fixes the product’s epistemic spine without wasting useful infrastructure; best fit for pre-alpha.

Cons: breaking contract migration; touches types, storage, compute, API, UI, and tests.

Complexity: medium-high. Migration risk: medium.

Better: replayability, auditability, naming, evidence display, calibration, and future research velocity.

Worse: more short-term churn; old demos and endpoints need compatibility aliases.

Likely kept: raw market schemas, adapters, NATS/ClickHouse/Redis clients, live socket mechanics, virtualized UI, replay service skeleton, many feature calculations.

Likely rewritten: SmartMoneyEvent, FlowPacket, classifier pipeline, alert projection, ClickHouse derived schemas, API channel names, UI evidence drawers.

Likely deleted: canonical smart_money naming, ingest signal policy, premium-heavy alert scoring, ClassifierHitEvent as primary domain surface.

PR sequence:

1. Introduce FlowEvidenceCluster, FlowHypothesisEvent, SmartFlowInsight, EvidenceQuality, and version fields; keep aliases for compatibility.
2. Move signal eligibility out of ingest; ingest publishes normalized observations plus execution context only.
3. Split compute internally into evidence join → cluster/structure → hypothesis scoring → insight/alert projection.
4. Replace derived JSON-only storage with typed query columns for evidence quality, hypothesis scores, model version, policy version, and refs.
5. Add replay-run harness that recomputes derived outputs from raw streams and compares signatures.
6. Add /flow/evidence, /flow/hypotheses, /flow/insights plus WS equivalents; keep legacy endpoints as aliases.
7. Rework UI drawers/tables around evidence quality, confidence vs conviction, alternatives, abstention, and catalyst/noise context.
8. Add fixture suites for stale quotes, complex spreads, 0DTE/event noise, deep ITM, wide spreads, and off-exchange ambiguity.
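
The replay-run step in the sequence above (recompute derived outputs, compare signatures) could look roughly like the sketch below. The helper names are hypothetical, and the FNV-1a hash over UTF-16 code units is a dependency-free stand-in; a real harness would more likely use a cryptographic hash such as SHA-256. The key idea is to canonicalize outputs (sorted object keys) before hashing so that key order cannot cause spurious mismatches.

```typescript
// Serialize JSON-compatible values with sorted object keys so that two
// structurally equal outputs always produce the same string.
function canonicalize(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const body = Object.keys(record)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonicalize(record[key]))
      .join(",");
    return "{" + body + "}";
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units (illustrative, not collision-resistant).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function signature(outputs: unknown[]): string {
  return fnv1a(canonicalize(outputs));
}

// Re-derive outputs from raw inputs and compare against a recorded signature.
function replayMatches<R>(
  raw: R[],
  derive: (raw: R[]) => unknown[],
  recordedSignature: string,
): boolean {
  return signature(derive(raw)) === recordedSignature;
}
```

A detection change would then be accepted only if `replayMatches` holds for every fixture whose outputs are expected to stay the same, and any intended difference would be recorded as a new signature alongside the model and policy versions.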

Option C — Redesign

Summary: if starting over, build an event-sourced evidence engine with raw observations as the only source of truth and every derived artifact generated by versioned, replayable policies.

Pros: cleanest long-term architecture; strongest research discipline; easiest calibration/backtesting story.

Cons: slowest; overkill before product fit; discards too much working terminal and streaming infrastructure.

Complexity: very high. Migration risk: high.

Better: clean contracts, model versioning, deterministic replay, research-grade evidence lineage.

Worse: delivery speed, continuity, and working UI velocity.

Likely kept: market adapters, some schemas, ClickHouse client, NATS helpers, UI visual direction, selected tests.

Likely rewritten: almost all compute, storage schemas, API contracts, replay, UI data model.

Likely deleted: FlowPacket, SmartMoneyEvent, ClassifierHitEvent, AlertEvent as currently shaped, current subject hierarchy, current derived tables.

PR sequence:

1. Define new canonical event taxonomy and versioned policy registry.
2. Build raw observation lake and deterministic replay runner first.
3. Build evidence extraction and quote/condition eligibility services.
4. Build cluster and structure hypothesis services.
5. Build hypothesis scoring and calibration services.
6. Build insight projection API.
7. Rebuild terminal against new evidence/hypothesis contracts.
8. Backfill or discard old derived data.
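
As one possible reading of the "versioned policy registry" in the sequence above, the sketch below keys each policy by name and version so that a replay can resolve exactly the policy that produced a historical output. `Policy`, `PolicyRegistry`, and the `name@version` key format are assumptions for illustration.

```typescript
// Hypothetical shapes; nothing here is taken from the repository.
interface Policy<I, O> {
  name: string;
  version: string;
  apply(input: I): O;
}

class PolicyRegistry<I, O> {
  private readonly policies = new Map<string, Policy<I, O>>();

  // Registered versions are immutable: re-registering the same key is an error,
  // so a given name@version always means the same behavior.
  register(policy: Policy<I, O>): void {
    const key = `${policy.name}@${policy.version}`;
    if (this.policies.has(key)) {
      throw new Error(`policy ${key} is already registered`);
    }
    this.policies.set(key, policy);
  }

  // Resolve the exact version recorded with a derived event.
  resolve(name: string, version: string): Policy<I, O> {
    const policy = this.policies.get(`${name}@${version}`);
    if (!policy) {
      throw new Error(`unknown policy ${name}@${version}`);
    }
    return policy;
  }
}
```

Old versions stay registered next to new ones, so a derived artifact that records its policy name and version can be regenerated later under the same rules.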

Recommendation

Choose Option B.

Bluntly: Option A is too timid for a pre-alpha product whose current names already fight the research. Option C is intellectually clean but wastes too much working infrastructure. Option B keeps the stack and terminal momentum while fixing the core mistake: treating “smart money” as a thing the system emits, instead of treating smart flow as a cautious, evidence-backed hypothesis with alternatives.

The first implementation move should be the contract/naming PR: introduce FlowHypothesisEvent and FlowEvidenceCluster with compatibility aliases, then make replay the gate before touching more classifier logic.
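
A minimal sketch of what that contract/naming PR might introduce follows. The field names are hypothetical (the actual fields of SmartMoneyEvent are not shown in this review); the point is the shape: a canonical hypothesis event that carries its evidence cluster, score vector, abstention flag, and versions, a deprecated alias so existing imports keep compiling, and a user-facing projection that lists alternatives instead of hiding them.

```typescript
// Field names are illustrative assumptions, not the repository's schema.
interface FlowEvidenceCluster {
  clusterId: string;
  evidenceRefs: string[];
  evidenceQuality: number; // 0..1, higher means better-supported evidence
}

interface HypothesisScore {
  label: string;
  score: number;
}

interface FlowHypothesisEvent {
  eventId: string;
  cluster: FlowEvidenceCluster;
  scores: HypothesisScore[];
  abstained: boolean;
  modelVersion: string;
  policyVersion: string;
}

/** @deprecated Compatibility alias; use FlowHypothesisEvent. */
type SmartMoneyEvent = FlowHypothesisEvent;

// UI/API projection only; never stored as the canonical record.
interface SmartFlowInsight {
  eventId: string;
  headline: string | null; // null when the model abstained
  alternatives: string[];
  evidenceQuality: number;
  evidenceCount: number;
}

function toInsight(event: FlowHypothesisEvent): SmartFlowInsight {
  const ranked = [...event.scores].sort((a, b) => b.score - a.score);
  const top = event.abstained ? undefined : ranked[0];
  return {
    eventId: event.eventId,
    headline: top ? top.label : null,
    alternatives: ranked.filter((s) => s !== top).map((s) => s.label),
    evidenceQuality: event.cluster.evidenceQuality,
    evidenceCount: event.cluster.evidenceRefs.length,
  };
}
```

Code that still refers to `SmartMoneyEvent` type-checks unchanged through the alias, which lets the rename land before any classifier logic is touched.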
