From 82861408e47334379c3f2d9656c99b9615941ca4 Mon Sep 17 00:00:00 2001
From: dirtydishes
Date: Mon, 29 Dec 2025 22:10:26 -0500
Subject: [PATCH] Track project docs

Remove docs from .gitignore and add AGENTS.md, PLAN.md, CODING_STYLE.md, and RESEARCH.md to the repository.
---
 .gitignore      |   4 -
 AGENTS.md       | 186 +++++++++++++++++++++++++++++++++
 CODING_STYLE.md |  97 ++++++++++++++++++
 PLAN.md         | 265 ++++++++++++++++++++++++++++++++++++++++++++++++
 RESEARCH.md     | 123 ++++++++++++++++++++++
 5 files changed, 671 insertions(+), 4 deletions(-)
 create mode 100644 AGENTS.md
 create mode 100644 CODING_STYLE.md
 create mode 100644 PLAN.md
 create mode 100644 RESEARCH.md

diff --git a/.gitignore b/.gitignore
index 89ac89b..0ac6b0d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -10,8 +10,4 @@ dist/
 coverage/
 logs/
 .tmp/
-AGENTS.md
-PLAN.md
-CODING_STYLE.md
-RESEARCH.md
 apps/web/.next/
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..430799b
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,186 @@
+# AGENTS.md — Execution Guardrails for Codex
+
+This file defines **how Codex should think, act, and prioritize** when working in this repository.
+Its purpose is to keep development **focused, correct, and non-drifting**.
+
+If there is any conflict between speed and correctness, **correctness wins**.
+
+---
+
+## Mission
+
+Build a **real-time, non-delayed options flow and off-exchange trade analysis platform** for personal use that is:
+
+- explainable
+- deterministic
+- replayable
+- microstructure-correct
+- low-latency
+- built on **Bun**
+
+Codex is an **engineering executor**, not a product visionary.
+Do not invent scope. Do not “improve” the plan. Implement it faithfully.
+
+---
+
+## Non-Negotiable Constraints
+
+- **Bun is mandatory**
+  - Use Bun for runtime, package manager, scripts, and dev tooling.
+  - Do not introduce npm, yarn, pnpm, or Node-only assumptions.
+- **TypeScript only**
+  - No JS-only files unless unavoidable (and document why).
+- **No black-box logic**
+  - All classifiers must be rule-based and explainable.
+- **Personal-use architecture**
+  - No multi-user assumptions.
+  - No redistribution mechanisms.
+- **Deterministic pipelines**
+  - Live behavior must match replay behavior.
+
+If a change violates any of the above, **do not implement it**.
+
+---
+
+## Source of Truth
+
+The authoritative documents are, in order:
+
+1. `PLAN.md`
+2. `AGENTS.md`
+3. Code already merged into `main`
+
+If a request contradicts `PLAN.md`, Codex must **stop and ask for clarification**.
+
+---
+
+## Development Rules
+
+### 1. Never Skip the Event Layer
+- All incoming market data becomes **immutable events**.
+- Never compute directly off live feeds without persisting the event.
+- Never add UI-only logic that bypasses persisted data.
+
+### 2. Separate Fact from Inference
+- Raw data (`OptionPrint`, `EquityPrint`) is **fact**.
+- Classifiers and dark pool signals are **inference**.
+- Store and label them separately.
+- Never overwrite facts with inferred labels.
+
+### 3. Explainability Is Required
+Every classifier must:
+- have a unique ID
+- expose its inputs
+- produce a human-readable explanation string
+- link back to evidence prints
+
+If an alert cannot explain itself, it is invalid.
+
+### 4. Favor Simple, Explicit Logic
+- Prefer clear thresholds over clever heuristics.
+- Avoid premature ML or probabilistic tuning.
+- If logic becomes complex, break it into named steps.
+
+This is a research system, not a trading bot.
+
+---
+
+## Classifier Implementation Rules
+
+- Classifiers operate on **FlowPackets**, not raw prints.
+- Each classifier:
+  - returns `{ confidence, direction, explanations[] }`
+  - contributes to alert scoring but does not decide alerts alone
+- Never infer intent with certainty.
+- Use language like:
+  - “likely”
+  - “suggests”
+  - “consistent with”
+- Never use language like:
+  - “smart money”
+  - “institutional intent”
+  - “guaranteed”
+
+---
+
+## Time & Market Structure Rules
+
+- Always join prints to NBBO using bounded time windows.
+- Track and expose join quality (`nbbo_age_ms`, etc.).
+- Explicitly handle:
+  - 0DTE
+  - low-liquidity contracts
+  - wide spreads
+- If confidence is low, say so.
+
+---
+
+## Charting Rules
+
+- Candles are built **server-side only**.
+- Client never computes OHLC.
+- Overlays must be viewport-aware and decimated.
+- Performance beats decoration.
+
+If a chart stutters, reduce data density first—not visual quality.
+
+---
+
+## UI Rules
+
+- Prefer clarity over density.
+- Every alert must be clickable to evidence.
+- No “magic colors” without legend or explanation.
+- Motion must feel physical, not flashy.
+
+UI exists to **inspect**, not to impress.
+
+---
+
+## Observability & Safety
+
+- Add metrics alongside new pipelines.
+- Log failures explicitly.
+- Never silently drop events.
+- During overload:
+  - persistence > compute > UI (in that priority order)
+
+---
+
+## What Codex Must NOT Do
+
+- Do not invent new features or markets.
+- Do not introduce predictive claims.
+- Do not optimize prematurely.
+- Do not refactor without reason.
+- Do not replace explicit logic with ML.
+- Do not broaden scope beyond personal use.
+
+---
+
+## When to Stop and Ask
+
+Codex must pause and ask for guidance if:
+- a data provider limitation blocks implementation
+- licensing or entitlement assumptions change
+- a requested change conflicts with `PLAN.md`
+- a design decision affects determinism or replayability
+
+---
+
+## Definition of “Done”
+
+A task is done only when:
+- it matches `PLAN.md`
+- it compiles and runs under Bun
+- it is deterministic
+- it is explainable
+- it is testable or replayable
+
+---
+
+## Final Reminder
+
+This system is built to **understand markets**, not to mythologize them.
+
+If something cannot be justified by observable data and clear logic, it does not belong here.
diff --git a/CODING_STYLE.md b/CODING_STYLE.md
new file mode 100644
index 0000000..a9bbd45
--- /dev/null
+++ b/CODING_STYLE.md
@@ -0,0 +1,97 @@
+# CODING_STYLE.md — TypeScript + Bun Conventions
+
+This document defines **local coding conventions** for this repository.
+It exists to reduce drift, improve readability, and keep AI-generated code consistent.
+
+This is **not** a general style guide. It encodes *project-specific preferences*.
+
+---
+
+## Language & Runtime
+
+- **TypeScript only**
+- **Bun runtime required**
+- Target modern JS (ES2022+)
+- Prefer ESM everywhere
+
+No Node-only APIs unless explicitly unavoidable and documented.
+
+---
+
+## File & Module Structure
+
+- One logical responsibility per file.
+- Avoid “god files.”
+- Prefer small, composable modules over deep inheritance.
+
+### Naming
+- Files: `kebab-case.ts`
+- Types / interfaces: `PascalCase`
+- Functions / variables: `camelCase`
+- Constants: `SCREAMING_SNAKE_CASE` (rare)
+
+Examples:
+- `flow-packet.ts`
+- `compute-aggressor-score.ts`
+- `infer-absorption.ts`
+
+---
+
+## Types & Schemas
+
+- **All external data must be validated**
+  - Use `zod` schemas at boundaries (ingest, API).
+- Internal functions may assume validated input.
+- Prefer explicit types over inference when crossing module boundaries.
+
+Avoid:
+- `any`
+- implicit `unknown` without narrowing
+
+---
+
+## Error Handling
+
+- Fail **loudly and explicitly**.
+- Prefer throwing typed errors over silent fallbacks.
+- Log errors with structured context:
+  - service name
+  - event id
+  - ticker / contract id (if applicable)
+
+Never swallow errors in ingestion or compute paths.
+
+---
+
+## Async & Concurrency
+
+- Prefer async/await over promise chains.
+- Streaming > batching when possible.
+- Avoid unbounded concurrency.
+- Backpressure must be explicit.
+
+Never block the event loop for UI convenience.
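As an annotation on the "Async & Concurrency" rules above (not part of the committed file), here is a minimal sketch of what "avoid unbounded concurrency" could look like in practice. The helper name `mapWithLimit` and its signature are hypothetical and do not come from the repository; it simply caps the number of in-flight tasks and lets any worker error reject the whole call, in line with the "fail loudly" rule.

```typescript
// Hypothetical helper (not from the repository): run `worker` over `items`
// with at most `limit` tasks in flight at any time.
export async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }

  const results = new Array<R>(items.length);
  let next = 0; // index of the next item to hand out

  // Each runner pulls the next unclaimed index until the input is exhausted.
  // Claiming an index is synchronous, so two runners never take the same item.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T, i);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  const runners = Array.from({ length: runnerCount }, () => runner());

  // A rejected worker rejects the whole call; nothing is swallowed.
  await Promise.all(runners);
  return results;
}
```

Results keep input order regardless of completion order, which keeps the output independent of timing. A streaming variant with an explicit bounded queue would be closer to the "streaming > batching" preference; this batch form is only the shortest way to show the cap.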
+
+---
+
+## Determinism Rules
+
+- No time-based randomness.
+- No reliance on implicit system state.
+- Given the same inputs, outputs must be identical.
+
+Live execution and replay execution must share code paths.
+
+---
+
+## Comments & Documentation
+
+- Comment **why**, not **what**.
+- If logic is subtle, explain assumptions.
+- Avoid speculative language in comments.
+
+Good:
+// Join window bounded to reduce NBBO misalignment during bursts
+
+Bad:
+// This seems to work better
diff --git a/PLAN.md b/PLAN.md
new file mode 100644
index 0000000..2e89202
--- /dev/null
+++ b/PLAN.md
@@ -0,0 +1,265 @@
+# PLAN.md — Real-Time Options Flow & Off-Exchange Analysis Platform
+
+## Purpose
+Build a **real-time, non-delayed** market-flow analysis system for **personal use** that ingests options trades/quotes and equity prints, clusters raw activity into higher-level flow events, applies **explainable rule-first classifiers**, infers dark-pool-like behavior, and visualizes everything in a **TradingView-smooth** interface with full replay and backtesting.
+
+---
+
+## Non-Negotiables
+- **Runtime & tooling:** Bun everywhere (services, scripts, dev, CI)
+- **Language:** TypeScript
+- **Frontend:** Next.js + React (App Router)
+- **Realtime:** WebSockets (server → client)
+- **Eventing:** NATS JetStream (default) or Redpanda (Kafka-compatible)
+- **Storage:** ClickHouse (authoritative event log + analytics), Redis (hot state)
+- **Charting:** TradingView Lightweight Charts + custom Canvas/WebGL overlays
+- **Scope:** Personal, non-delayed use only (no redistribution)
+
+---
+
+## Guiding Principles
+- **Explainability first:** every alert links to evidence and explicit logic.
+- **Event-sourced:** raw and derived events are persisted and replayable.
+- **Microstructure correctness:** conservative inference, explicit confidence.
+- **Low latency UX:** smooth pan/zoom, minimal main-thread work.
+- **Determinism:** live behavior equals replay behavior.
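As an annotation on the determinism and microstructure-correctness principles above (not part of the committed file), the sketch below shows one way a bounded NBBO join could depend only on event timestamps, so a live run and a replay produce the same result. The `nbbo_age_ms` field name comes from AGENTS.md; the function name `joinToNbbo`, the `Quote` shape, and the `maxAgeMs` parameter are assumptions made for illustration.

```typescript
// Illustrative shapes; only `ts`, `bid`, `ask`, and `nbbo_age_ms` echo names
// used in the docs. Everything else here is hypothetical.
interface Quote {
  ts: number; // event timestamp in milliseconds
  bid: number;
  ask: number;
}

interface NbboJoin {
  quote: Quote | null; // null when no quote is fresh enough
  nbbo_age_ms: number | null; // null when no earlier quote exists at all
}

// Find the latest quote at or before `printTs`, rejecting it if it is older
// than `maxAgeMs`. `quotes` must be sorted ascending by `ts`.
// No wall-clock reads: the result is a pure function of the inputs.
export function joinToNbbo(
  printTs: number,
  quotes: readonly Quote[],
  maxAgeMs: number,
): NbboJoin {
  let lo = 0;
  let hi = quotes.length - 1;
  let found = -1;

  // Binary search for the last index whose ts is <= printTs.
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    if ((quotes[mid] as Quote).ts <= printTs) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }

  if (found === -1) return { quote: null, nbbo_age_ms: null };

  const candidate = quotes[found] as Quote;
  const age = printTs - candidate.ts;

  // Stale quote: report the age so low join quality stays visible downstream.
  if (age > maxAgeMs) return { quote: null, nbbo_age_ms: age };
  return { quote: candidate, nbbo_age_ms: age };
}
```

Returning the age even when the join is rejected is a design choice in this sketch: it lets a caller expose low join quality instead of silently dropping the print.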
+
+---
+
+## High-Level Architecture
+**Sources → Ingest → Event Bus → Compute → Storage → API/WS → UI**
+
+- Sources: options trades/quotes (OPRA-derived via licensed source), equity trades/quotes (incl. off-exchange flags)
+- Ingest services normalize and publish immutable events
+- Compute clusters prints, computes rolling stats, runs classifiers, emits alerts and inferred events
+- ClickHouse stores everything; Redis serves hot joins/baselines
+- API/WS streams curated live data and serves historical queries
+- Next.js UI renders live terminals and charts
+
+---
+
+## Monorepo Layout (Bun workspaces)
+
+apps/
+web/ # Next.js UI (flow, charts, alerts)
+services/
+ingest-options/ # Options feed adapters (trades + NBBO)
+ingest-equities/ # Equity trades/quotes ingestion
+compute/ # Clustering, stats, classifiers, inference
+candles/ # Server-side candle aggregation
+refdata/ # Symbols, chains, corp actions
+eod-enricher/ # OI + metadata snapshots
+api/ # REST + WebSocket gateway
+packages/
+types/ # Shared TS types + zod schemas
+ui/ # Design system + motion primitives
+chart/ # Chart wrappers + overlay renderers
+
+---
+
+## Core Event Schemas (canonical)
+- `OptionPrint` `{ ts, option_contract_id, price, size, exchange, conditions }`
+- `OptionNBBO` `{ ts, option_contract_id, bid, ask, bidSize, askSize }`
+- `EquityPrint` `{ ts, underlying_id, price, size, exchange, offExchangeFlag }`
+- `EquityQuote` `{ ts, underlying_id, bid, ask }`
+- `FlowPacket` `{ id, members[], features{}, join_quality{} }`
+- `ClassifierHit` `{ classifier_id, confidence, direction, explanations[] }`
+- `AlertEvent` `{ score, severity, hits[], evidence_refs[] }`
+- `InferredDarkEvent` `{ type, confidence, evidence_refs[] }`
+
+All events include `{ source_ts, ingest_ts, seq, trace_id }`.
+
+---
+
+## Epic 1 — Repo Scaffold & Infra (Day 1)
+**Build**
+- Initialize Bun monorepo; Docker compose for ClickHouse, Redis, NATS.
+- Shared config, logging (JSON), metrics hooks.
+- Define zod schemas + TS types.
+
+**Acceptance**
+- `bun run dev` boots infra + empty services + web shell.
+
+---
+
+## Epic 2 — Realtime Ingestion (Days 2–4)
+### Options Ingestor
+- Adapter interface: connect/subscribe/onTrade/onNBBO.
+- Normalize to `OptionPrint`/`OptionNBBO`; publish.
+
+### Equity Ingestor
+- Stream `EquityPrint`/`EquityQuote`; tag off-exchange when provided.
+
+**Acceptance**
+- Live events visible via CLI subscriber.
+- Raw events persisted to ClickHouse.
+
+---
+
+## Epic 3 — Rolling Stats & Clustering (Days 4–7)
+### Rolling Stats (Redis)
+- Premium/size baselines (median/MAD or mean/std).
+- Intraday curves; liquidity/spread penalties.
+
+### Clustering
+- Contract sweeps (250–2000ms windows).
+- Adjacent-strike ladders.
+- Multi-leg detection (spreads/straddles/rolls).
+
+**Acceptance**
+- Deterministic `FlowPacket` emission; replayable.
+
+---
+
+## Epic 4 — Classifiers & Alert Scoring (Days 7–10)
+Implement rule-first classifiers (each returns confidence + explanation):
+1. Large Bullish Call Sweep
+2. Large Bearish Put Sweep
+3. Large Call Sell (overwrite)
+4. Large Put Sell (put write)
+5. Unusual Contract Spike (z-score)
+6. New Position Likely (vol/OI)
+7. Closing/Unwind Likely
+8. 0DTE Gamma Punch (ATM)
+9. Far-Dated Conviction (60DTE+)
+10. Straddle/Strangle (long/short vol)
+11. Vertical Spread (debit/credit)
+12. Roll Up/Down/Out
+13. Multi-Strike Ladder Accumulation
+14. No Follow-Through / Absorbed
+
+**Alert Scoring**
+- Weighted score from premium, aggressor, z-scores, structure bonus minus noise.
+- Throttles, dedupe, cooldowns.
+
+**Acceptance**
+- Alerts fire with human-readable “why”; unit tests per classifier.
+
+---
+
+## Epic 5 — Dark Pool Inference (Days 10–12)
+**Inference (derived, separate from raw)**
+- Absorbed blocks
+- Stealth accumulation
+- Distribution
+- Hidden liquidity zones
+
+**Acceptance**
+- `InferredDarkEvent` links to evidence windows and chart markers.
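As an annotation on Epic 5 (not part of the committed file), here is a rough sketch of how an inferred event might carry its evidence. The `InferredDarkEvent` fields `{ type, confidence, evidence_refs[] }` and the `EquityPrint` and `trace_id` names come from the plan. The "absorbed block" rule itself, the option names, the use of `trace_id` as the evidence reference, and the confidence formula are all placeholders invented for illustration, not the project's actual logic.

```typescript
// Field names follow the plan's schemas; the subset used here is an assumption.
interface EquityPrint {
  ts: number; // event timestamp in milliseconds
  price: number;
  size: number;
  offExchangeFlag: boolean;
  trace_id: string; // used here as the evidence reference (assumption)
}

interface InferredDarkEvent {
  type: "absorbed_block";
  confidence: number; // 0..1, deliberately capped below certainty
  evidence_refs: string[];
}

// Hypothetical thresholds; the plan does not specify any of these.
interface AbsorptionOptions {
  minBlockSize: number; // smallest off-exchange print treated as a block
  windowMs: number; // how long after the block to watch prices
  bandBps: number; // max price drift, in basis points, to count as absorbed
}

// Placeholder rule: a large off-exchange print is "consistent with absorption"
// when every print in the following window stays within `bandBps` of it.
// `prints` must be sorted ascending by `ts`.
export function inferAbsorbedBlocks(
  prints: readonly EquityPrint[],
  opts: AbsorptionOptions,
): InferredDarkEvent[] {
  const events: InferredDarkEvent[] = [];

  for (let i = 0; i < prints.length; i++) {
    const block = prints[i] as EquityPrint;
    if (!block.offExchangeFlag || block.size < opts.minBlockSize) continue;

    const followers: EquityPrint[] = [];
    let held = true;
    for (let j = i + 1; j < prints.length; j++) {
      const p = prints[j] as EquityPrint;
      if (p.ts - block.ts > opts.windowMs) break;
      followers.push(p);
      const driftBps = (Math.abs(p.price - block.price) / block.price) * 10_000;
      if (driftBps > opts.bandBps) {
        held = false;
        break;
      }
    }

    // No follow-up prints means no evidence either way, so emit nothing.
    if (!held || followers.length === 0) continue;

    events.push({
      type: "absorbed_block",
      // Placeholder confidence: grows with supporting prints, never reaches 1.
      confidence: Math.min(0.9, 0.5 + 0.05 * followers.length),
      evidence_refs: [block.trace_id, ...followers.map((f) => f.trace_id)],
    });
  }

  return events;
}
```

The raw prints are never mutated and the inferred event only points back at them, which matches the fact-versus-inference separation the docs ask for.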
+
+---
+
+## Epic 6 — API & WebSockets (Days 12–14)
+- REST: queries for prints, packets, inferred events; candle ranges.
+- WS: channels for live flow, alerts, equity prints, inferred events.
+- Backpressure aware fan-out.
+
+**Acceptance**
+- Stable live streaming under load.
+
+---
+
+## Epic 7 — UI: Live Terminals & Workspaces (Days 14–18)
+- Live options flow terminal (virtualized, deep filters).
+- Dark pool/off-exchange tape.
+- Alerts center.
+- Ticker workspace (flow + chart + tape).
+- Tangible UX: motion, depth, blueprint grid.
+
+**Acceptance**
+- 60fps interaction while streaming.
+
+---
+
+## Epic 8 — Charting (Days 18–24)
+**Base**
+- TradingView Lightweight Charts (candles, volume, crosshair).
+
+**Overlays**
+- Off-exchange prints as circles (radius ~ sqrt(size)).
+- Inferred events & classifier markers.
+- Viewport-driven rendering; OffscreenCanvas when available.
+
+**Acceptance**
+- TV-smooth pan/zoom; overlays stay aligned.
+
+---
+
+## Epic 9 — Replay & Backtesting (Days 24–28)
+- Replay mode re-streams from ClickHouse.
+- Metrics: forward returns, hit rates, calibration.
+
+**Acceptance**
+- Live and replay share the same pipeline.
+
+---
+
+## ADDENDUM — Missing-but-Crucial Epics
+
+### Epic 10 — Reference Data, Symbology & Corporate Actions
+- Underlyings, option chains, OCC adjustments, corp actions.
+- Canonical IDs; symbol normalizer.
+
+**Acceptance**
+- Splits/adjustments don’t break replay or joins.
+
+### Epic 11 — EOD Enrichment (OI & Metadata)
+- Nightly OI snapshots; provenance tagging.
+
+**Acceptance**
+- OI-based features reference correct snapshot date.
+
+### Epic 12 — Time Sync, Ordering & Join Quality
+- NTP/chrony; bounded join windows; join quality scores.
+
+**Acceptance**
+- Stable aggressor inference under replay.
+
+### Epic 13 — Candle Aggregation Service
+- Server-built 1s/5s/1m OHLCV; Redis hot cache.
+
+**Acceptance**
+- Candle queries <100ms for hot ranges.
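As an annotation on Epic 13 (not part of the committed file), the following is a minimal sketch of server-side OHLCV bucketing. The names `Trade`, `Candle`, and `buildCandles` are hypothetical, and the Redis hot cache is left out entirely. The sketch assumes trades arrive sorted by event timestamp and buckets them by flooring that timestamp to the interval, so the same input always yields the same candles.

```typescript
// Illustrative shapes; not taken from the repository.
interface Trade {
  ts: number; // event timestamp in milliseconds
  price: number;
  size: number;
}

interface Candle {
  start: number; // bucket start, aligned to the interval
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
}

// Aggregate trades into fixed-width candles (e.g. 1_000 ms for 1s bars).
// `trades` must be sorted ascending by `ts`. Intervals with no trades
// produce no candle; gap filling would be a separate, explicit step.
export function buildCandles(trades: readonly Trade[], intervalMs: number): Candle[] {
  if (!(intervalMs > 0)) {
    throw new RangeError(`intervalMs must be positive, got ${intervalMs}`);
  }

  const candles: Candle[] = [];
  let current: Candle | null = null;

  for (const t of trades) {
    // Bucket purely from the event timestamp, never from the wall clock.
    const start = Math.floor(t.ts / intervalMs) * intervalMs;

    if (current === null || current.start !== start) {
      current = {
        start,
        open: t.price,
        high: t.price,
        low: t.price,
        close: t.price,
        volume: t.size,
      };
      candles.push(current);
    } else {
      current.high = Math.max(current.high, t.price);
      current.low = Math.min(current.low, t.price);
      current.close = t.price;
      current.volume += t.size;
    }
  }

  return candles;
}
```

Coarser bars such as 5s or 1m could be produced by calling the same function with a different `intervalMs`, or by merging finer candles; which approach the service takes is not stated in the plan.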
+
+### Epic 14 — Backpressure & Load Shedding
+- Bounded queues; UI sampling; DLQ; replay namespace.
+
+**Acceptance**
+- System degrades gracefully during spikes.
+
+### Epic 15 — Observability
+- Metrics, structured logs, tracing IDs.
+
+**Acceptance**
+- End-to-end lag visible in one dashboard.
+
+### Epic 16 — Secure Personal Deployment
+- Auth (single-user), TLS, rate limits; no public endpoints.
+
+**Acceptance**
+- Anonymous access blocked; VPS-safe.
+
+### Epic 17 — UX State: Saved Filters, Workspaces, Hotkeys
+- Presets, layouts, evidence panel.
+
+**Acceptance**
+- One-click reproducibility of setups.
+
+---
+
+## Milestones
+- **MVP-1:** realtime options flow, clustering, 10+ classifiers, live terminal.
+- **MVP-2:** off-exchange prints, inferred DP events, chart overlays, alerts.
+- **MVP-2.5:** candles, observability, backpressure, auth.
+- **MVP-3:** replay/backtesting, metrics dashboards.
+
+---
+
+## Non-Goals
+- No black-box predictions
+- No profit guarantees
+- No real-time redistribution
+
+## Notes
+- Market data usage must comply with provider terms.
+- All inference is probabilistic and labeled as such.
diff --git a/RESEARCH.md b/RESEARCH.md
new file mode 100644
index 0000000..dc0ba6f
--- /dev/null
+++ b/RESEARCH.md
@@ -0,0 +1,123 @@
+# RESEARCH.md — Signal Evaluation & Backtesting Discipline
+
+This document defines **how research is conducted and interpreted** in this repository.
+
+Its purpose is to **prevent self-deception**, not to slow exploration.
+
+---
+
+## Core Research Principles
+
+- **No hindsight**
+- **No intent claims**
+- **No performance without context**
+- **No conclusions without uncertainty**
+
+All results are provisional unless explicitly validated.
+
+---
+
+## Fact vs Inference
+
+- Raw market data = **fact**
+- Classifiers, signals, and labels = **inference**
+
+Inference must always:
+- reference its evidence
+- expose confidence
+- be stored separately from raw data
+
+---
+
+## Labeling Rules
+
+Every evaluation must specify:
+- Time horizon (e.g. 5m, 15m, 60m)
+- Metric (return, vol expansion, MAE/MFE, etc.)
+- Directional vs volatility outcome
+- Entry definition (event time, next tick, next candle)
+
+No implicit labels. Ever.
+
+---
+
+## Backtesting Constraints
+
+- No lookahead bias
+- No survivorship bias
+- Use only data available **at the event timestamp**
+- OI snapshots must match the event date
+
+Replay pipelines must mirror live pipelines exactly.
+
+---
+
+## Evaluation Metrics (minimum set)
+
+At least one of:
+- Precision / recall
+- Hit rate vs baseline
+- Forward return distribution
+- Vol realized vs implied
+- Calibration (probability vs outcome)
+
+Single “win rate” numbers are insufficient.
+
+---
+
+## Regime Awareness
+
+Results must be contextualized by:
+- Market regime (trend / chop / high vol)
+- Time of day
+- DTE bucket
+- Liquidity conditions
+
+If a signal only works sometimes, that’s still information.
+
+---
+
+## Threshold Tuning Rules
+
+- Thresholds may be tuned **only** on a defined training window.
+- Validation must occur on disjoint data.
+- Never tune on the same period you report.
+
+Document when thresholds change and why.
+
+---
+
+## Language Discipline
+
+Allowed:
+- “suggests”
+- “consistent with”
+- “correlated with”
+- “higher likelihood”
+
+Disallowed:
+- “smart money”
+- “institutional intent”
+- “guaranteed”
+- “predicts”
+
+---
+
+## Recording Results
+
+For each classifier or hypothesis, record:
+- What was tested
+- What failed
+- What partially worked
+- What conditions mattered
+
+Failed ideas are assets. Keep them.
+
+---
+
+## Final Reminder
+
+This system is for **understanding behavior**, not for proving superiority.
+
+If a result cannot survive replay, uncertainty, and explanation,
+it does not belong in production logic.