diff --git a/README.md b/README.md index 5608f21..4551594 100644 --- a/README.md +++ b/README.md @@ -1,281 +1,166 @@ # Real-Time Options Flow & Off-Exchange Analysis -This repository contains a real-time market-flow analysis platform focused on **options flow**, **off-exchange equity trades**, and **inferred institutional behavior**, built for low-latency, explainable analysis rather than black-box signals. +This repository contains a Bun + TypeScript monorepo for a personal-use, event-sourced market microstructure research platform focused on: -The system ingests real-time options trades/quotes and equity prints, clusters raw activity into higher-level flow events (sweeps, spreads, rolls, ladders), applies rule-first classifiers, and visualizes the results through a high-performance, TradingView-smooth interface with full replay and backtesting support. +- options prints + NBBO, +- off-exchange equity prints, +- explainable rule-based flow classification, +- deterministic replay, +- evidence-linked UI inspection. -## CURRENT STATE (Plan Progress) +## Current Implementation Status -Plan progress (rough): [#####-----] +Implemented now: -Done now (in repo): -- Bun monorepo + infra docker compose (ClickHouse, Redis, NATS JetStream) -- Shared event schemas + logging + config helpers -- Synthetic options/equity prints (full S&P 500) published to NATS and persisted to ClickHouse -- Deterministic option FlowPacket clustering (time window) + persistence -- Rolling stats in Redis (premium/size/spread) with z-score features on FlowPackets -- FlowPacket structure tags (vertical/ladder/straddle/strangle) for multi-leg bursts -- Aggressor mix features (NBBO placement ratios) on FlowPackets -- Rule-first classifiers + alert scoring with ClickHouse persistence + WS/REST endpoints -- Structure packet emission with full constituent evidence + roll metadata -- Roll classifier surfaced from detected multi-leg roll structures -- API: REST for prints/flow packets/classifier hits/alerts, WS for live options/equities/flow/alerts/hits, replay endpoints -- API: equities prints range query endpoint for chart overlays and drill-down -- UI: live tapes for options/equities/flow + replay toggle + pause controls + replay time/completion -- UI: alerts + classifier hits panels, ticker filter, evidence drawer, severity strip -- UI chart overlays for off-exchange equity prints + classifier/dark markers with linked evidence drawer behavior -- Databento historical replay adapter (options) with symbol mapping -- Alpaca options adapter (dev-only, bounded contract list) -- Alpaca equities adapter (stocks trades/quotes via WS) -- IBKR options adapter (single-underlying bridge via `ib_insync`) -- Dark-pool-style inference (absorbed blocks, stealth accumulation, distribution) with WS/REST surfaces and UI list -- Testing-mode throttling for ingest to reduce CPU during local dev -- Alert scoring calibration updates for confidence/coverage-aware severity +- Bun workspaces with shared packages for schemas, bus, config, observability, and ClickHouse access. +- Infra orchestration via Docker Compose (NATS JetStream, ClickHouse, Redis). +- Options ingest service with adapters: + - synthetic stream, + - Alpaca options (dev-focused, bounded contracts), + - IBKR bridge (Python sidecar), + - Databento historical replay adapter (Python sidecar). +- Equities ingest service with adapters: + - synthetic stream, + - Alpaca equities trades/quotes. +- Compute service: + - deterministic option print clustering into `FlowPacket`s, + - NBBO join quality features and aggressor-mix metrics, + - rolling baselines in Redis, + - structure summarization and structure packet emission, + - rule-based classifiers + confidence-scored alert events, + - dark-style inferred events from equity prints/quotes, + - equity print-to-quote join events. +- Candles service: + - server-side equity candle aggregation, + - ClickHouse persistence, + - optional Redis hot cache, + - NATS publication. +- Replay service: + - deterministic republishing from ClickHouse to NATS, + - multi-stream merge with stable tie-break ordering, + - speed/start/end controls. +- API service: + - REST endpoints for recent + cursor pagination, + - REST range endpoints for chart windows, + - REST replay-oriented endpoints, + - WebSocket channels for options, NBBO, equities, quotes, joins, flow, classifier hits, alerts, inferred dark, and candles. +- Next.js web app: + - live tape/workspace views, + - replay controls and status, + - signals and chart-focused routes, + - evidence-centric terminal UI. +- Refdata + EOD enricher service entrypoints are present but currently scaffolds (lifecycle/logging only). -In progress / blocked: -- Production-grade licensed live data feeds (beyond current dev/test bridges) -- Advanced clustering (spreads/rolls beyond basic structure tags) -- Expanded chart overlays and annotation density controls +Planned / not yet complete: -Not started: -- Reference data/corporate action enrichment -- Auth / secure deployment +- production-grade licensed feed integrations and entitlement workflow, +- richer refdata/corp-action enrichment, +- secure deployment/auth hardening, +- deeper structure + calibration workflows from `PLAN.md`. ## Core Principles -- **Explainability first** — every alert and signal is backed by observable data and explicit logic. -- **Event-sourced architecture** — all raw and derived events are persisted and replayable. -- **Market microstructure correctness** — conservative handling of aggressor inference, OI, and off-exchange prints. -- **Low-latency, tangible UX** — smooth real-time interaction that feels like an instrument panel, not a spreadsheet. +- **Explainability first** — inferred outputs are evidence-backed and human-readable. +- **Event sourcing** — raw and derived events persist to support replay. +- **Determinism** — replay behavior tracks live pipeline logic. +- **Microstructure awareness** — bounded joins, confidence scoring, and explicit uncertainty. +- **Bun-first tooling** — runtime/package/scripts all use Bun. -## Current Capabilities +## Monorepo Layout -- Synthetic options/equity prints with deterministic sequencing across the S&P 500 -- Ingest adapter seam (env-selected; options default `synthetic`, equities: `synthetic` or `alpaca`) -- Raw event persistence in ClickHouse + streaming via NATS JetStream -- Deterministic option FlowPacket clustering (time-window) -- Rolling stats baselines in Redis with z-score features on FlowPackets -- Basic multi-leg structure tagging on FlowPackets -- Aggressor mix features from NBBO placement on FlowPackets -- Classifiers + alert scoring (rule-first) with WS/REST endpoints -- Structure packet emission with roll-aware metadata and evidence lists -- Roll classifier (rule-based, explainable) emitted from structure packets -- API gateway with REST, WS, and replay endpoints -- Equities prints range REST endpoint for chart/time-window inspection -- Server-built equity candles (service + REST/WS surfaces) -- UI tapes for options/equities/flow packets + alerts/hits with live/replay toggle and pause controls -- Chart overlays for off-exchange prints, classifier markers, and dark-pool markers with evidence linking -- Alpaca options adapter (dev-only) with bounded contract selection -- IBKR options adapter (single-underlying bridge via Python sidecar) -- Databento historical replay adapter (options, Python sidecar) -- Dark-pool-style inference (absorbed blocks, stealth accumulation, distribution) with evidence links and replay - -## Planned Capabilities (from PLAN.md) - -- Real-time licensed market data ingestors (options + equities) -- Candle aggregation + chart overlays -- Replay/backtesting metrics and calibration -- Reference data, symbology, and corporate-action handling - -## Tech Stack - -- **Runtime & tooling:** Bun -- **Language:** TypeScript -- **Frontend:** Next.js + React -- **Realtime:** WebSockets -- **Event streaming:** NATS JetStream or Redpanda -- **Storage:** ClickHouse, Redis -- **Charting:** TradingView Lightweight Charts + custom canvas/WebGL overlays - -## Repository Structure - -apps/ -web/ -services/ -ingest-options/ -ingest-equities/ -compute/ -api/ -packages/ -types/ -ui/ -chart/ +- `apps/web` — Next.js UI shell/routes. +- `services/ingest-options` — options print/NBBO ingest adapters. +- `services/ingest-equities` — equity print/quote ingest adapters. +- `services/compute` — clustering, structures, classifiers, alerts, inferred dark. +- `services/candles` — server-side candle aggregation + cache. +- `services/replay` — ClickHouse → NATS replay streamer. +- `services/api` — REST + WebSocket gateway. +- `services/refdata` — scaffold service. +- `services/eod-enricher` — scaffold service. +- `packages/types` — shared event schemas/types. +- `packages/storage` — ClickHouse tables/queries. +- `packages/bus` — NATS/JetStream helpers. +- `packages/config` — env parsing. +- `packages/observability` — logger + metrics facade. ## Build and Run Install dependencies: + - `bun install` -Start infra: +Start infrastructure only: + - `docker compose up -d` Create env file: -- Copy `.env.example` to `.env` and fill in the API keys you plan to use. -Start everything (infra + services + web): +- copy `.env.example` to `.env` and set provider credentials as needed. + +Start infra + all services + web: + - `bun run dev` -Run just the web app (fixed to port 3000): -- `bun run dev:web` +Start services only (assumes infra is already running): -Run just the API: -- `bun --cwd services/api run dev` +- `bun run dev:services` + +Start web only: + +- `bun run dev:web` ## Environment Configuration -All runtime configuration is driven by `.env`. Start by copying `.env.example` and edit the values you need. Defaults below match `.env.example` unless otherwise noted. +All runtime configuration comes from `.env`. ### Core infrastructure -These define how services connect to the event bus and storage backends. Documentation links are provided for convenience. +- `NATS_URL` (default `nats://127.0.0.1:4222`) +- `CLICKHOUSE_URL` (default `http://127.0.0.1:8123`) +- `CLICKHOUSE_DATABASE` (default `default`) +- `REDIS_URL` (default `redis://127.0.0.1:6379`) -- `NATS_URL` (default `nats://localhost:4222`) — NATS JetStream endpoint. See [NATS](https://nats.io/) and [JetStream](https://docs.nats.io/nats-concepts/jetstream). -- `CLICKHOUSE_URL` (default `http://localhost:8123`) — ClickHouse HTTP endpoint. See [ClickHouse](https://clickhouse.com/). -- `CLICKHOUSE_DATABASE` (default `default`) — ClickHouse database name. -- `REDIS_URL` (default `redis://localhost:6379`) — Redis endpoint for rolling stats. See [Redis](https://redis.io/). +### Ingest adapter selection -### Adapter selection +- `OPTIONS_INGEST_ADAPTER` (`synthetic` | `alpaca` | `ibkr` | `databento`) +- `EQUITIES_INGEST_ADAPTER` (`synthetic` | `alpaca`) +- `EMIT_INTERVAL_MS` (synthetic emit cadence) -- `OPTIONS_INGEST_ADAPTER` (default `synthetic`) — options ingest adapter: `synthetic`, `alpaca`, `ibkr`, `databento`. -- `EQUITIES_INGEST_ADAPTER` (default `synthetic`) — equities ingest adapter: `synthetic`, `alpaca`. -- `EMIT_INTERVAL_MS` (default `1000`) — synthetic equities emit cadence. +### Options adapter settings -### Alpaca options adapter (dev-only) +- Alpaca: `ALPACA_KEY_ID`, `ALPACA_SECRET_KEY`, `ALPACA_REST_URL`, `ALPACA_WS_BASE_URL`, `ALPACA_FEED`, `ALPACA_UNDERLYINGS`, `ALPACA_STRIKES_PER_SIDE`, `ALPACA_MAX_DTE_DAYS`, `ALPACA_MONEYNESS_PCT`, `ALPACA_MONEYNESS_FALLBACK_PCT`, `ALPACA_MAX_QUOTES` +- Databento: `DATABENTO_API_KEY`, `DATABENTO_DATASET`, `DATABENTO_SCHEMA`, `DATABENTO_NBBO_SCHEMA`, `DATABENTO_START`, `DATABENTO_END`, `DATABENTO_SYMBOLS`, `DATABENTO_STYPE_IN`, `DATABENTO_STYPE_OUT`, `DATABENTO_LIMIT`, `DATABENTO_PRICE_SCALE`, `DATABENTO_PYTHON_BIN` +- IBKR: `IBKR_HOST`, `IBKR_PORT`, `IBKR_CLIENT_ID`, `IBKR_SYMBOL`, `IBKR_EXPIRY`, `IBKR_STRIKE`, `IBKR_RIGHT`, `IBKR_EXCHANGE`, `IBKR_CURRENCY`, `IBKR_PYTHON_BIN` -Provider links: [Alpaca](https://alpaca.markets/), [Alpaca Market Data API](https://alpaca.markets/docs/api-references/market-data-api/). +### Equities adapter settings -- `ALPACA_KEY_ID`, `ALPACA_SECRET_KEY` — credentials. -- `ALPACA_REST_URL` (default `https://data.alpaca.markets`) — REST endpoint. -- `ALPACA_WS_BASE_URL` (default `wss://stream.data.alpaca.markets/v1beta1`) — streaming endpoint. -- `ALPACA_FEED` (default `indicative`) — use `opra` when you have a subscription. -- `ALPACA_EQUITIES_FEED` (default `iex`) — equities feed: `iex` (free) or `sip` (paid). -- `ALPACA_UNDERLYINGS` (default `SPY,NVDA,AAPL`) — comma-separated list of symbols. -- `ALPACA_STRIKES_PER_SIDE` (default `8`) — strikes per side around ATM. -- `ALPACA_MAX_DTE_DAYS` (default `30`) — expiry horizon. -- `ALPACA_MONEYNESS_PCT` (default `0.06`) — ATM band for strike selection. -- `ALPACA_MONEYNESS_FALLBACK_PCT` (default `0.1`) — fallback band if strikes are sparse. -- `ALPACA_MAX_QUOTES` (default `200`) — subscription size guardrail. +- `ALPACA_EQUITIES_FEED` (`iex` or `sip`) -### Databento historical replay adapter +### Compute / classifiers / inference -Provider links: [Databento](https://databento.com/), [Databento API](https://databento.com/docs/api-reference). +- Delivery and windowing: `COMPUTE_DELIVER_POLICY`, `COMPUTE_CONSUMER_RESET`, `NBBO_MAX_AGE_MS`, `ROLLING_WINDOW_SIZE`, `ROLLING_TTL_SEC` +- Classifiers: `CLASSIFIER_SWEEP_MIN_PREMIUM`, `CLASSIFIER_SWEEP_MIN_COUNT`, `CLASSIFIER_SWEEP_MIN_PREMIUM_Z`, `CLASSIFIER_SPIKE_MIN_PREMIUM`, `CLASSIFIER_SPIKE_MIN_SIZE`, `CLASSIFIER_SPIKE_MIN_PREMIUM_Z`, `CLASSIFIER_SPIKE_MIN_SIZE_Z`, `CLASSIFIER_Z_MIN_SAMPLES`, `CLASSIFIER_MIN_NBBO_COVERAGE`, `CLASSIFIER_MIN_AGGRESSOR_RATIO`, `CLASSIFIER_0DTE_MAX_ATM_PCT`, `CLASSIFIER_0DTE_MIN_PREMIUM`, `CLASSIFIER_0DTE_MIN_SIZE` +- Dark inference: `EQUITY_QUOTE_MAX_AGE_MS`, `DARK_INFER_WINDOW_MS`, `DARK_INFER_COOLDOWN_MS`, `DARK_INFER_MIN_BLOCK_SIZE`, `DARK_INFER_MIN_ACCUM_SIZE`, `DARK_INFER_MIN_ACCUM_COUNT`, `DARK_INFER_MIN_PRINT_SIZE`, `DARK_INFER_MAX_EVIDENCE`, `DARK_INFER_MAX_SPREAD_PCT` -- `DATABENTO_API_KEY` — API key. -- `DATABENTO_DATASET` (default `OPRA.PILLAR`) — dataset. -- `DATABENTO_SCHEMA` (default `trades`) — schema. -- `DATABENTO_START` — ISO date/time start for replay. -- `DATABENTO_END` — ISO date/time end (optional). -- `DATABENTO_SYMBOLS` (default `SPY.OPT`) — comma list or dataset symbols. -- `DATABENTO_STYPE_IN` (default `parent`) — input symbology type. -- `DATABENTO_STYPE_OUT` (default `instrument_id`) — output symbology type. -- `DATABENTO_LIMIT` (default `0`) — record cap (0 means no cap). -- `DATABENTO_PRICE_SCALE` (default `1`) — divide raw price by this value. -- `DATABENTO_PYTHON_BIN` (default `py/.venv/bin/python`) — Python executable for replay sidecar. +### Candles -### IBKR options adapter (Python sidecar) +- `CANDLE_INTERVALS_MS`, `CANDLE_MAX_LATE_MS`, `CANDLE_CACHE_LIMIT`, `CANDLE_DELIVER_POLICY`, `CANDLE_CONSUMER_RESET` -Provider links: [Interactive Brokers](https://www.interactivebrokers.com/), [IBKR API docs](https://interactivebrokers.github.io/). +### API -- `IBKR_HOST` (default `127.0.0.1`) — TWS/Gateway host. -- `IBKR_PORT` (default `7497`) — TWS/Gateway port. -- `IBKR_CLIENT_ID` (default `0`) — API client ID. -- `IBKR_SYMBOL` (default `SPY`) — underlying symbol. -- `IBKR_EXPIRY` (default `20250117`) — expiry in `YYYYMMDD`. -- `IBKR_STRIKE` (default `450`) — strike price. -- `IBKR_RIGHT` (default `C`) — option right (`C` or `P`). -- `IBKR_EXCHANGE` (default `SMART`) — exchange route. -- `IBKR_CURRENCY` (default `USD`) — currency. -- `IBKR_PYTHON_BIN` (default `python3`) — Python executable for sidecar. - -### Compute + market-structure tuning - -- `COMPUTE_DELIVER_POLICY` (default `new`) — consumer start behavior (`new` or `all`). -- `COMPUTE_CONSUMER_RESET` (default `false`) — force consumer reset (skip backlog). -- `NBBO_MAX_AGE_MS` (default `1000`) — max allowed NBBO age for joins. -- `NEXT_PUBLIC_NBBO_MAX_AGE_MS` (default `1000`) — UI-visible NBBO age for display gating. -- `ROLLING_WINDOW_SIZE` (default `50`) — rolling stats window length. -- `ROLLING_TTL_SEC` (default `86400`) — rolling stats TTL in seconds. - -### Classifier thresholds - -- `CLASSIFIER_SWEEP_MIN_PREMIUM` (default `40000`) — absolute sweep premium floor. -- `CLASSIFIER_SWEEP_MIN_COUNT` (default `3`) — minimum leg count for sweeps. -- `CLASSIFIER_SWEEP_MIN_PREMIUM_Z` (default `2`) — sweep premium z-score threshold. -- `CLASSIFIER_SPIKE_MIN_PREMIUM` (default `20000`) — absolute spike premium floor. -- `CLASSIFIER_SPIKE_MIN_SIZE` (default `400`) — absolute spike size floor. -- `CLASSIFIER_SPIKE_MIN_PREMIUM_Z` (default `2.5`) — spike premium z-score threshold. -- `CLASSIFIER_SPIKE_MIN_SIZE_Z` (default `2`) — spike size z-score threshold. -- `CLASSIFIER_Z_MIN_SAMPLES` (default `12`) — minimum samples before z-scores apply. -- `CLASSIFIER_MIN_NBBO_COVERAGE` (default `0.5`) — NBBO coverage ratio gate. -- `CLASSIFIER_MIN_AGGRESSOR_RATIO` (default `0.55`) — aggressor ratio gate. -- `CLASSIFIER_0DTE_MAX_ATM_PCT` (default `0.01`) — max ATM distance as pct of underlying for 0DTE gamma punch. -- `CLASSIFIER_0DTE_MIN_PREMIUM` (default `20000`) — 0DTE gamma punch premium floor. -- `CLASSIFIER_0DTE_MIN_SIZE` (default `400`) — 0DTE gamma punch size floor. +- `API_PORT`, `REST_DEFAULT_LIMIT` ### Replay service -- `REPLAY_ENABLED` (default `false`) — start the replay streamer when running `bun run dev`. -- `REPLAY_STREAMS` (default `options,nbbo,equities,equity-quotes`) — comma list of streams to re-publish. -- `REPLAY_START_TS` (default `0`) — start timestamp in ms since epoch (0 means beginning). -- `REPLAY_END_TS` (default `0`) — end timestamp in ms since epoch (0 means no end). -- `REPLAY_SPEED` (default `1`) — playback speed (1 = real-time, 2 = 2x, 0 = as fast as possible). -- `REPLAY_BATCH_SIZE` (default `200`) — batch size per ClickHouse fetch. -- `REPLAY_LOG_EVERY` (default `1000`) — log progress every N events. +- `REPLAY_ENABLED`, `REPLAY_STREAMS`, `REPLAY_START_TS`, `REPLAY_END_TS`, `REPLAY_SPEED`, `REPLAY_BATCH_SIZE`, `REPLAY_LOG_EVERY` -### Testing + throttling +### Testing-mode throttling -- `TESTING_MODE` (default `false`) — enable ingest throttling for local dev. -- `TESTING_THROTTLE_MS` (default `200`) — minimum spacing between emitted prints. +- `TESTING_MODE` +- `TESTING_THROTTLE_MS` -Testing mode (throttles ingest to reduce CPU): -- `TESTING_MODE=true` enables throttling -- `TESTING_THROTTLE_MS=200` minimum spacing between emitted prints (per ingest service) +## Quick Notes -IBKR adapter (options, via Python `ib_insync`): -- Install Python deps: `python3 -m pip install -r services/ingest-options/py/requirements.txt` -- Set `OPTIONS_INGEST_ADAPTER=ibkr` and configure: - - `IBKR_HOST`, `IBKR_PORT`, `IBKR_CLIENT_ID` - - `IBKR_SYMBOL`, `IBKR_EXPIRY` (YYYYMMDD), `IBKR_STRIKE`, `IBKR_RIGHT` - - Optional: `IBKR_EXCHANGE` (default `SMART`), `IBKR_CURRENCY` (default `USD`), `IBKR_PYTHON_BIN` - -Alpaca adapter (options, dev-only bridge): -- Set `OPTIONS_INGEST_ADAPTER=alpaca` and configure: - - `ALPACA_KEY_ID`, `ALPACA_SECRET_KEY` - - `ALPACA_UNDERLYINGS` (comma-separated, default `SPY,NVDA,AAPL`) - - Optional: `ALPACA_FEED` (`indicative` default, `opra` with subscription) - - Optional: `ALPACA_MAX_QUOTES` (default `200`), `ALPACA_REST_URL`, `ALPACA_WS_BASE_URL` - - Optional selection tuning: `ALPACA_STRIKES_PER_SIDE` (default `8`), `ALPACA_MAX_DTE_DAYS` (default `30`), - `ALPACA_MONEYNESS_PCT` (default `0.06`), `ALPACA_MONEYNESS_FALLBACK_PCT` (default `0.10`) - -Alpaca selection policy (dev-only, deterministic): -- Pick nearest weekly and nearest monthly expiries within 30 DTE (fallback to earliest expiries if missing) -- For each expiry, select 8 strikes per side closest to ATM within ±6% (fallback to ±10% if needed) -- Subscriptions are built once at startup to keep the stream bounded and repeatable - -Databento historical replay adapter (options, via Python `databento`): -- Install Python deps: `python3 -m pip install -r services/ingest-options/py/requirements.txt` -- Set `OPTIONS_INGEST_ADAPTER=databento` and configure: - - `DATABENTO_API_KEY`, `DATABENTO_START` (ISO date/time) - - Optional: `DATABENTO_END`, `DATABENTO_DATASET` (default `OPRA.PILLAR`), `DATABENTO_SCHEMA` (default `trades`) - - Optional: `DATABENTO_SYMBOLS` (`ALL` or comma list), `DATABENTO_STYPE_IN`/`DATABENTO_STYPE_OUT` (default `raw_symbol`) - - Optional: `DATABENTO_LIMIT` (record cap), `DATABENTO_PRICE_SCALE` (divide raw price), `DATABENTO_PYTHON_BIN` -- This adapter replays historical data only; live capture will be added later. - -Run tests: -- `bun test` - -## Status - -Active build for personal, non-delayed analytical use. Multi-user access and redistribution are intentionally out of scope. - -## Non-Goals - -- No black-box AI predictions -- No profit guarantees -- No real-time data redistribution -- No guessing at intent without evidence - -## License / Usage - -For research and personal analytical use. -Market data usage is subject to the terms of the data providers. +- Python dependencies are required only for IBKR/Databento sidecars (`services/ingest-options/py/requirements.txt`). +- Candle construction is server-side; the client consumes prebuilt OHLC events. +- This repository is for personal, non-redistributed usage.