253 lines
11 KiB
Markdown
253 lines
11 KiB
Markdown
# Real-Time Options Flow & Off-Exchange Analysis
|
|
|
|
This repository contains a Bun + TypeScript monorepo for a personal-use, event-sourced market microstructure research platform focused on:
|
|
|
|
- options prints + NBBO,
|
|
- off-exchange equity prints,
|
|
- explainable rule-based flow classification,
|
|
- deterministic replay,
|
|
- evidence-linked UI inspection.
|
|
|
|
## Current Implementation Status
|
|
|
|
Implemented now:
|
|
|
|
- Bun workspaces with shared packages for schemas, bus, config, observability, and ClickHouse access.
|
|
- Infra orchestration via Docker Compose (NATS JetStream, ClickHouse, Redis).
|
|
- Options ingest service with adapters:
|
|
- synthetic stream,
|
|
- Alpaca options (dev-focused, bounded contracts),
|
|
- IBKR bridge (Python sidecar),
|
|
- Databento historical replay adapter (Python sidecar).
|
|
- Equities ingest service with adapters:
|
|
- synthetic stream,
|
|
- Alpaca equities trades/quotes.
|
|
- Compute service:
|
|
- deterministic option print clustering into `FlowPacket`s,
|
|
- NBBO join quality features and aggressor-mix metrics,
|
|
- rolling baselines in Redis,
|
|
- structure summarization and structure packet emission,
|
|
- rule-based classifiers + confidence-scored alert events,
|
|
- dark-style inferred events from equity prints/quotes,
|
|
- equity print-to-quote join events.
|
|
- Candles service:
|
|
- server-side equity candle aggregation,
|
|
- ClickHouse persistence,
|
|
- optional Redis hot cache,
|
|
- NATS publication.
|
|
- Replay service:
|
|
- deterministic republishing from ClickHouse to NATS,
|
|
- multi-stream merge with stable tie-break ordering,
|
|
- speed/start/end controls.
|
|
- API service:
|
|
- REST endpoints for recent + cursor pagination,
|
|
- REST range endpoints for chart windows,
|
|
- REST replay-oriented endpoints,
|
|
- WebSocket channels for options, NBBO, equities, quotes, joins, flow, classifier hits, alerts, inferred dark, and candles.
|
|
- Next.js web app:
|
|
- live tape/workspace views,
|
|
- replay controls and status,
|
|
- signals and chart-focused routes,
|
|
- evidence-centric terminal UI.
|
|
- Refdata + EOD enricher service entrypoints are present but currently scaffolds (lifecycle/logging only).
|
|
|
|
Planned / not yet complete:
|
|
|
|
- production-grade licensed feed integrations and entitlement workflow,
|
|
- richer refdata/corp-action enrichment,
|
|
- secure deployment/auth hardening,
|
|
- deeper structure + calibration workflows from `PLAN.md`.
|
|
|
|
## Core Principles
|
|
|
|
- **Explainability first** — inferred outputs are evidence-backed and human-readable.
|
|
- **Event sourcing** — raw and derived events persist to support replay.
|
|
- **Determinism** — replay behavior tracks live pipeline logic.
|
|
- **Microstructure awareness** — bounded joins, confidence scoring, and explicit uncertainty.
|
|
- **Bun-first tooling** — runtime/package/scripts all use Bun.
|
|
|
|
## Monorepo Layout
|
|
|
|
- `apps/web` — Next.js UI shell/routes.
|
|
- `services/ingest-options` — options print/NBBO ingest adapters.
|
|
- `services/ingest-equities` — equity print/quote ingest adapters.
|
|
- `services/compute` — clustering, structures, classifiers, alerts, inferred dark.
|
|
- `services/candles` — server-side candle aggregation + cache.
|
|
- `services/replay` — ClickHouse → NATS replay streamer.
|
|
- `services/api` — REST + WebSocket gateway.
|
|
- `services/refdata` — scaffold service.
|
|
- `services/eod-enricher` — scaffold service.
|
|
- `packages/types` — shared event schemas/types.
|
|
- `packages/storage` — ClickHouse tables/queries.
|
|
- `packages/bus` — NATS/JetStream helpers.
|
|
- `packages/config` — env parsing.
|
|
- `packages/observability` — logger + metrics facade.
|
|
|
|
## Build and Run
|
|
|
|
Install dependencies:
|
|
|
|
- `bun install`
|
|
|
|
Start infrastructure only:
|
|
|
|
- `docker compose up -d`
|
|
|
|
Create env file:
|
|
|
|
- copy `.env.example` to `.env` and set provider credentials as needed.
|
|
|
|
Start infra + all services + web:
|
|
|
|
- `bun run dev`
|
|
|
|
Start services only (assumes infra is already running):
|
|
|
|
- `bun run dev:services`
|
|
|
|
Start web only:
|
|
|
|
- `bun run dev:web`
|
|
|
|
## Environment Configuration
|
|
|
|
All runtime configuration comes from `.env`.
|
|
|
|
### Core infrastructure
|
|
|
|
- `NATS_URL` (default `nats://127.0.0.1:4222`)
|
|
- `CLICKHOUSE_URL` (default `http://127.0.0.1:8123`)
|
|
- `CLICKHOUSE_DATABASE` (default `default`)
|
|
- `REDIS_URL` (default `redis://127.0.0.1:6379`)
|
|
|
|
### Ingest adapter selection
|
|
|
|
- `OPTIONS_INGEST_ADAPTER` (`synthetic` | `alpaca` | `ibkr` | `databento`)
|
|
- `EQUITIES_INGEST_ADAPTER` (`synthetic` | `alpaca`)
|
|
- `EMIT_INTERVAL_MS` (synthetic emit cadence)
|
|
- `SYNTHETIC_MARKET_MODE` (`realistic` | `active` | `firehose`, default `realistic`)
|
|
- `SYNTHETIC_OPTIONS_MODE` (optional per-service override; falls back to `SYNTHETIC_MARKET_MODE`)
|
|
- `SYNTHETIC_EQUITIES_MODE` (optional per-service override; falls back to `SYNTHETIC_MARKET_MODE`)
|
|
|
|
### Synthetic mode profiles
|
|
|
|
- `realistic` is the default local mode. Options produce materially more ordinary prints, fewer repeated bursts, and fewer alert-driving sweeps/spikes. Equities produce smaller batches and less relentless off-exchange activity.
|
|
- `active` is a busier demo mode that still leaves meaningful visible history in the UI.
|
|
- `firehose` is the stress profile for backpressure, hot-window eviction, and Databento-readiness validation.
|
|
|
|
### Options adapter settings
|
|
|
|
- Alpaca: `ALPACA_KEY_ID`, `ALPACA_SECRET_KEY`, `ALPACA_REST_URL`, `ALPACA_WS_BASE_URL`, `ALPACA_FEED`, `ALPACA_UNDERLYINGS`, `ALPACA_STRIKES_PER_SIDE`, `ALPACA_MAX_DTE_DAYS`, `ALPACA_MONEYNESS_PCT`, `ALPACA_MONEYNESS_FALLBACK_PCT`, `ALPACA_MAX_QUOTES`
|
|
- Databento: `DATABENTO_API_KEY`, `DATABENTO_DATASET`, `DATABENTO_SCHEMA`, `DATABENTO_NBBO_SCHEMA`, `DATABENTO_START`, `DATABENTO_END`, `DATABENTO_SYMBOLS`, `DATABENTO_STYPE_IN`, `DATABENTO_STYPE_OUT`, `DATABENTO_LIMIT`, `DATABENTO_PRICE_SCALE`, `DATABENTO_PYTHON_BIN`
|
|
- IBKR: `IBKR_HOST`, `IBKR_PORT`, `IBKR_CLIENT_ID`, `IBKR_SYMBOL`, `IBKR_EXPIRY`, `IBKR_STRIKE`, `IBKR_RIGHT`, `IBKR_EXCHANGE`, `IBKR_CURRENCY`, `IBKR_PYTHON_BIN`
|
|
|
|
### Equities adapter settings
|
|
|
|
- `ALPACA_EQUITIES_FEED` (`iex` or `sip`)
|
|
|
|
### Compute / classifiers / inference
|
|
|
|
- Delivery and windowing: `COMPUTE_DELIVER_POLICY`, `COMPUTE_CONSUMER_RESET`, `NBBO_MAX_AGE_MS`, `ROLLING_WINDOW_SIZE`, `ROLLING_TTL_SEC`
|
|
- Classifiers: `CLASSIFIER_SWEEP_MIN_PREMIUM`, `CLASSIFIER_SWEEP_MIN_COUNT`, `CLASSIFIER_SWEEP_MIN_PREMIUM_Z`, `CLASSIFIER_SPIKE_MIN_PREMIUM`, `CLASSIFIER_SPIKE_MIN_SIZE`, `CLASSIFIER_SPIKE_MIN_PREMIUM_Z`, `CLASSIFIER_SPIKE_MIN_SIZE_Z`, `CLASSIFIER_Z_MIN_SAMPLES`, `CLASSIFIER_MIN_NBBO_COVERAGE`, `CLASSIFIER_MIN_AGGRESSOR_RATIO`, `CLASSIFIER_0DTE_MAX_ATM_PCT`, `CLASSIFIER_0DTE_MIN_PREMIUM`, `CLASSIFIER_0DTE_MIN_SIZE`
|
|
- Dark inference: `EQUITY_QUOTE_MAX_AGE_MS`, `DARK_INFER_WINDOW_MS`, `DARK_INFER_COOLDOWN_MS`, `DARK_INFER_MIN_BLOCK_SIZE`, `DARK_INFER_MIN_ACCUM_SIZE`, `DARK_INFER_MIN_ACCUM_COUNT`, `DARK_INFER_MIN_PRINT_SIZE`, `DARK_INFER_MAX_EVIDENCE`, `DARK_INFER_MAX_SPREAD_PCT`
|
|
|
|
### Options signal filtering
|
|
|
|
- `OPTIONS_SIGNAL_MODE` (`smart-money` | `balanced` | `all`, default `smart-money`)
|
|
- `OPTIONS_SIGNAL_MIN_NOTIONAL` (default `10000`)
|
|
- `OPTIONS_SIGNAL_ETF_MIN_NOTIONAL` (default `50000`)
|
|
- `OPTIONS_SIGNAL_BID_SIDE_MIN_NOTIONAL` (default `25000`)
|
|
- `OPTIONS_SIGNAL_MID_MIN_NOTIONAL` (default `20000`)
|
|
- `OPTIONS_SIGNAL_NBBO_MAX_AGE_MS` (default `1500`)
|
|
- `OPTIONS_SIGNAL_ETF_UNDERLYINGS` (default `SPY,QQQ,IWM,DIA,TLT,GLD,SLV,XLF,XLE,XLV,XLI,XLP,XLU,XLY,SMH,ARKK`)
|
|
|
|
Default `smart-money` behavior:
|
|
|
|
- reject sub-`10k` options prints,
|
|
- reject ETF prints below `50k`,
|
|
- reject `B` / `BB` prints below `25k`,
|
|
- reject non-`SWEEP` / non-`ISO` `MID` prints below `20k`,
|
|
- require `50k` when NBBO is missing or stale,
|
|
- auto-keep `100k+`,
|
|
- keep ask-side `A` / `AA` prints at `10k+`,
|
|
- keep `SWEEP` / `ISO` prints at `25k+`,
|
|
- keep `500+` contract prints at `10k+`.
|
|
|
|
`balanced` uses the same shape with lower thresholds. `all` marks every option print as signal-passing.
|
|
|
|
### Candles
|
|
|
|
- `CANDLE_INTERVALS_MS`, `CANDLE_MAX_LATE_MS`, `CANDLE_CACHE_LIMIT`, `CANDLE_DELIVER_POLICY`, `CANDLE_CONSUMER_RESET`
|
|
|
|
### API
|
|
|
|
- `API_PORT`, `REST_DEFAULT_LIMIT`, `API_DELIVER_POLICY`, `API_CONSUMER_RESET`
|
|
- `LIVE_LIMIT_OPTIONS`, `LIVE_LIMIT_NBBO`, `LIVE_LIMIT_EQUITIES`, `LIVE_LIMIT_EQUITY_JOINS`, `LIVE_LIMIT_FLOW`, `LIVE_LIMIT_CLASSIFIER_HITS`, `LIVE_LIMIT_ALERTS`, `LIVE_LIMIT_INFERRED_DARK` (bounded live generic cache depths; defaults `10000`, max `100000`)
|
|
|
|
### Web live retention
|
|
|
|
- `NEXT_PUBLIC_LIVE_HOT_WINDOW` (frontend hot live window cap for non-options feeds; default `2000`)
|
|
- `NEXT_PUBLIC_LIVE_HOT_WINDOW_OPTIONS` (frontend hot live window cap for options prints; default `25000`)
|
|
- `NEXT_PUBLIC_PINNED_EVIDENCE_TTL_MS` (pinned evidence TTL; default `1200000`)
|
|
- `NEXT_PUBLIC_PINNED_EVIDENCE_MAX_ITEMS` (pinned evidence cache guardrail; default `4000`)
|
|
- `NEXT_PUBLIC_FLOW_FILTER_PRESET` (`smart-money` | `balanced` | `all`, default `smart-money`)
|
|
|
|
### Replay service
|
|
|
|
- `REPLAY_ENABLED`, `REPLAY_STREAMS`, `REPLAY_START_TS`, `REPLAY_END_TS`, `REPLAY_SPEED`, `REPLAY_BATCH_SIZE`, `REPLAY_LOG_EVERY`
|
|
|
|
### Testing-mode throttling
|
|
|
|
- `TESTING_MODE`
|
|
- `TESTING_THROTTLE_MS`
|
|
|
|
## Quick Notes
|
|
|
|
- Python dependencies are required only for IBKR/Databento sidecars (`services/ingest-options/py/requirements.txt`).
|
|
- Candle construction is server-side; the client consumes prebuilt OHLC events.
|
|
- Option prints now persist as enriched raw rows and can be queried as either:
|
|
- `view=signal` — default live/UI path and compute input.
|
|
- `view=raw` — audit/debug path that preserves every stored print.
|
|
- The default Tape page options/packets posture is now stock-only, hides `B` / `BB`, keeps calls and puts visible, and applies in-memory min-notional controls immediately.
|
|
- Live retention uses a two-tier model:
|
|
- API/Redis maintain a bounded hot cache per live generic channel.
|
|
- UI keeps a bounded hot window for rendering performance around the signal view rather than raw noise.
|
|
- Options prints can use a deeper dedicated cap via `NEXT_PUBLIC_LIVE_HOT_WINDOW_OPTIONS` without raising every other feed.
|
|
- Alert/drawer evidence is pinned and hydrated by id/trace so details remain inspectable after hot-window eviction.
|
|
- Firehose-readiness strategy:
|
|
- preserve raw ingest for storage/replay,
|
|
- feed compute and default live UI from the filtered signal path,
|
|
- add filterable live subscription contracts now so selective delivery can move server-side without reshaping the protocol later.
|
|
- This repository is for personal, non-redistributed usage.
|
|
|
|
## Useful Examples
|
|
|
|
Realistic local demo:
|
|
|
|
```bash
|
|
SYNTHETIC_MARKET_MODE=realistic \
|
|
OPTIONS_SIGNAL_MODE=smart-money \
|
|
bun run dev
|
|
```
|
|
|
|
Active demo:
|
|
|
|
```bash
|
|
SYNTHETIC_MARKET_MODE=active bun run dev
|
|
```
|
|
|
|
Firehose stress test:
|
|
|
|
```bash
|
|
SYNTHETIC_MARKET_MODE=firehose \
|
|
NEXT_PUBLIC_LIVE_HOT_WINDOW=2000 \
|
|
bun run dev
|
|
```
|
|
|
|
Show raw options flow for debugging:
|
|
|
|
```text
|
|
/prints/options?view=raw&security=all
|
|
/history/options?view=raw&security=all&before_ts=<ts>&before_seq=<seq>
|
|
/replay/options?view=raw&security=all&after_ts=<ts>&after_seq=<seq>
|
|
```
|