update readme for current project state

dirtydishes 2026-05-19 07:40:18 -04:00
parent b6fa2f0d17
commit 82fd29f1a4
3 changed files with 449 additions and 200 deletions

389
README.md

@ -6,11 +6,12 @@
> **Pre-alpha warning** This project is in an early pre-alpha state. It will not perform consistently or as expected, and APIs, behavior, and data contracts may change without notice.
This repository contains a Bun + TypeScript monorepo for a personal-use, event-sourced market microstructure research platform focused on:
Islandflow is a Bun + TypeScript monorepo for a personal-use, event-sourced market microstructure research platform focused on:
- options prints + NBBO,
- off-exchange equity prints,
- explainable rule-based flow classification,
- market news context,
- explainable smart-money flow classification,
- deterministic replay,
- evidence-linked UI inspection.
@ -19,124 +20,175 @@ This repository contains a Bun + TypeScript monorepo for a personal-use, event-s
Implemented now:
- Bun workspaces with shared packages for schemas, bus, config, observability, and ClickHouse access.
- Infra orchestration via Docker Compose (NATS JetStream, ClickHouse, Redis).
- Options ingest service with adapters:
- synthetic stream,
- Alpaca options (dev-focused, bounded contracts),
- IBKR bridge (Python sidecar),
- Databento historical replay adapter (Python sidecar).
- Equities ingest service with adapters:
- synthetic stream,
- Alpaca equities trades/quotes.
- Compute service:
- deterministic option print clustering into `FlowPacket`s,
- NBBO join quality features and aggressor-mix metrics,
- rolling baselines in Redis,
- structure summarization and structure packet emission,
- rule-based classifiers + confidence-scored alert events,
- dark-style inferred events from equity prints/quotes,
- equity print-to-quote join events.
- Candles service:
- server-side equity candle aggregation,
- ClickHouse persistence,
- optional Redis hot cache,
- NATS publication.
- Replay service:
- deterministic republishing from ClickHouse to NATS,
- multi-stream merge with stable tie-break ordering,
- speed/start/end controls.
- API service:
- REST endpoints for recent + cursor pagination,
- REST range endpoints for chart windows,
- REST replay-oriented endpoints,
- WebSocket channels for options, NBBO, equities, quotes, joins, flow, classifier hits, alerts, inferred dark, and candles.
- Next.js web app:
- live tape/workspace views,
- replay controls and status,
- signals and chart-focused routes,
- evidence-centric terminal UI.
- Refdata + EOD enricher service entrypoints are present but currently scaffolds (lifecycle/logging only).
- Infra orchestration via Docker Compose for local NATS JetStream, ClickHouse, and Redis.
- Options ingest service with synthetic, Alpaca options, IBKR bridge, and Databento historical replay adapters.
- Equities ingest service with synthetic and Alpaca equities trades/quotes adapters.
- News ingest service for Alpaca news backfill and websocket publication.
- Compute service for deterministic parent-event reconstruction, flow packets, NBBO quality features, rolling baselines, smart-money profile scoring, compatibility classifier hits, alerts, inferred dark-style events, and equity print-to-quote joins.
- Candles service for server-side equity candle aggregation, ClickHouse persistence, optional Redis hot cache, and NATS publication.
- Replay service for deterministic ClickHouse-to-NATS republishing with multi-stream merge, stable tie-break ordering, and speed/start/end controls.
- API service with REST endpoints, cursor pagination, replay/history endpoints, live hot-cache hydration, and WebSocket channels for options, NBBO, equities, quotes, joins, flow, classifier hits, alerts, smart-money events, inferred dark, candles, and news.
- Next.js web app upgraded to Next.js `16.2.6`, React `19.2.0`, and React DOM `19.2.0`.
- Evidence-centric terminal UI, live/replay controls, chart-focused routes, news view, profile-aware smart-money display, and alert-context hydration.
- Thin Electron desktop shell in `apps/desktop` that can wrap the hosted app or local web UI.
- Refdata + EOD enricher service entrypoints are present, with refdata able to validate or refresh the event-calendar cache.
Planned / not yet complete:
- production-grade licensed feed integrations and entitlement workflow,
- richer refdata/corp-action enrichment,
- secure deployment/auth hardening,
- deeper structure + calibration workflows from `PLAN.md`.
- native deployment unit templates and rollback helpers,
- signed/notarized desktop distribution and richer desktop-native features,
- deeper calibration workflows from `PLAN.md` and `SMART_MONEY_REBUILD_PLAN.md`.
## Core Principles
- **Explainability first** — inferred outputs are evidence-backed and human-readable.
- **Event sourcing** — raw and derived events persist to support replay.
- **Determinism** — replay behavior tracks live pipeline logic.
- **Microstructure awareness** — bounded joins, confidence scoring, and explicit uncertainty.
- **Bun-first tooling** — runtime/package/scripts all use Bun.
- **Explainability first**: inferred outputs are evidence-backed and human-readable.
- **Event sourcing**: raw and derived events persist to support replay.
- **Determinism**: replay behavior tracks live pipeline logic.
- **Microstructure awareness**: bounded joins, confidence scoring, and explicit uncertainty.
- **Taxonomy over folklore**: "smart money" is modeled as participant-style hypotheses, not a single binary label.
- **Bun-first tooling**: runtime, package management, scripts, and tests use Bun.
## Smart-Money Classification Taxonomy
Islandflow now emits first-class `SmartMoneyEvent` records instead of treating old classifier hits as the final semantic object. `FlowPacket` remains the clustering bridge, while smart-money events carry typed features, profile scores, confidence bands, directions, reason codes, abstention state, and suppression reasons.
Public profile IDs:
| Profile ID | Meaning | Common evidence |
| --- | --- | --- |
| `institutional_directional` | Large directional parent flow with stronger institutional-style conviction. | premium, size, sweep/burst behavior, aggressor imbalance, quote quality, absence of short-dated retail-chase context |
| `retail_whale` | Large retail-style speculative bursts, often short-dated or attention-driven. | short-dated OTM concentration, burst prints, IV shock, lower premium than institutional blocks |
| `event_driven` | Flow aligned to known upcoming events. | event-calendar proximity, expiry after event, pre-event concentration, spread/IV pressure |
| `vol_seller` | Premium-selling or short-volatility structure evidence. | sell-side premium, straddles/strangles, neutral direction |
| `arbitrage` | Multi-leg or symmetric structures with low directional exposure. | matched leg symmetry, same-size legs, near-flat directional bias |
| `hedge_reactive` | Hedge or dealer-reaction style flow around short-dated ATM/gamma context. | 0-2 DTE, near-ATM contracts, underlying move linkage, size |
Compatibility surfaces remain in place:
- `ClassifierHitEvent` is derived from `SmartMoneyEvent.primary_profile_id`.
- `AlertEvent` may include `primary_profile_id` and `profile_scores`.
- Legacy classifier and alert endpoints still work.
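The compatibility mapping described above can be sketched roughly as follows. Only the profile IDs and the names `SmartMoneyEvent`, `ClassifierHitEvent`, `primary_profile_id`, and `profile_scores` come from this README; every other field name and the `toClassifierHit` helper are hypothetical and may not match `packages/types`.

```typescript
// Profile IDs as listed in the taxonomy table.
type ProfileId =
  | "institutional_directional"
  | "retail_whale"
  | "event_driven"
  | "vol_seller"
  | "arbitrage"
  | "hedge_reactive";

// Assumed shape: the real schema carries more fields (features, bands, reasons).
interface SmartMoneyEvent {
  id: string; // hypothetical field
  primary_profile_id: ProfileId | null; // null models an abstained event (assumption)
  profile_scores: Partial<Record<ProfileId, number>>;
  abstained: boolean; // hypothetical field name
}

// Assumed shape for the legacy compatibility record.
interface ClassifierHitEvent {
  source_event_id: string; // hypothetical field
  classifier_id: ProfileId;
  confidence: number;
}

// Derive a legacy hit from the primary profile; abstained events yield no hit.
function toClassifierHit(event: SmartMoneyEvent): ClassifierHitEvent | null {
  const profile = event.primary_profile_id;
  if (event.abstained || profile === null) return null;
  return {
    source_event_id: event.id,
    classifier_id: profile,
    confidence: event.profile_scores[profile] ?? 0,
  };
}
```

Keeping the derivation one-directional (smart-money event in, classifier hit out) is one way to let legacy endpoints keep working without making the old record the source of truth.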
Primary smart-money access paths:
```text
/flow/smart-money
/history/smart-money
/replay/smart-money
/ws/smart-money
```
The classifier intentionally abstains when evidence is weak or quote context is stale/missing. Suppression guards cover stale quotes, complex/special prints, retail-frenzy directional confusion, hedge-reactive short-dated ATM contexts, and arbitrage symmetry.
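As a rough illustration of the abstention behavior, the sketch below collects suppression reasons for missing quotes, stale quotes, and weak evidence. The reason names, the `AbstentionInput` shape, and the `0.5` score floor are assumptions made for the example; the `1000` ms default mirrors `NBBO_MAX_AGE_MS` from the configuration tables.

```typescript
// Hypothetical reason codes; the real suppression reasons are not listed here.
type SuppressionReason = "missing_quote" | "stale_quote" | "weak_evidence";

interface AbstentionInput {
  printTs: number; // epoch ms of the option print
  quoteTs: number | null; // epoch ms of the joined NBBO quote, if any
  topProfileScore: number; // best profile score, assumed to lie in [0, 1]
}

// Collect every reason that would block a confident classification.
function suppressionReasons(
  input: AbstentionInput,
  maxQuoteAgeMs = 1000, // mirrors the NBBO_MAX_AGE_MS default
  minScore = 0.5, // assumed evidence floor
): SuppressionReason[] {
  const reasons: SuppressionReason[] = [];
  if (input.quoteTs === null) {
    reasons.push("missing_quote");
  } else if (input.printTs - input.quoteTs > maxQuoteAgeMs) {
    reasons.push("stale_quote");
  }
  if (input.topProfileScore < minScore) reasons.push("weak_evidence");
  return reasons;
}

// The classifier abstains when at least one guard fires.
function shouldAbstain(input: AbstentionInput): boolean {
  return suppressionReasons(input).length > 0;
}
```

Returning the full list of reasons rather than a boolean keeps the abstention explainable, which fits the evidence-first principle stated earlier.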
## Monorepo Layout
- `apps/web` — Next.js UI shell/routes.
- `apps/desktop` — Electron desktop shell that loads the hosted Islandflow app.
- `apps/desktop` — Electron desktop shell that loads the hosted or local Islandflow app.
- `services/ingest-options` — options print/NBBO ingest adapters.
- `services/ingest-equities` — equity print/quote ingest adapters.
- `services/compute` — clustering, structures, classifiers, alerts, inferred dark.
- `services/ingest-news` — Alpaca news backfill and websocket ingest.
- `services/compute` — parent-event reconstruction, flow packets, smart-money scoring, alerts, inferred dark.
- `services/candles` — server-side candle aggregation + cache.
- `services/replay` — ClickHouse → NATS replay streamer.
- `services/replay` — ClickHouse to NATS replay streamer.
- `services/api` — REST + WebSocket gateway.
- `services/refdata` — scaffold service.
- `services/refdata` — event-calendar validation/provider refresh scaffolding.

- `services/eod-enricher` — scaffold service.
- `packages/types` — shared event schemas/types.
- `packages/storage` — ClickHouse tables/queries.
- `packages/bus` — NATS/JetStream helpers.
- `packages/config` — env parsing.
- `packages/observability` — logger + metrics facade.
- `deployment/docker` — supported VPS Docker Compose runtime.
- `deployment/native` — experimental host-native Bun + systemd deployment notes.
## Build and Run
Install dependencies:
- `bun install`
```bash
bun install
```
Start infrastructure only:
- `docker compose up -d`
```bash
bun run dev:infra
```
Create env file:
- copy `.env.example` to `.env` and set provider credentials as needed.
```bash
cp .env.example .env
```
Start infra + all services + web:
- `bun run dev`
```bash
bun run dev
```
Start services only (assumes infra is already running):
Start services only, assuming infra is already running:
- `bun run dev:services`
```bash
bun run dev:services
```
Start web only:
- `bun run dev:web`
```bash
bun run dev:web
```
Recommended fast iteration loop:
- `bun run dev:infra` for Docker-backed infra only
- `bun run dev:services` for native Bun backend services
- `bun run dev:web` for the local Next.js UI
```bash
bun run dev:infra
bun run dev:services
bun run dev:web
```
This keeps Docker in the local workflow where it helps most (NATS, ClickHouse, Redis) without forcing the app services themselves into slower container rebuild/restart loops.
This keeps Docker in the local workflow where it helps most, for NATS, ClickHouse, and Redis, while keeping the app services in native Bun/Next.js loops.
## Deployment Workflow
- `./deploy main` keeps the current VPS Docker rollout path as the default and recommended path.
- Do not run the repo-root `docker-compose.yml` on the VPS. That file is for local infra only and can create duplicate exposed NATS, ClickHouse, and Redis containers on the server.
- `./deploy main --runtime native` targets an experimental host-native Bun + systemd deployment.
- `./deploy current-branch` and `./deploy current-branch --runtime native` keep branch deploys available during the transition, but Docker remains the supported path for the current VPS.
- Partial deploys are supported with `--web-only`, `--api-only`, `--services-only`, and `--no-build`.
- Docker runtime details live in `deployment/docker/README.md`.
- Native runtime expectations and prerequisites live in `deployment/native/README.md`.
Docker remains the supported and recommended path for the current VPS.
```bash
./deploy main
./deploy main --runtime docker
./deploy current-branch
./deploy current-branch --runtime docker
```
Important deployment notes:
- Run the deploy helper from the local repo checkout, not from the VPS shell.
- Do not run the repo-root `docker-compose.yml` on the VPS. It is local infra only and can create duplicate exposed NATS, ClickHouse, and Redis containers on the server.
- The Docker stack lives in `deployment/docker` and is separate from local development infra.
- Partial deploys are supported with `--web-only`, `--api-only`, `--services-only`, `--fast`, `--no-build`, and `--force-recreate`.
- `--fast` defaults to a services-only Docker rollout when no explicit scope is provided and trims public API route-suite verification while preserving remote service health checks.
- `./deploy current-branch` requires a clean local working tree and pushes the branch before moving the server checkout.
- The helper has Forgejo-aware remote resolution for deployments and branch pushes.
- Native deployment is opt-in and experimental:
```bash
./deploy main --runtime native
./deploy current-branch --runtime native
```
Native deployment expects Bun, systemd units, host-reachable infra, and deliberate reverse-proxy changes. The open follow-up is to add native unit templates and rollback helpers.
Read more:
- `deployment/docker/README.md`
- `deployment/native/README.md`
## Desktop Shell
Islandflow also includes a thin Electron desktop shell in `apps/desktop`.
Islandflow includes a thin Electron desktop shell in `apps/desktop`.
What it is:
@ -144,37 +196,35 @@ What it is:
- a native app window plus packaging/distribution shell,
- a way to run the existing web UI inside Electron without local backend services.
What it is not:
What it is not yet:
- a bundled backend runtime,
- a packaged local Next.js frontend in v1,
- a desktop feature layer with notifications, preferences, or auto-updates yet.
- a packaged local Next.js frontend,
- a desktop feature layer with notifications, preferences, auto-updates, signing, or notarization.
Run the desktop shell against a local web UI:
- `bun run dev:desktop`
This starts the local Next.js app, defaults `NEXT_PUBLIC_API_URL` to `https://flow.deltaisland.io` unless you already set it, waits for port `3000`, and then launches Electron against `http://127.0.0.1:3000`.
```bash
bun run dev:desktop
```
Run the desktop shell directly against the hosted app:
- `bun run dev:desktop:remote`
```bash
bun run dev:desktop:remote
```
Package the desktop shell:
- `bun run package:desktop`
- `bun run make:desktop`
```bash
bun run package:desktop
bun run make:desktop
```
Desktop-specific environment:
- `ISLANDFLOW_DESKTOP_START_URL` is only used by the Electron shell and is restricted to trusted Islandflow app origins.
- `NEXT_PUBLIC_API_URL` remains the web app's API/WebSocket origin control and should usually point at `https://flow.deltaisland.io` when developing the local UI inside Electron.
Current desktop limitations:
- v1 builds are unsigned internal macOS artifacts only,
- Forge currently makes a simple zip distributable for the current host architecture,
- signing, notarization, auto-updates, remembered window state, and richer native integrations are intentionally deferred.
- `NEXT_PUBLIC_API_URL` remains the web app API/WebSocket origin control and usually points at `https://flow.deltaisland.io` when developing local UI inside Electron.
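The origin restriction on `ISLANDFLOW_DESKTOP_START_URL` could be enforced along these lines. This is a minimal sketch: the allowlist contents and the `resolveStartUrl` helper are assumptions, not the shell's actual code, and the real list of trusted origins may differ.

```typescript
// Assumed allowlist: the hosted app plus the local dev server on port 3000.
const TRUSTED_ORIGINS = new Set<string>([
  "https://flow.deltaisland.io",
  "http://127.0.0.1:3000",
  "http://localhost:3000",
]);

// Return the candidate URL only if it parses and its origin is trusted;
// otherwise fall back to a known-good URL.
function resolveStartUrl(candidate: string | undefined, fallback: string): string {
  if (!candidate) return fallback;
  try {
    const url = new URL(candidate);
    return TRUSTED_ORIGINS.has(url.origin) ? url.toString() : fallback;
  } catch {
    // Malformed URLs are treated the same as untrusted ones.
    return fallback;
  }
}
```

Comparing `url.origin` rather than a string prefix avoids accepting look-alike hosts such as `https://flow.deltaisland.io.example.com`.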
## Environment Configuration
@ -196,32 +246,27 @@ All runtime configuration comes from `.env`.
| `OPTIONS_INGEST_ADAPTER` | `synthetic` | Options ingest source: `synthetic`, `alpaca`, `ibkr`, or `databento`. |
| `EQUITIES_INGEST_ADAPTER` | `synthetic` | Equities ingest source: `synthetic` or `alpaca`. |
| `EMIT_INTERVAL_MS` | `1000` | Emit cadence for synthetic ingest adapters. |
| `SYNTHETIC_MARKET_MODE` | `realistic` | Shared synthetic profile (`realistic`, `active`, `firehose`) used when per-service override is unset. |
| `SYNTHETIC_OPTIONS_MODE` | empty | Options-only synthetic profile override; falls back to `SYNTHETIC_MARKET_MODE`. |
| `SYNTHETIC_EQUITIES_MODE` | empty | Equities-only synthetic profile override; falls back to `SYNTHETIC_MARKET_MODE`. |
| `SYNTHETIC_MARKET_MODE` | `realistic` | Shared synthetic profile: `realistic`, `active`, or `firehose`. |
| `SYNTHETIC_OPTIONS_MODE` | empty | Options-only synthetic profile override. |
| `SYNTHETIC_EQUITIES_MODE` | empty | Equities-only synthetic profile override. |
Synthetic profile intent:
- `realistic`: default local mode with lower synthetic burstiness/noise.
- `active`: busier demo flow while still readable.
- `firehose`: stress mode for throughput/backpressure/hot-window behavior.
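The override-then-shared fallback for the synthetic profile variables can be sketched as below. The `resolveSyntheticMode` helper is hypothetical, and treating an unrecognized value the same as an unset one is an assumption rather than documented behavior.

```typescript
type SyntheticMode = "realistic" | "active" | "firehose";

const SYNTHETIC_MODES: readonly SyntheticMode[] = ["realistic", "active", "firehose"];

// Normalize a raw env value; anything unrecognized is treated as unset (assumption).
function parseSyntheticMode(value: string | undefined): SyntheticMode | undefined {
  const normalized = value?.trim().toLowerCase();
  return SYNTHETIC_MODES.find((mode) => mode === normalized);
}

// Per-service override first, then the shared mode, then the documented default.
function resolveSyntheticMode(
  env: Record<string, string | undefined>,
  service: "options" | "equities",
): SyntheticMode {
  const overrideKey =
    service === "options" ? "SYNTHETIC_OPTIONS_MODE" : "SYNTHETIC_EQUITIES_MODE";
  return (
    parseSyntheticMode(env[overrideKey]) ??
    parseSyntheticMode(env["SYNTHETIC_MARKET_MODE"]) ??
    "realistic"
  );
}
```

Passing the environment in as a plain record keeps the helper easy to test; a caller would typically hand it `process.env` or `Bun.env`.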
### Options ingest adapter configuration
### Alpaca and news configuration
| Variable | Default | What it controls |
| --- | --- | --- |
| `ALPACA_API_KEY` | empty | Single-token Alpaca API auth for options/equities adapters. Use this when your account provides one API key value. |
| `ALPACA_REST_URL` | `https://data.alpaca.markets` | Alpaca REST base URL for contract discovery/reference calls. |
| `ALPACA_WS_BASE_URL` | `wss://stream.data.alpaca.markets/v1beta1` (options), `wss://stream.data.alpaca.markets` (equities) | Alpaca websocket base URL. |
| `ALPACA_FEED` | `indicative` | Options feed tier for Alpaca options (`indicative` or `opra`). |
| `ALPACA_API_KEY` | empty | Single-token Alpaca API auth for options, equities, and news adapters. |
| `ALPACA_REST_URL` | `https://data.alpaca.markets` | Alpaca REST base URL. |
| `ALPACA_WS_BASE_URL` | `wss://stream.data.alpaca.markets/v1beta1` for options, `wss://stream.data.alpaca.markets` for equities/news | Alpaca websocket base URL. |
| `ALPACA_FEED` | `indicative` | Options feed tier: `indicative` or `opra`. |
| `ALPACA_UNDERLYINGS` | `SPY,NVDA,AAPL` | Comma-separated symbols targeted by Alpaca ingest. |
| `ALPACA_STRIKES_PER_SIDE` | `8` | Contracts selected per side of spot for Alpaca options chain sampling. |
| `ALPACA_MAX_DTE_DAYS` | `30` | Max days-to-expiry included for Alpaca options contract selection. |
| `ALPACA_MONEYNESS_PCT` | `0.06` | Primary moneyness filter for Alpaca options contract selection. |
| `ALPACA_MONEYNESS_FALLBACK_PCT` | `0.1` | Wider fallback moneyness filter if candidate set is too sparse. |
| `ALPACA_MAX_QUOTES` | `200` | Upper bound on selected Alpaca options contracts/quotes per cycle. |
| `ALPACA_EQUITIES_FEED` | `iex` | Alpaca equities feed (`iex` free tier, `sip` paid consolidated feed). |
For Alpaca adapters, configure `ALPACA_API_KEY`.
| `ALPACA_EQUITIES_FEED` | `iex` | Alpaca equities feed: `iex` or `sip`. |
| `ALPACA_NEWS_BACKFILL_LIMIT` | `100` | Alpaca news stories fetched on startup, capped at 200. |
| `ALPACA_NEWS_WEBSOCKET_PATH` | `/v1beta1/news` | Alpaca news websocket path. |
### Databento replay adapter configuration
@ -236,7 +281,7 @@ For Alpaca adapters, configure `ALPACA_API_KEY`.
| `DATABENTO_SYMBOLS` | `ALL` | Symbol selection forwarded to Databento sidecar query. |
| `DATABENTO_STYPE_IN` | `raw_symbol` | Databento input symbology type. |
| `DATABENTO_STYPE_OUT` | `raw_symbol` | Databento output symbology type. |
| `DATABENTO_LIMIT` | `0` | Max Databento records (`0` means no explicit limit). |
| `DATABENTO_LIMIT` | `0` | Max Databento records, where `0` means no explicit limit. |
| `DATABENTO_PRICE_SCALE` | `1` | Multiplier applied to decoded prices from sidecar output. |
| `DATABENTO_PYTHON_BIN` | `python3` | Python executable used to run Databento sidecar script. |
@ -248,9 +293,9 @@ For Alpaca adapters, configure `ALPACA_API_KEY`.
| `IBKR_PORT` | `7497` | TWS/Gateway port for IBKR bridge. |
| `IBKR_CLIENT_ID` | `0` | IBKR client id used by the bridge connection. |
| `IBKR_SYMBOL` | `SPY` | Underlying symbol requested from IBKR. |
| `IBKR_EXPIRY` | `20250117` | Option expiry (YYYYMMDD) requested from IBKR. |
| `IBKR_EXPIRY` | `20250117` | Option expiry requested from IBKR. |
| `IBKR_STRIKE` | `450` | Strike requested from IBKR. |
| `IBKR_RIGHT` | `C` | Option side (`C` or `P`). |
| `IBKR_RIGHT` | `C` | Option side: `C` or `P`. |
| `IBKR_EXCHANGE` | `SMART` | IBKR exchange routing code. |
| `IBKR_CURRENCY` | `USD` | Contract currency. |
| `IBKR_PYTHON_BIN` | `python3` | Python executable used for IBKR sidecar. |
@ -259,133 +304,77 @@ For Alpaca adapters, configure `ALPACA_API_KEY`.
| Variable | Default | What it controls |
| --- | --- | --- |
| `OPTIONS_SIGNAL_MODE` | `smart-money` | Signal pass policy (`smart-money`, `balanced`, `all`) for options prints. |
| `OPTIONS_SIGNAL_MODE` | `smart-money` | Signal pass policy: `smart-money`, `balanced`, or `all`. |
| `OPTIONS_SIGNAL_MIN_NOTIONAL` | `10000` | Base minimum notional for most signal candidates. |
| `OPTIONS_SIGNAL_ETF_MIN_NOTIONAL` | `50000` | ETF-specific minimum notional for signal inclusion. |
| `OPTIONS_SIGNAL_BID_SIDE_MIN_NOTIONAL` | `25000` | Minimum notional for bid-side (`B`/`BB`) or sweep/ISO thresholds. |
| `OPTIONS_SIGNAL_BID_SIDE_MIN_NOTIONAL` | `25000` | Minimum notional for bid-side or sweep/ISO thresholds. |
| `OPTIONS_SIGNAL_MID_MIN_NOTIONAL` | `20000` | Minimum notional for non-sweep/non-ISO `MID` prints. |
| `OPTIONS_SIGNAL_NBBO_MAX_AGE_MS` | `1500` | NBBO freshness threshold used during signal classification. |
| `OPTIONS_SIGNAL_ETF_UNDERLYINGS` | `SPY,QQQ,IWM,DIA,TLT,GLD,SLV,XLF,XLE,XLV,XLI,XLP,XLU,XLY,SMH,ARKK` | Comma-separated underlyings treated as ETFs by signal filters. |
| `OPTIONS_SIGNAL_ETF_UNDERLYINGS` | `SPY,QQQ,IWM,DIA,TLT,GLD,SLV,XLF,XLE,XLV,XLI,XLP,XLU,XLY,SMH,ARKK` | ETF underlyings treated specially by signal filters. |
Default `smart-money` policy rejects lower-information prints and keeps high-confidence/high-notional/sweep-style flow; `balanced` lowers thresholds; `all` bypasses filtering.
Default `smart-money` policy rejects lower-information prints and keeps higher-confidence, higher-notional, sweep-style flow. `balanced` lowers thresholds. `all` bypasses filtering.
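To make the three policies concrete, here is a deliberately simplified sketch of a notional gate. It is not the service's filter: the `OptionPrint` shape, the contract multiplier of 100, the halving of thresholds under `balanced`, and the omission of the bid-side, `MID`, and sweep/ISO thresholds are all assumptions made for illustration. Only the mode names and the default values come from the table above.

```typescript
type SignalMode = "smart-money" | "balanced" | "all";

// Hypothetical minimal print shape for the example.
interface OptionPrint {
  underlying: string;
  price: number; // per-share option price
  size: number; // contracts
  nbboAgeMs: number | null; // age of the joined NBBO, null when missing
}

interface SignalConfig {
  mode: SignalMode;
  minNotional: number; // OPTIONS_SIGNAL_MIN_NOTIONAL
  etfMinNotional: number; // OPTIONS_SIGNAL_ETF_MIN_NOTIONAL
  nbboMaxAgeMs: number; // OPTIONS_SIGNAL_NBBO_MAX_AGE_MS
  etfUnderlyings: Set<string>; // OPTIONS_SIGNAL_ETF_UNDERLYINGS
}

function passesSignalFilter(print: OptionPrint, config: SignalConfig): boolean {
  // `all` bypasses filtering entirely.
  if (config.mode === "all") return true;
  // Require fresh quote context before trusting the print.
  if (print.nbboAgeMs === null || print.nbboAgeMs > config.nbboMaxAgeMs) return false;
  const notional = print.price * print.size * 100; // assumes a 100-share multiplier
  const base = config.etfUnderlyings.has(print.underlying)
    ? config.etfMinNotional
    : config.minNotional;
  // `balanced` lowers thresholds; the 0.5 factor is an assumed placeholder.
  const threshold = config.mode === "balanced" ? base * 0.5 : base;
  return notional >= threshold;
}
```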
### Compute/classifier/dark-inference configuration
### Compute, classifier, and dark-inference configuration
| Variable | Default | What it controls |
| --- | --- | --- |
| `CLUSTER_WINDOW_MS` | `500` | Time window used to cluster nearby option prints into a packet candidate. |
| `COMPUTE_DELIVER_POLICY` | `new` | Consumer start policy for compute stream subscriptions (`new`, `all`, `last`, `last_per_subject`). |
| `COMPUTE_CONSUMER_RESET` | `false` | If true, resets durable consumer position for compute on startup. |
| `CLUSTER_WINDOW_MS` | `500` | Time window used to cluster nearby option prints into packet candidates. |
| `COMPUTE_DELIVER_POLICY` | `new` | Consumer start policy for compute subscriptions. |
| `COMPUTE_CONSUMER_RESET` | `false` | Resets durable consumer position for compute on startup when true. |
| `NBBO_MAX_AGE_MS` | `1000` | Max NBBO age accepted when enriching option prints in compute. |
| `ROLLING_WINDOW_SIZE` | `50` | Number of observations retained per rolling metric key. |
| `ROLLING_TTL_SEC` | `86400` | Redis TTL for rolling metric keys. |
| `EQUITY_QUOTE_MAX_AGE_MS` | `1000` | Max quote staleness when joining equity prints for inference. |
| `DARK_INFER_WINDOW_MS` | `60000` | Sliding window length for dark-style inference accumulation. |
| `DARK_INFER_COOLDOWN_MS` | `30000` | Cooldown before emitting repeated dark inferences for same symbol/pattern. |
| `DARK_INFER_MIN_BLOCK_SIZE` | `2000` | Minimum single-print size for block-style dark inference evidence. |
| `DARK_INFER_MIN_ACCUM_SIZE` | `3000` | Minimum aggregate size for accumulation-style dark inference evidence. |
| `DARK_INFER_MIN_ACCUM_COUNT` | `4` | Minimum print count for accumulation-style dark inference. |
| `DARK_INFER_MIN_PRINT_SIZE` | `200` | Minimum print size considered as dark inference evidence. |
| `DARK_INFER_MAX_EVIDENCE` | `20` | Max evidence items attached to one inferred dark event. |
| `DARK_INFER_MAX_SPREAD_PCT` | `0.005` | Maximum spread percentage allowed for dark inference confidence. |
| `CLASSIFIER_SWEEP_MIN_PREMIUM` | `40000` | Minimum premium to trigger sweep classifier logic. |
| `CLASSIFIER_SWEEP_MIN_COUNT` | `3` | Minimum child prints in cluster for sweep classifier hit. |
| `CLASSIFIER_SWEEP_MIN_PREMIUM_Z` | `2` | Min premium z-score for sweep classifier confirmation. |
| `CLASSIFIER_SPIKE_MIN_PREMIUM` | `20000` | Minimum premium for spike classifier logic. |
| `CLASSIFIER_SPIKE_MIN_SIZE` | `400` | Minimum total size for spike classifier logic. |
| `CLASSIFIER_SPIKE_MIN_PREMIUM_Z` | `2.5` | Min premium z-score for spike classifier confirmation. |
| `CLASSIFIER_SPIKE_MIN_SIZE_Z` | `2` | Min size z-score for spike classifier confirmation. |
| `CLASSIFIER_Z_MIN_SAMPLES` | `12` | Minimum rolling sample count before z-score gating applies. |
| `CLASSIFIER_MIN_NBBO_COVERAGE` | `0.5` | Required fraction of prints in cluster with valid NBBO context. |
| `CLASSIFIER_MIN_AGGRESSOR_RATIO` | `0.55` | Minimum aggressor-side ratio for classifier confidence. |
| `CLASSIFIER_0DTE_MAX_ATM_PCT` | `0.01` | Max distance-from-ATM to qualify as near-ATM 0DTE event. |
| `CLASSIFIER_0DTE_MIN_PREMIUM` | `20000` | Minimum premium for 0DTE classifier events. |
| `CLASSIFIER_0DTE_MIN_SIZE` | `400` | Minimum size for 0DTE classifier events. |
| `SMART_MONEY_EVENT_CALENDAR_PATH` | empty | Optional JSON event-calendar file used by compute to enrich event-driven smart-money profile features. |
| `REFDATA_EVENT_CALENDAR_PATH` | empty | Optional JSON event-calendar file for refdata service startup validation; falls back to `SMART_MONEY_EVENT_CALENDAR_PATH` when unset. |
| `REFDATA_EVENT_CALENDAR_PROVIDER` | empty | Set to `alpha_vantage` to have refdata refresh the calendar cache from Alpha Vantage. |
| `ALPHA_VANTAGE_API_KEY` | empty | Alpha Vantage key used when `REFDATA_EVENT_CALENDAR_PROVIDER=alpha_vantage`. |
| `ALPHA_VANTAGE_EARNINGS_HORIZON` | `3month` | Alpha Vantage earnings horizon: `3month`, `6month`, or `12month`. |
| `ALPHA_VANTAGE_EARNINGS_SYMBOL` | empty | Optional single-symbol Alpha Vantage earnings query; empty fetches the full scheduled earnings list. |
| `REFDATA_EVENT_CALENDAR_REFRESH_MS` | `86400000` | Refdata refresh cadence for provider-backed event-calendar cache writes. |
| `DARK_INFER_COOLDOWN_MS` | `30000` | Cooldown before repeated dark inferences for the same symbol/pattern. |
| `SMART_MONEY_EVENT_CALENDAR_PATH` | empty | Optional JSON event-calendar file used by compute. |
| `REFDATA_EVENT_CALENDAR_PATH` | empty | Optional JSON event-calendar path for refdata; falls back to `SMART_MONEY_EVENT_CALENDAR_PATH`. |
| `REFDATA_EVENT_CALENDAR_PROVIDER` | empty | Set to `alpha_vantage` to refresh event-calendar cache from Alpha Vantage. |
| `ALPHA_VANTAGE_API_KEY` | empty | Alpha Vantage key for provider-backed event-calendar refresh. |
Event-calendar rows may use `symbol`, `underlying`, or `underlying_id`; `event_date`, `event_time`, or `event_ts`; and `announced_ts`, `available_ts`, `as_of_ts`, or `created_ts`. Compute only uses events already available at the packet timestamp, so missing or unavailable rows leave event-alignment features as neutral `null` values.
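The alias handling and the availability rule can be illustrated with a small normalizer. This is a sketch under assumptions: the helper names are invented, timestamps are taken to be epoch milliseconds or strings that `Date.parse` accepts, `event_time` is left out for brevity, and rows without any availability timestamp are treated as unavailable.

```typescript
type RawRow = Record<string, unknown>;

interface CalendarEvent {
  symbol: string;
  eventTs: number; // epoch ms
  availableTs: number | null; // when the event became known, if recorded
}

// First non-empty string found under any of the alias keys.
function pickString(row: RawRow, keys: string[]): string | undefined {
  for (const key of keys) {
    const value = row[key];
    if (typeof value === "string" && value.length > 0) return value;
  }
  return undefined;
}

// First usable timestamp: finite numbers pass through, strings go through Date.parse.
function pickTimestamp(row: RawRow, keys: string[]): number | undefined {
  for (const key of keys) {
    const value = row[key];
    if (typeof value === "number" && Number.isFinite(value)) return value;
    if (typeof value === "string") {
      const parsed = Date.parse(value);
      if (!Number.isNaN(parsed)) return parsed;
    }
  }
  return undefined;
}

function normalizeRow(row: RawRow): CalendarEvent | null {
  const symbol = pickString(row, ["symbol", "underlying", "underlying_id"]);
  const eventTs = pickTimestamp(row, ["event_ts", "event_date"]);
  if (symbol === undefined || eventTs === undefined) return null;
  const availableTs =
    pickTimestamp(row, ["announced_ts", "available_ts", "as_of_ts", "created_ts"]) ?? null;
  return { symbol, eventTs, availableTs };
}

// Next upcoming event for a symbol, using only rows already known at packetTs.
// Returns null when nothing qualifies, matching the neutral-null behavior.
function nextEventAt(rows: RawRow[], symbol: string, packetTs: number): CalendarEvent | null {
  let best: CalendarEvent | null = null;
  for (const row of rows) {
    const event = normalizeRow(row);
    if (event === null || event.symbol !== symbol) continue;
    if (event.availableTs === null || event.availableTs > packetTs) continue;
    if (event.eventTs < packetTs) continue;
    if (best === null || event.eventTs < best.eventTs) best = event;
  }
  return best;
}
```

Filtering on the availability timestamp is what keeps replay free of look-ahead: an event announced after the packet cannot influence that packet's features.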
### Candle service configuration
| Variable | Default | What it controls |
| --- | --- | --- |
| `CANDLE_INTERVALS_MS` | `60000,300000` | Comma-separated candle intervals generated from equity prints. |
| `CANDLE_MAX_LATE_MS` | `0` | Allowed lateness for out-of-order prints before candle rejection/roll policy applies. |
| `CANDLE_CACHE_LIMIT` | `2000` | Max cached candles per `(underlying, interval)` in Redis (`0` disables cache). |
| `CANDLE_DELIVER_POLICY` | `new` | Consumer start policy for candle service (`new`, `all`, `last`, `last_per_subject`). |
| `CANDLE_CONSUMER_RESET` | `false` | If true, resets candle durable consumer position on startup. |
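For orientation, interval bucketing of equity prints into OHLC candles might look like the sketch below. The `EquityPrint` and `Candle` shapes and the `aggregateCandles` helper are hypothetical, and the example assumes prints arrive in timestamp order, which corresponds to the `CANDLE_MAX_LATE_MS=0` default; it does not model late-print handling, persistence, or caching.

```typescript
// Hypothetical minimal shapes for the example.
interface EquityPrint {
  ts: number; // epoch ms
  price: number;
  size: number;
}

interface Candle {
  start: number; // bucket start, epoch ms
  intervalMs: number;
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
}

// Bucket time-ordered prints into fixed-width candles for one interval.
function aggregateCandles(prints: EquityPrint[], intervalMs: number): Candle[] {
  const byStart = new Map<number, Candle>();
  for (const print of prints) {
    const start = Math.floor(print.ts / intervalMs) * intervalMs;
    const candle = byStart.get(start);
    if (candle === undefined) {
      // First print in the bucket seeds every OHLC field.
      byStart.set(start, {
        start,
        intervalMs,
        open: print.price,
        high: print.price,
        low: print.price,
        close: print.price,
        volume: print.size,
      });
      continue;
    }
    candle.high = Math.max(candle.high, print.price);
    candle.low = Math.min(candle.low, print.price);
    candle.close = print.price; // relies on prints being in time order
    candle.volume += print.size;
  }
  return Array.from(byStart.values()).sort((a, b) => a.start - b.start);
}
```

With `CANDLE_INTERVALS_MS=60000,300000`, a caller would run the same prints through this function once per interval.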
### API + live cache configuration
### API, live cache, and web client
| Variable | Default | What it controls |
| --- | --- | --- |
| `API_PORT` | `4000` | API service listen port. |
| `REST_DEFAULT_LIMIT` | `200` | Default record count when a REST endpoint omits `limit`. |
| `API_DELIVER_POLICY` | `new` | JetStream consumer start policy used by API live subscribers (`new`, `all`, `last`, `last_per_subject`). |
| `API_CONSUMER_RESET` | `false` | If true, API resets/recreates its live durable consumers on startup. |
| `LIVE_LIMIT_OPTIONS` | `10000` | In-memory/Redis live cache depth for options channel (clamped `1..100000`). |
| `LIVE_LIMIT_NBBO` | `10000` | Live cache depth for options NBBO channel (clamped `1..100000`). |
| `LIVE_LIMIT_EQUITIES` | `10000` | Live cache depth for equities channel (clamped `1..100000`). |
| `LIVE_LIMIT_EQUITY_QUOTES` | `10000` | Live cache depth for equity quotes channel (clamped `1..100000`). |
| `LIVE_LIMIT_EQUITY_JOINS` | `10000` | Live cache depth for equity join channel (clamped `1..100000`). |
| `LIVE_LIMIT_FLOW` | `10000` | Live cache depth for flow packet channel (clamped `1..100000`). |
| `LIVE_LIMIT_CLASSIFIER_HITS` | `10000` | Live cache depth for classifier hits channel (clamped `1..100000`). |
| `LIVE_LIMIT_ALERTS` | `10000` | Live cache depth for alerts channel (clamped `1..100000`). |
| `LIVE_LIMIT_INFERRED_DARK` | `10000` | Live cache depth for inferred dark channel (clamped `1..100000`). |
### Web client configuration (`NEXT_PUBLIC_*`)
| Variable | Default | What it controls |
| --- | --- | --- |
| `NEXT_PUBLIC_API_URL` | auto-detected (`window.location.origin` in browser; `http://127.0.0.1:4000` fallback) | Explicit base URL for API/WS calls from the web app. |
| `NEXT_PUBLIC_LIVE_HOT_WINDOW` | `2000` | Max hot-window items retained for non-options live streams in UI state (`100..100000`). |
| `NEXT_PUBLIC_LIVE_HOT_WINDOW_OPTIONS` | `25000` | Dedicated max hot-window items retained for options prints (`100..100000`). |
| `NEXT_PUBLIC_NBBO_MAX_AGE_MS` | `1000` | Frontend NBBO staleness threshold used for UI status/placement logic. |
| `NEXT_PUBLIC_LIVE_EQUITIES_SILENT_WARNING_MS` | `25000` | Delay before warning when equities stream is quiet (`5000..300000`). |
| `NEXT_PUBLIC_PINNED_EVIDENCE_TTL_MS` | `1200000` | TTL for pinned evidence objects in UI (`60000..7200000`). |
| `NEXT_PUBLIC_PINNED_EVIDENCE_MAX_ITEMS` | `4000` | Maximum pinned evidence cache size in UI (`100..50000`). |
| `NEXT_PUBLIC_FLOW_FILTER_PRESET` | `smart-money` | Default flow filter preset applied on page load (`smart-money`, `balanced`, `all`). |
| `REST_DEFAULT_LIMIT` | `200` | Default REST record count. |
| `API_DELIVER_POLICY` | `new` | JetStream consumer start policy used by API live subscribers. |
| `API_CONSUMER_RESET` | `false` | Resets/recreates API live durable consumers on startup when true. |
| `LIVE_LIMIT_DEFAULT` | `1000` | Optional generic live cache depth default. |
| `LIVE_LIMIT_FLOW` | `500` | Live cache depth for flow packet events unless overridden. |
| `LIVE_LIMIT_SMART_MONEY` | `300` | Live cache depth for smart-money events unless overridden. |
| `LIVE_LIMIT_OPTIONS` | `1000` | Live cache depth for options channel unless overridden. |
| `LIVE_LIMIT_ALERTS` | `300` | Live cache depth for alerts channel unless overridden. |
| `LIVE_LIMIT_NEWS` | `100` | Live cache depth for news channel unless overridden. |
| `NEXT_PUBLIC_API_URL` | auto-detected in browser, `http://127.0.0.1:4000` fallback | Explicit base URL for API/WS calls from the web app. |
| `NEXT_PUBLIC_LIVE_HOT_WINDOW` | `600` | Max hot-window items retained for non-options live streams in UI state. |
| `NEXT_PUBLIC_LIVE_HOT_WINDOW_OPTIONS` | `1200` | Dedicated max hot-window items retained for options prints. |
| `NEXT_PUBLIC_NBBO_MAX_AGE_MS` | `1000` | Frontend NBBO staleness threshold. |
| `NEXT_PUBLIC_FLOW_FILTER_PRESET` | `smart-money` | Default flow filter preset: `smart-money`, `balanced`, or `all`. |
### Replay and testing controls
| Variable | Default | What it controls |
| --- | --- | --- |
| `REPLAY_ENABLED` | `false` | Dev-script toggle: starts replay service in `bun run dev` when truthy. |
| `REPLAY_STREAMS` | `options,nbbo,equities,equity-quotes` | Replay stream selection (`all` or comma list of supported aliases). |
| `REPLAY_START_TS` | `0` | Replay lower-bound timestamp; `0` means from earliest stored data. |
| `REPLAY_END_TS` | `0` | Replay upper-bound timestamp; `0` means no explicit end bound. |
| `REPLAY_SPEED` | `1` | Replay speed multiplier relative to original event timing. |
| `REPLAY_BATCH_SIZE` | `200` | Batch fetch size per replay stream pull. |
| `REPLAY_LOG_EVERY` | `1000` | Progress log interval (emitted event count). |
| `REPLAY_ENABLED` | `false` | Starts replay service in `bun run dev` when truthy. |
| `REPLAY_STREAMS` | `options,nbbo,equities,equity-quotes` | Replay stream selection. |
| `REPLAY_START_TS` | `0` | Replay lower-bound timestamp. |
| `REPLAY_END_TS` | `0` | Replay upper-bound timestamp. |
| `REPLAY_SPEED` | `1` | Replay speed multiplier. |
| `REPLAY_BATCH_SIZE` | `200` | Batch fetch size per stream. |
| `REPLAY_LOG_EVERY` | `1000` | Progress log interval. |
| `TESTING_MODE` | `false` | Enables ingest publish throttling for deterministic/lower-volume test runs. |
| `TESTING_THROTTLE_MS` | `200` | Minimum delay between emitted events while `TESTING_MODE=true`. |
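A deterministic multi-stream merge with a stable tie-break, plus speed-scaled pacing, can be sketched as follows. The `ReplayEvent` shape, the tie-break order (timestamp, then stream priority, then per-stream sequence), and both helper names are assumptions; the replay service's actual ordering keys are not spelled out in this README.

```typescript
// Hypothetical minimal event shape for the example.
interface ReplayEvent {
  ts: number; // original event time, epoch ms
  stream: string; // e.g. "options", "nbbo"
  seq: number; // per-stream sequence number
}

// Merge events from several streams into one deterministic order.
// Ties on ts are broken by stream priority, then by seq, so the result
// does not depend on the order in which batches were fetched.
function mergeForReplay(streams: ReplayEvent[][], streamOrder: string[]): ReplayEvent[] {
  const rank = new Map<string, number>();
  streamOrder.forEach((name, index) => rank.set(name, index));
  const rankOf = (stream: string): number => rank.get(stream) ?? streamOrder.length;
  const merged = ([] as ReplayEvent[]).concat(...streams);
  return merged.sort(
    (a, b) => a.ts - b.ts || rankOf(a.stream) - rankOf(b.stream) || a.seq - b.seq,
  );
}

// Wall-clock delay between two consecutive events at a given REPLAY_SPEED.
function replayDelayMs(prevTs: number, nextTs: number, speed: number): number {
  if (speed <= 0) throw new Error("speed must be positive");
  return Math.max(0, nextTs - prevTs) / speed;
}
```

Sorting the full set is the simplest way to show the ordering rule; a streaming implementation would more likely use a k-way merge over per-stream batches of `REPLAY_BATCH_SIZE` with the same comparator.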
## Quick Notes
- Python dependencies are required only for IBKR/Databento sidecars (`services/ingest-options/py/requirements.txt`).
- Python dependencies are required only for IBKR/Databento sidecars: `services/ingest-options/py/requirements.txt`.
- Candle construction is server-side; the client consumes prebuilt OHLC events.
- Option prints now persist as enriched raw rows and can be queried as either:
- `view=signal` — default live/UI path and compute input.
- `view=raw` — audit/debug path that preserves every stored print.
- The default Tape page options/packets posture is now stock-only, hides `B` / `BB`, keeps calls and puts visible, and applies in-memory min-notional controls immediately.
- Live retention uses a two-tier model:
- ClickHouse is durable server history; Redis is a bounded hot cache per live generic channel.
- `LIVE_LIMIT_*` controls initial snapshot/hot-cache depth, not total persisted history.
- Browser state is only a rendering window and UI preferences, not a market-data database.
- Devices connected to the same API hydrate from the same server-seen history.
- UI keeps a bounded hot window for rendering performance around the signal view rather than raw noise.
- Options prints can use a deeper dedicated cap via `NEXT_PUBLIC_LIVE_HOT_WINDOW_OPTIONS` without raising every other feed.
- Alert/drawer evidence is pinned and hydrated by id/trace so details remain inspectable after hot-window eviction.
- Firehose-readiness strategy:
- preserve raw ingest for storage/replay,
- feed compute and default live UI from the filtered signal path,
- add filterable live subscription contracts now so selective delivery can move server-side without reshaping the protocol later.
- Option prints persist as enriched raw rows and can be queried as `view=signal` or `view=raw`.
- The default Tape page options/packets posture is stock-only, hides `B` / `BB`, keeps calls and puts visible, and applies in-memory min-notional controls immediately.
- Live retention uses ClickHouse for durable server history, Redis for bounded hot cache, and browser state for rendering windows/preferences.
- Alert and drawer evidence is pinned and hydrated by id/trace so details remain inspectable after hot-window eviction.
- Firehose readiness keeps raw ingest for storage/replay, routes default compute/UI through filtered signals, and keeps subscription contracts ready for server-side selective delivery.
- This repository is for personal, non-redistributed usage.
## Useful Examples