From e54fa4b2deb14e9261b3270722dd84a20b02982f Mon Sep 17 00:00:00 2001
From: dirtydishes
Date: Fri, 22 May 2026 21:39:55 -0400
Subject: [PATCH 1/5] cap live api caches and cut redis churn

---
 .beads/issues.jsonl             |   1 +
 .env.example                    |   3 +-
 apps/web/app/terminal.tsx       |   4 +-
 deployment/docker/.env.example  |   3 +-
 services/api/src/index.ts       |  58 ++++++++
 services/api/src/live.ts        | 236 +++++++++++++++++++++++++++-----
 services/api/tests/live.test.ts |  40 +++++-
 7 files changed, 304 insertions(+), 41 deletions(-)

diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl
index 36cf3df..c6b5525 100644
--- a/.beads/issues.jsonl
+++ b/.beads/issues.jsonl
@@ -1,3 +1,4 @@
+{"_type":"issue","id":"islandflow-thp","title":"stabilize live api memory and reduce internal cache churn","description":"The native VPS deployment is repeatedly OOM-killing islandflow-api.service during live operation. The API live cache is retaining oversized channel histories and rewriting large Redis lists on every flush, which drives multi-GB Bun RSS and heavy loopback traffic between the API, Redis, NATS, and ClickHouse. Implement an emergency VPS mitigation plus repo hardening so unsafe env values, reconnect snapshots, and Redis persistence patterns cannot push the live API back into OOM.","acceptance_criteria":"1. VPS live cache env values are reduced to safe defaults and live redis state is cleared before restart. 2. services/api/src/live.ts enforces server-side live cache caps and clamps snapshot_limit accordingly. 3. Hot generic feed Redis persistence no longer rewrites entire lists on every flush. 4. Metrics/logging expose subscription counts, snapshot sizes, redis flush volume, and API memory trend. 5. Relevant tests pass and the deployment is restarted successfully.","notes":"Implemented local hardening for API live-state limits, incremental generic Redis persistence, live subscription/memory metrics, and safer client/env defaults. Targeted API live tests and the web production build both passed.","status":"in_progress","priority":1,"issue_type":"bug","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-23T01:30:43Z","created_by":"dirtydishes","updated_at":"2026-05-23T01:39:57Z","started_at":"2026-05-23T01:30:52Z","dependency_count":0,"dependent_count":0,"comment_count":0}
 {"_type":"issue","id":"islandflow-sc6","title":"fix electron codex bridge preload loading","description":"Electron settings showed the browser-only Desktop Required fallback because the renderer did not see the native islandflowDesktop preload bridge or an Electron user-agent marker. Fix the desktop launch path so ChatGPT/Codex subscription controls are available inside Islandflow Desktop again.","notes":"Reopened after live Electron still showed the browser-only fallback. Follow-up fix adds an explicit preload runtime marker and web runtime detection for that marker so Electron is recognized even when the bridge is not ready and the user agent lacks an Electron token.","status":"closed","priority":1,"issue_type":"bug","owner":"dishes@dpdrm.com","created_at":"2026-05-20T23:42:58Z","created_by":"dirtydishes","updated_at":"2026-05-20T23:51:43Z","closed_at":"2026-05-20T23:51:43Z","close_reason":"Follow-up fix added an explicit islandflowDesktopRuntime preload marker and taught the web runtime to recognize that marker plus IslandflowDesktop user-agent tokens, so Electron no longer falls into the browser-only fallback when the AI bridge is delayed or unavailable. Desktop build and focused desktop/web tests pass; full web build still blocked by islandflow-c8f.","dependency_count":0,"dependent_count":0,"comment_count":0}
 {"_type":"issue","id":"islandflow-hj3","title":"Fix Electron preload for desktop AI bridge","description":"## Why\\nThe desktop settings page reports the native AI bridge as unavailable because Electron fails to load the preload script in local dev.\\n\\n## What\\nUpdate the desktop preload implementation/build so Electron can execute it, restore window.islandflowDesktop, and verify the Copilot settings panel detects the bridge again.\\n\\n## Acceptance Criteria\\n- Electron no longer logs a preload syntax error\\n- window.islandflowDesktop is available in the desktop renderer\\n- The settings page no longer shows bridge unavailable solely because preload failed\\n- Relevant desktop/web tests pass","status":"closed","priority":1,"issue_type":"bug","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-20T23:16:39Z","created_by":"dirtydishes","updated_at":"2026-05-20T23:20:20Z","started_at":"2026-05-20T23:16:48Z","closed_at":"2026-05-20T23:20:20Z","close_reason":"Closed","dependency_count":0,"dependent_count":0,"comment_count":0}
 {"_type":"issue","id":"islandflow-199","title":"fix desktop copilot fallback inside electron","description":"## Why\\nThe settings page can render the browser-only fallback even when Islandflow is running inside the Electron desktop shell.\\n\\n## What\\nSeparate desktop-shell detection from desktop AI transport state, make the provider recover if the bridge appears late or initial state loading fails, and cover the regression with tests.\\n\\n## Acceptance Criteria\\n- The desktop shell no longer shows the browser-only fallback solely because initial bridge state failed or arrived late\\n- Desktop-only actions can distinguish between missing Electron bridge and transport/auth problems\\n- Automated tests cover the recovery behavior","status":"closed","priority":1,"issue_type":"bug","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-20T22:30:16Z","created_by":"dirtydishes","updated_at":"2026-05-20T22:37:21Z","started_at":"2026-05-20T22:30:23Z","closed_at":"2026-05-20T22:37:21Z","close_reason":"Fixed desktop-shell Copilot fallback handling, added bridge recovery logic, updated desktop-vs-bridge UI messaging, and added regression tests. Follow-up tracked in islandflow-c8f for unrelated web build blocker.","dependency_count":0,"dependent_count":0,"comment_count":0}
diff --git a/.env.example b/.env.example
index be20b62..2d36859 100644
--- a/.env.example
+++ b/.env.example
@@ -106,7 +106,7 @@ REPLAY_LOG_EVERY=1000
 
 # API live retention (generic channels)
 LIVE_LIMIT_DEFAULT=1000
-LIVE_LIMIT_OPTIONS=1000
+LIVE_LIMIT_OPTIONS=100
 LIVE_LIMIT_NBBO=1000
 LIVE_LIMIT_EQUITIES=1000
 LIVE_LIMIT_EQUITY_QUOTES=500
@@ -116,6 +116,7 @@ LIVE_LIMIT_SMART_MONEY=300
 LIVE_LIMIT_CLASSIFIER_HITS=300
 LIVE_LIMIT_ALERTS=300
 LIVE_LIMIT_INFERRED_DARK=300
+LIVE_LIMIT_NEWS=100
 LIVE_SCOPED_CACHE_MAX_KEYS=32
 LIVE_REDIS_FLUSH_INTERVAL_MS=250
 LIVE_REDIS_FLUSH_MAX_ITEMS=100
diff --git a/apps/web/app/terminal.tsx b/apps/web/app/terminal.tsx
index 3057f58..5bf1641 100644
--- a/apps/web/app/terminal.tsx
+++ b/apps/web/app/terminal.tsx
@@ -72,12 +72,12 @@ const parseBoundedInt = (
   return Math.max(min, Math.min(max, Math.floor(parsed)));
 };
 
-const LIVE_HOT_WINDOW = parseBoundedInt(process.env.NEXT_PUBLIC_LIVE_HOT_WINDOW, 600, 1, 100000);
+const LIVE_HOT_WINDOW = parseBoundedInt(process.env.NEXT_PUBLIC_LIVE_HOT_WINDOW, 600, 1, 2000);
 const LIVE_HOT_WINDOW_OPTIONS = parseBoundedInt(
   process.env.NEXT_PUBLIC_LIVE_HOT_WINDOW_OPTIONS,
   1200,
   1,
-  100000
+  2000
 );
 const LIVE_OPTIONS_HEAD_LIMIT = 100;
 const LIVE_HISTORY_SOFT_CAP = parseBoundedInt(
diff --git a/deployment/docker/.env.example b/deployment/docker/.env.example
index 4972ada..ff4d7f3 100644
--- a/deployment/docker/.env.example
+++ b/deployment/docker/.env.example
@@ -132,7 +132,7 @@ REPLAY_LOG_EVERY=1000
 
 # API live retention
 LIVE_LIMIT_DEFAULT=1000
-LIVE_LIMIT_OPTIONS=1000
+LIVE_LIMIT_OPTIONS=100
 LIVE_LIMIT_NBBO=1000
 LIVE_LIMIT_EQUITIES=1000
 LIVE_LIMIT_EQUITY_QUOTES=500
@@ -142,6 +142,7 @@ LIVE_LIMIT_SMART_MONEY=300
 LIVE_LIMIT_CLASSIFIER_HITS=300
 LIVE_LIMIT_ALERTS=300
 LIVE_LIMIT_INFERRED_DARK=300
+LIVE_LIMIT_NEWS=100
 LIVE_SCOPED_CACHE_MAX_KEYS=32
 LIVE_REDIS_FLUSH_INTERVAL_MS=250
 LIVE_REDIS_FLUSH_MAX_ITEMS=100
diff --git a/services/api/src/index.ts b/services/api/src/index.ts
index 562fb6b..daa013c 100644
--- a/services/api/src/index.ts
+++ b/services/api/src/index.ts
@@ -307,6 +307,35 @@ const subscriptionSockets = new Map>();
 const subscriptionDefinitions = new Map();
 const liveHeartbeats = new Map>();
 
+const buildLiveSubscriptionMetrics = (): {
+  liveSocketCount: number;
+  uniqueSubscriptionsByChannel: Partial>;
+  socketFanoutByChannel: Partial>;
+} => {
+  const uniqueSubscriptionsByChannel: Partial> = {};
+  const socketFanoutByChannel: Partial> = {};
+
+  for (const subscription of subscriptionDefinitions.values()) {
+    uniqueSubscriptionsByChannel[subscription.channel] =
+      (uniqueSubscriptionsByChannel[subscription.channel] ?? 0) + 1;
+  }
+
+  for (const [key, sockets] of subscriptionSockets.entries()) {
+    const subscription = subscriptionDefinitions.get(key);
+    if (!subscription || sockets.size === 0) {
+      continue;
+    }
+    socketFanoutByChannel[subscription.channel] =
+      (socketFanoutByChannel[subscription.channel] ?? 0) + sockets.size;
+  }
+
+  return {
+    liveSocketCount: liveSocketSubscriptions.size,
+    uniqueSubscriptionsByChannel,
+    socketFanoutByChannel
+  };
+};
+
 const jsonResponse = (body: unknown, status = 200): Response => {
   return new Response(JSON.stringify(body), {
     status,
@@ -759,6 +788,8 @@ const run = async () => {
 
   const liveState = new LiveStateManager(clickhouse, redis, resolveLiveStateConfig());
   await liveState.hydrate();
+  let previousLiveStats = liveState.getStatsSnapshot();
+  let previousMemoryUsage = process.memoryUsage();
   const warnLiveLag = (
     channel: keyof typeof HOT_LIVE_REDIS_KEYS,
     ageMs: number | null | undefined
@@ -778,25 +809,52 @@ const run = async () => {
   const liveStateMetricsTimer = setInterval(() => {
     const snapshot = liveState.getStatsSnapshot();
     const hotFeedHealth = liveState.getHotChannelHealth();
+    const subscriptionMetrics = buildLiveSubscriptionMetrics();
+    const memoryUsage = process.memoryUsage();
     const hotFeedLagMs = {
       options: snapshot.freshnessAgeMsByKey[HOT_LIVE_REDIS_KEYS.options] ?? null,
       equities: snapshot.freshnessAgeMsByKey[HOT_LIVE_REDIS_KEYS.equities] ?? null,
       flow: snapshot.freshnessAgeMsByKey[HOT_LIVE_REDIS_KEYS.flow] ?? null,
       nbbo: snapshot.freshnessAgeMsByKey[HOT_LIVE_REDIS_KEYS.nbbo] ?? null
     };
+    const flushDelta = {
+      redisFlushCount: snapshot.redisFlushCount - previousLiveStats.redisFlushCount,
+      redisFlushItems: snapshot.redisFlushItems - previousLiveStats.redisFlushItems,
+      redisFlushPayloadBytes: snapshot.redisFlushPayloadBytes - previousLiveStats.redisFlushPayloadBytes
+    };
+    const memorySnapshot = {
+      rss_bytes: memoryUsage.rss,
+      heap_used_bytes: memoryUsage.heapUsed,
+      heap_total_bytes: memoryUsage.heapTotal,
+      external_bytes: memoryUsage.external,
+      array_buffers_bytes: memoryUsage.arrayBuffers,
+      rss_delta_bytes: memoryUsage.rss - previousMemoryUsage.rss,
+      heap_used_delta_bytes: memoryUsage.heapUsed - previousMemoryUsage.heapUsed
+    };
     logger.info("live cache metrics", {
       ...snapshot,
       hotFeedLagMs,
       hotFeedHealth,
+      flushDelta,
+      memorySnapshot,
+      liveSubscriptions: subscriptionMetrics,
       snapshotSourceCounts: {
         generic_cache_snapshot: snapshot.genericCacheSnapshots,
         scoped_clickhouse_snapshot: snapshot.scopedClickHouseSnapshots
       }
     });
+    metrics.gauge("api.memory.rss_bytes", memoryUsage.rss);
+    metrics.gauge("api.memory.heap_used_bytes", memoryUsage.heapUsed);
+    metrics.gauge("api.live.active_sockets", subscriptionMetrics.liveSocketCount);
+    for (const [channel, count] of Object.entries(subscriptionMetrics.uniqueSubscriptionsByChannel)) {
+      metrics.gauge("api.live.subscription_count", count, { channel });
+    }
     warnLiveLag("options", hotFeedLagMs.options);
     warnLiveLag("equities", hotFeedLagMs.equities);
     warnLiveLag("flow", hotFeedLagMs.flow);
     warnLiveLag("nbbo", hotFeedLagMs.nbbo);
+    previousLiveStats = snapshot;
+    previousMemoryUsage = memoryUsage;
   }, 60000);
 
   const consumerBindings = [
diff --git a/services/api/src/live.ts b/services/api/src/live.ts
index c8d2886..9687eec 100644
--- a/services/api/src/live.ts
+++ b/services/api/src/live.ts
@@ -89,6 +89,20 @@ const DEFAULT_LIVE_LIMITS: GenericLiveLimits = {
   news: 100
 };
 
+export const LIVE_GENERIC_LIMIT_CAPS: GenericLiveLimits = {
+  options: 100,
+  nbbo: 1000,
+  equities: 1000,
+  "equity-quotes": 500,
+  "equity-joins": 500,
+  flow: 500,
+  "smart-money": 300,
+  "classifier-hits": 300,
+  alerts: 300,
+  "inferred-dark": 300,
+  news: 100
+};
+
 const DEFAULT_SCOPED_CACHE_MAX_KEYS = 32;
 const DEFAULT_REDIS_FLUSH_INTERVAL_MS = 250;
 const DEFAULT_REDIS_FLUSH_MAX_ITEMS = 100;
@@ -134,7 +148,7 @@ const parseGenericLimit = (
   const key = GENERIC_LIMIT_ENV_KEYS[channel];
   const raw = env[key];
   if (!raw || raw.trim().length === 0) {
-    return fallback;
+    return clampConfiguredLimit(channel, fallback);
   }
 
   const parsed = Number(raw);
@@ -143,7 +157,7 @@ const parseGenericLimit = (
     return fallback;
   }
 
-  const bounded = Math.max(MIN_GENERIC_LIMIT, Math.min(MAX_GENERIC_LIMIT, Math.floor(parsed)));
+  const bounded = clampConfiguredLimit(channel, Math.min(MAX_GENERIC_LIMIT, parsed));
   if (bounded !== parsed) {
     console.warn(`Clamped ${key} from ${parsed} to ${bounded}`);
   }
@@ -226,7 +240,7 @@ const extractFreshnessTs = (channel: LiveGenericChannel, item: any): number | nu
 };
 
 export const resolveLiveStateConfig = (env: NodeJS.ProcessEnv = process.env): LiveStateConfig => ({
-  limits: resolveGenericLiveLimits(env),
+  limits: clampGenericLimitMap(resolveGenericLiveLimits(env)),
   scopedCacheMaxKeys: parsePositiveInt(env.LIVE_SCOPED_CACHE_MAX_KEYS, DEFAULT_SCOPED_CACHE_MAX_KEYS),
   redisFlushIntervalMs: parsePositiveInt(
     env.LIVE_REDIS_FLUSH_INTERVAL_MS,
@@ -559,7 +573,8 @@ const insertNewestFirst = (
   };
 };
 
-type BufferedRedisWrite = {
+type BufferedRedisRewrite = {
+  mode: "rewrite";
   listKey: string;
   cursorField: string;
   items: unknown[];
@@ -568,9 +583,64 @@ type BufferedRedisWrite = {
   updates: number;
 };
 
+type BufferedRedisAppend = {
+  mode: "append";
+  listKey: string;
+  cursorField: string;
+  payloads: string[];
+  limit: number;
+  cursor: Cursor | null;
+  updates: number;
+};
+
+type BufferedRedisWrite = BufferedRedisRewrite | BufferedRedisAppend;
+
+export type LiveStateStatsSnapshot = {
+  genericHydrateFromRedis: number;
+  genericHydrateFromClickHouse: number;
+  genericCacheSnapshots: number;
+  scopedClickHouseSnapshots: number;
+  trimOperations: number;
+  redisFlushCount: number;
+  redisFlushItems: number;
+  redisFlushPayloadBytes: number;
+  cacheEvictions: number;
+  outOfOrderEvents: number;
+  cacheDepthByKey: Record;
+  freshnessAgeMsByKey: Record;
+  snapshotItemsByChannel: Record;
+};
+
 const isLiveStateConfig = (value: GenericLiveLimits | LiveStateConfig): value is LiveStateConfig =>
   "limits" in value;
 
+const clampConfiguredLimit = (channel: LiveGenericChannel, value: number): number =>
+  Math.max(MIN_GENERIC_LIMIT, Math.min(LIVE_GENERIC_LIMIT_CAPS[channel], Math.floor(value)));
+
+const clampGenericLimitMap = (limits: GenericLiveLimits): GenericLiveLimits =>
+  Object.fromEntries(
+    (Object.keys(LIVE_GENERIC_LIMIT_CAPS) as LiveGenericChannel[]).map((channel) => [
+      channel,
+      clampConfiguredLimit(channel, limits[channel] ?? DEFAULT_LIVE_LIMITS[channel])
+    ])
+  ) as GenericLiveLimits;
+
+const normalizeLiveStateConfig = (config: GenericLiveLimits | LiveStateConfig): LiveStateConfig => {
+  if (isLiveStateConfig(config)) {
+    return {
+      ...config,
+      limits: clampGenericLimitMap(config.limits)
+    };
+  }
+
+  return {
+    limits: clampGenericLimitMap(config),
+    scopedCacheMaxKeys: DEFAULT_SCOPED_CACHE_MAX_KEYS,
+    redisFlushIntervalMs: DEFAULT_REDIS_FLUSH_INTERVAL_MS,
+    redisFlushMaxItems: DEFAULT_REDIS_FLUSH_MAX_ITEMS
+  };
+};
+
 export class LiveStateManager {
   private readonly config: LiveStateConfig;
   private readonly generic: {
@@ -594,10 +664,12 @@ export class LiveStateManager {
     trimOperations: 0,
     redisFlushCount: 0,
     redisFlushItems: 0,
+    redisFlushPayloadBytes: 0,
     cacheEvictions: 0,
     outOfOrderEvents: 0,
     cacheDepthByKey: new Map(),
-    freshnessAgeMsByKey: new Map()
+    freshnessAgeMsByKey: new Map(),
+    snapshotItemsByChannel: new Map()
   };
 
   constructor(
@@ -605,14 +677,7 @@ export class LiveStateManager {
     private readonly redis: RedisLike | null,
     config: GenericLiveLimits | LiveStateConfig = resolveLiveStateConfig()
   ) {
-    this.config = isLiveStateConfig(config)
-      ? config
-      : {
-          limits: config,
-          scopedCacheMaxKeys: DEFAULT_SCOPED_CACHE_MAX_KEYS,
-          redisFlushIntervalMs: DEFAULT_REDIS_FLUSH_INTERVAL_MS,
-          redisFlushMaxItems: DEFAULT_REDIS_FLUSH_MAX_ITEMS
-        };
+    this.config = normalizeLiveStateConfig(config);
     this.generic = getGenericConfig(this.config.limits);
     this.redisFlushTimer =
       this.redis && this.redis.isOpen
@@ -630,19 +695,7 @@ export class LiveStateManager {
     await this.flushRedisWrites();
   }
 
-  getStatsSnapshot(): {
-    genericHydrateFromRedis: number;
-    genericHydrateFromClickHouse: number;
-    genericCacheSnapshots: number;
-    scopedClickHouseSnapshots: number;
-    trimOperations: number;
-    redisFlushCount: number;
-    redisFlushItems: number;
-    cacheEvictions: number;
-    outOfOrderEvents: number;
-    cacheDepthByKey: Record;
-    freshnessAgeMsByKey: Record;
-  } {
+  getStatsSnapshot(): LiveStateStatsSnapshot {
     return {
       genericHydrateFromRedis: this.stats.genericHydrateFromRedis,
       genericHydrateFromClickHouse: this.stats.genericHydrateFromClickHouse,
@@ -651,10 +704,12 @@ export class LiveStateManager {
       trimOperations: this.stats.trimOperations,
       redisFlushCount: this.stats.redisFlushCount,
       redisFlushItems: this.stats.redisFlushItems,
+      redisFlushPayloadBytes: this.stats.redisFlushPayloadBytes,
       cacheEvictions: this.stats.cacheEvictions,
       outOfOrderEvents: this.stats.outOfOrderEvents,
       cacheDepthByKey: Object.fromEntries(this.stats.cacheDepthByKey),
-      freshnessAgeMsByKey: Object.fromEntries(this.stats.freshnessAgeMsByKey)
+      freshnessAgeMsByKey: Object.fromEntries(this.stats.freshnessAgeMsByKey),
+      snapshotItemsByChannel: Object.fromEntries(this.stats.snapshotItemsByChannel)
     };
   }
 
@@ -676,11 +731,36 @@ export class LiveStateManager {
     this.pendingRedisWrites.clear();
 
     for (const write of writes) {
-      await this.persistList(write.listKey, write.cursorField, write.items, write.limit, write.cursor);
+      if (write.mode === "rewrite") {
+        await this.persistList(write.listKey, write.cursorField, write.items, write.limit, write.cursor);
+        this.stats.redisFlushItems += write.items.length;
+        this.stats.redisFlushPayloadBytes += write.items.reduce(
+          (total, item) => total + JSON.stringify(item).length,
+          0
+        );
+      } else {
+        await this.persistListAppend(
+          write.listKey,
+          write.cursorField,
+          write.payloads,
+          write.limit,
+          write.cursor
+        );
+        this.stats.redisFlushItems += write.payloads.length;
+        this.stats.redisFlushPayloadBytes += write.payloads.reduce((total, payload) => total + payload.length, 0);
+      }
       this.stats.redisFlushCount += 1;
-      this.stats.redisFlushItems += write.items.length;
       metrics.count("api.live.redis_flush_count", 1);
-      metrics.count("api.live.redis_flush_items", write.items.length);
+      metrics.count(
+        "api.live.redis_flush_items",
+        write.mode === "rewrite" ? write.items.length : write.payloads.length
+      );
+      metrics.count(
+        "api.live.redis_flush_payload_bytes",
+        write.mode === "rewrite"
+          ? write.items.reduce((total, item) => total + JSON.stringify(item).length, 0)
+          : write.payloads.reduce((total, payload) => total + payload.length, 0)
+      );
     }
   }
 
@@ -739,7 +819,12 @@ export class LiveStateManager {
     }
   }
 
-  private queueRedisWrite(
+  private recordSnapshotItems(channel: LiveSubscription["channel"], count: number): void {
+    this.stats.snapshotItemsByChannel.set(channel, count);
+    metrics.gauge("api.live.snapshot_items", count, { channel });
+  }
+
+  private queueRedisRewrite(
     listKey: string,
     cursorField: string,
     items: unknown[],
@@ -751,7 +836,8 @@ export class LiveStateManager {
     }
 
     const existing = this.pendingRedisWrites.get(listKey);
-    const write: BufferedRedisWrite = {
+    const write: BufferedRedisRewrite = {
+      mode: "rewrite",
       listKey,
       cursorField,
       items: [...items],
@@ -765,6 +851,51 @@ export class LiveStateManager {
     }
   }
 
+  private queueGenericRedisWrite(
+    listKey: string,
+    cursorField: string,
+    item: unknown,
+    items: unknown[],
+    limit: number,
+    cursor: Cursor | null,
+    forceRewrite = false
+  ): void {
+    if (!this.redis?.isOpen) {
+      return;
+    }
+
+    const existing = this.pendingRedisWrites.get(listKey);
+    const nextUpdateCount = (existing?.updates ?? 0) + 1;
+    if (forceRewrite || existing?.mode === "rewrite") {
+      const write: BufferedRedisRewrite = {
+        mode: "rewrite",
+        listKey,
+        cursorField,
+        items: [...items],
+        limit,
+        cursor,
+        updates: nextUpdateCount
+      };
+      this.pendingRedisWrites.set(listKey, write);
+    } else {
+      const payload = JSON.stringify(item);
+      const write: BufferedRedisAppend = {
+        mode: "append",
+        listKey,
+        cursorField,
+        payloads: [...(existing?.mode === "append" ? existing.payloads : []), payload],
+        limit,
+        cursor,
+        updates: nextUpdateCount
+      };
+      this.pendingRedisWrites.set(listKey, write);
+    }
+
+    if (nextUpdateCount >= this.config.redisFlushMaxItems) {
+      void this.flushRedisWrites();
+    }
+  }
+
   async hydrate(): Promise {
     const channels = Object.keys(this.generic) as LiveGenericChannel[];
     await Promise.all(channels.map((channel) => this.hydrateGeneric(channel)));
@@ -818,6 +949,7 @@ export class LiveStateManager {
         const backfill = await fetchRecentOptionPrints(this.clickhouse, limit, undefined, storageFilters);
         items = mergeSnapshotBackfill(cached, backfill, limit, (entry) => ({ ts: entry.ts, seq: entry.seq }));
       }
+      this.recordSnapshotItems(subscription.channel, items.length);
       return {
         subscription,
         items,
@@ -830,6 +962,7 @@ export class LiveStateManager {
       const items = (this.genericItems.get("options") ?? [])
         .filter((entry) => matchesOptionPrintFilters(entry, subscription.filters))
         .slice(0, limit);
+      this.recordSnapshotItems(subscription.channel, items.length);
       return {
         subscription,
         items,
@@ -844,6 +977,7 @@ export class LiveStateManager {
       const items = (this.genericItems.get("flow") ?? [])
         .filter((entry) => matchesFlowPacketFilters(entry, subscription.filters))
         .slice(0, limit);
+      this.recordSnapshotItems(subscription.channel, items.length);
       return {
         subscription,
         items,
@@ -865,6 +999,7 @@ export class LiveStateManager {
         const backfill = await fetchRecentEquityPrints(this.clickhouse, limit, filters);
         items = mergeSnapshotBackfill(cached, backfill, limit, config.cursor);
       }
+      this.recordSnapshotItems(subscription.channel, items.length);
       return {
         subscription,
         items,
@@ -874,6 +1009,7 @@ export class LiveStateManager {
       }
       this.stats.genericCacheSnapshots += 1;
       const items = (this.genericItems.get("equities") ?? []).slice(0, limit);
+      this.recordSnapshotItems(subscription.channel, items.length);
       return {
         subscription,
         items,
@@ -889,6 +1025,7 @@ export class LiveStateManager {
       }
       this.touchAccess(this.candleAccess, key);
       const items = this.candleItems.get(key) ?? [];
+      this.recordSnapshotItems(subscription.channel, items.length);
       return {
         subscription,
         items,
@@ -904,6 +1041,7 @@ export class LiveStateManager {
       }
       this.touchAccess(this.overlayAccess, key);
       const items = this.overlayItems.get(key) ?? [];
+      this.recordSnapshotItems(subscription.channel, items.length);
       return {
         subscription,
         items,
@@ -916,6 +1054,7 @@ export class LiveStateManager {
     this.stats.genericCacheSnapshots += 1;
     const limit = snapshotLimitFor(subscription, config.limit);
     const items = (this.genericItems.get(subscription.channel) ?? []).slice(0, limit);
+    this.recordSnapshotItems(subscription.channel, items.length);
     return {
       subscription,
       items,
@@ -951,7 +1090,7 @@ export class LiveStateManager {
         if (nextState.items.length > 0) {
           this.updateFreshnessMetric(key, "equity-candles", nextState.items[0]);
         }
-        this.queueRedisWrite(key, cursorField, nextState.items, CHART_LIMITS.candles, cursor);
+        this.queueRedisRewrite(key, cursorField, nextState.items, CHART_LIMITS.candles, cursor);
         return cursor;
       }
       case "equity-overlay": {
@@ -977,7 +1116,7 @@ export class LiveStateManager {
         if (nextState.items.length > 0) {
           this.updateFreshnessMetric(key, "equity-overlay", nextState.items[0]);
         }
-        this.queueRedisWrite(key, cursorField, nextState.items, CHART_LIMITS.overlay, cursor);
+        this.queueRedisRewrite(key, cursorField, nextState.items, CHART_LIMITS.overlay, cursor);
         return cursor;
       }
       default: {
@@ -1007,7 +1146,15 @@ export class LiveStateManager {
         if (nextState.items.length > 0) {
           this.updateFreshnessMetric(config.redisKey, channel, nextState.items[0]);
         }
-        this.queueRedisWrite(config.redisKey, config.cursorField, nextState.items, config.limit, cursor);
+        this.queueGenericRedisWrite(
+          config.redisKey,
+          config.cursorField,
+          parsed,
+          nextState.items,
+          config.limit,
+          cursor,
+          nextState.outOfOrder
+        );
         return cursor;
       }
     }
@@ -1102,4 +1249,23 @@ export class LiveStateManager {
     this.stats.cacheDepthByKey.set(listKey, Math.min(items.length, limit));
     await this.redis.hSet(CURSOR_HASH_KEY, cursorField, JSON.stringify(cursor));
   }
+
+  private async persistListAppend(
+    listKey: string,
+    cursorField: string,
+    payloads: string[],
+    limit: number,
+    cursor: Cursor | null
+  ): Promise {
+    if (!this.redis?.isOpen) {
+      return;
+    }
+
+    for (const payload of payloads) {
+      await this.redis.lPush(listKey, payload);
+    }
+    await this.redis.lTrim(listKey, 0, limit - 1);
+    this.stats.trimOperations += 1;
+    await this.redis.hSet(CURSOR_HASH_KEY, cursorField, JSON.stringify(cursor));
+  }
 }
diff --git a/services/api/tests/live.test.ts b/services/api/tests/live.test.ts
index 78807ca..a62fe3b 100644
--- a/services/api/tests/live.test.ts
+++ b/services/api/tests/live.test.ts
@@ -27,6 +27,7 @@ const makeClickHouse = (
 const makeRedis = () => {
   const lists = new Map();
   const hashes = new Map>();
+  let clearTrimCount = 0;
 
   return {
     isOpen: true,
@@ -41,6 +42,9 @@ const makeRedis = () => {
     },
     async lTrim(key: string, start: number, stop: number) {
       const next = lists.get(key) ?? [];
+      if (start > stop) {
+        clearTrimCount += 1;
+      }
       lists.set(key, start > stop ? [] : next.slice(start, stop + 1));
       return "OK";
     },
@@ -52,6 +56,9 @@ const makeRedis = () => {
       hash.set(field, value);
       hashes.set(key, hash);
       return 1;
+    },
+    getClearTrimCount() {
+      return clearTrimCount;
     }
   };
 };
@@ -64,8 +71,8 @@ describe("LiveStateManager", () => {
       LIVE_LIMIT_FLOW: "bad"
     } as NodeJS.ProcessEnv);
 
-    expect(limits.options).toBe(777);
-    expect(limits.nbbo).toBe(100000);
+    expect(limits.options).toBe(100);
+    expect(limits.nbbo).toBe(1000);
     expect(limits.flow).toBe(500);
     expect(limits["equity-quotes"]).toBe(500);
     expect(limits.alerts).toBe(300);
@@ -209,11 +216,13 @@ describe("LiveStateManager", () => {
     const flushed = await redis.lRange("live:flow", 0, 99);
     expect(persisted).toHaveLength(0);
     expect(flushed).toHaveLength(2);
+    expect(redis.getClearTrimCount()).toBe(0);
 
     const stats = manager.getStatsSnapshot();
     expect(stats.trimOperations).toBeGreaterThan(0);
     expect(stats.redisFlushCount).toBeGreaterThan(0);
     expect(stats.cacheDepthByKey["live:flow"]).toBe(2);
+    expect(stats.redisFlushPayloadBytes).toBeGreaterThan(0);
   });
 
   it("reorders out-of-order live events without dropping newest-first semantics", async () => {
@@ -1074,6 +1083,33 @@ describe("LiveStateManager", () => {
     expect(stats.scopedClickHouseSnapshots).toBe(1);
   });
 
+  it("clamps oversized snapshot requests to the server-side channel cap", async () => {
+    const manager = new LiveStateManager(makeClickHouse(), null);
+    const now = Date.now();
+
+    for (let idx = 0; idx < 120; idx += 1) {
+      await manager.ingest("options", {
+        source_ts: now + idx,
+        ingest_ts: now + idx + 1,
+        seq: idx + 1,
+        trace_id: `opt-${idx + 1}`,
+        ts: now + idx,
+        option_contract_id: `SPY-2025-01-17-${500 + idx}-C`,
+        price: 1,
+        size: 10,
+        exchange: "X"
+      });
+    }
+
+    const snapshot = await manager.getSnapshot({
+      channel: "options",
+      snapshot_limit: 10_000
+    });
+
+    expect(snapshot.items).toHaveLength(100);
+    expect(manager.getStatsSnapshot().snapshotItemsByChannel.options).toBe(100);
+  });
+
   it("keeps backend channel health healthy when a scoped query is quiet", async () => {
     const manager = new LiveStateManager(makeClickHouse(() => []), null);
     const now = Date.now();
-- 
2.49.1

From 20397fdef37e03fb170aa48373b9df8f29897536 Mon Sep 17 00:00:00 2001
From: dirtydishes
Date: Fri, 22 May 2026 21:42:55 -0400
Subject: [PATCH 2/5] serialize redis flushes during api shutdown

---
 services/api/src/live.ts | 50 +++++++++++++++++++++++++++++++++-------
 1 file changed, 42 insertions(+), 8 deletions(-)

diff --git a/services/api/src/live.ts b/services/api/src/live.ts
index 9687eec..2df3969 100644
--- a/services/api/src/live.ts
+++ b/services/api/src/live.ts
@@ -614,6 +614,9 @@ export type LiveStateStatsSnapshot = {
 const isLiveStateConfig = (value: GenericLiveLimits | LiveStateConfig): value is LiveStateConfig =>
   "limits" in value;
 
+const isRedisClientClosedError = (error: unknown): boolean =>
+  error instanceof Error && error.message.toLowerCase().includes("client is closed");
+
 const clampConfiguredLimit = (channel: LiveGenericChannel, value: number): number =>
   Math.max(MIN_GENERIC_LIMIT, Math.min(LIVE_GENERIC_LIMIT_CAPS[channel], Math.floor(value)));
 
@@ -656,6 +659,7 @@ export class LiveStateManager {
   private readonly overlayAccess = new Map();
   private readonly pendingRedisWrites = new Map();
   private readonly redisFlushTimer: ReturnType | null;
+  private redisFlushInFlight: Promise | null = null;
   private readonly stats = {
     genericHydrateFromRedis: 0,
     genericHydrateFromClickHouse: 0,
@@ -723,6 +727,22 @@ export class LiveStateManager {
   }
 
   async flushRedisWrites(): Promise {
+    if (this.redisFlushInFlight) {
+      return this.redisFlushInFlight;
+    }
+
+    this.redisFlushInFlight = this.flushRedisWritesInternal();
+    try {
+      await this.redisFlushInFlight;
+    } finally {
+      this.redisFlushInFlight = null;
+      if (this.pendingRedisWrites.size > 0 && this.redis?.isOpen) {
+        void this.flushRedisWrites();
+      }
+    }
+  }
+
+  private async flushRedisWritesInternal(): Promise {
     if (!this.redis?.isOpen) {
       return;
     }
@@ -732,20 +752,34 @@ export class LiveStateManager {
 
     for (const write of writes) {
       if (write.mode === "rewrite") {
-        await this.persistList(write.listKey, write.cursorField, write.items, write.limit, write.cursor);
+        try {
+          await this.persistList(write.listKey, write.cursorField, write.items, write.limit, write.cursor);
+        } catch (error) {
+          if (isRedisClientClosedError(error)) {
+            return;
+          }
+          throw error;
+        }
         this.stats.redisFlushItems += write.items.length;
         this.stats.redisFlushPayloadBytes += write.items.reduce(
           (total, item) => total + JSON.stringify(item).length,
           0
         );
       } else {
-        await this.persistListAppend(
-          write.listKey,
-          write.cursorField,
-          write.payloads,
-          write.limit,
-          write.cursor
-        );
+        try {
+          await this.persistListAppend(
+            write.listKey,
+            write.cursorField,
+            write.payloads,
+            write.limit,
+            write.cursor
+          );
+        } catch (error) {
+          if (isRedisClientClosedError(error)) {
+            return;
+          }
+          throw error;
+        }
         this.stats.redisFlushItems += write.payloads.length;
         this.stats.redisFlushPayloadBytes += write.payloads.reduce((total, payload) => total + payload.length, 0);
       }
-- 
2.49.1

From 5a68a3e38e590fe9fca59783e1506652c79a0d44 Mon Sep 17 00:00:00 2001
From: dirtydishes
Date: Fri, 22 May 2026 21:50:35 -0400
Subject: [PATCH 3/5] document live api stabilization rollout

---
 .beads/issues.jsonl                                |   2 +-
 .../2026-05-22-stabilize-live-api-memory.html      | 810 ++++++++++++++++++
 2 files changed, 811 insertions(+), 1 deletion(-)
 create mode 100644 docs/turns/2026-05-22-stabilize-live-api-memory.html

diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl
index c6b5525..2b12057 100644
--- a/.beads/issues.jsonl
+++ b/.beads/issues.jsonl
@@ -1,4 +1,4 @@
-{"_type":"issue","id":"islandflow-thp","title":"stabilize live api memory and reduce internal cache churn","description":"The native VPS deployment is repeatedly OOM-killing islandflow-api.service during live operation. The API live cache is retaining oversized channel histories and rewriting large Redis lists on every flush, which drives multi-GB Bun RSS and heavy loopback traffic between the API, Redis, NATS, and ClickHouse. Implement an emergency VPS mitigation plus repo hardening so unsafe env values, reconnect snapshots, and Redis persistence patterns cannot push the live API back into OOM.","acceptance_criteria":"1. VPS live cache env values are reduced to safe defaults and live redis state is cleared before restart. 2. services/api/src/live.ts enforces server-side live cache caps and clamps snapshot_limit accordingly. 3. Hot generic feed Redis persistence no longer rewrites entire lists on every flush. 4. Metrics/logging expose subscription counts, snapshot sizes, redis flush volume, and API memory trend. 5. Relevant tests pass and the deployment is restarted successfully.","notes":"Implemented local hardening for API live-state limits, incremental generic Redis persistence, live subscription/memory metrics, and safer client/env defaults. Targeted API live tests and the web production build both passed.","status":"in_progress","priority":1,"issue_type":"bug","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-23T01:30:43Z","created_by":"dirtydishes","updated_at":"2026-05-23T01:39:57Z","started_at":"2026-05-23T01:30:52Z","dependency_count":0,"dependent_count":0,"comment_count":0}
+{"_type":"issue","id":"islandflow-thp","title":"stabilize live api memory and reduce internal cache churn","description":"The native VPS deployment is repeatedly OOM-killing islandflow-api.service during live operation. The API live cache is retaining oversized channel histories and rewriting large Redis lists on every flush, which drives multi-GB Bun RSS and heavy loopback traffic between the API, Redis, NATS, and ClickHouse. Implement an emergency VPS mitigation plus repo hardening so unsafe env values, reconnect snapshots, and Redis persistence patterns cannot push the live API back into OOM.","acceptance_criteria":"1. VPS live cache env values are reduced to safe defaults and live redis state is cleared before restart. 2. services/api/src/live.ts enforces server-side live cache caps and clamps snapshot_limit accordingly. 3. Hot generic feed Redis persistence no longer rewrites entire lists on every flush. 4. Metrics/logging expose subscription counts, snapshot sizes, redis flush volume, and API memory trend. 5. Relevant tests pass and the deployment is restarted successfully.","notes":"Implemented and deployed the live-state hardening to the VPS. Final validation after restart showed the API around 120 MB RSS with capped live cache depths and clean systemd restarts.","status":"in_progress","priority":1,"issue_type":"bug","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-23T01:30:43Z","created_by":"dirtydishes","updated_at":"2026-05-23T01:50:29Z","started_at":"2026-05-23T01:30:52Z","dependency_count":0,"dependent_count":0,"comment_count":0}
 {"_type":"issue","id":"islandflow-sc6","title":"fix electron codex bridge preload loading","description":"Electron settings showed the browser-only Desktop Required fallback because the renderer did not see the native islandflowDesktop preload bridge or an Electron user-agent marker. Fix the desktop launch path so ChatGPT/Codex subscription controls are available inside Islandflow Desktop again.","notes":"Reopened after live Electron still showed the browser-only fallback. Follow-up fix adds an explicit preload runtime marker and web runtime detection for that marker so Electron is recognized even when the bridge is not ready and the user agent lacks an Electron token.","status":"closed","priority":1,"issue_type":"bug","owner":"dishes@dpdrm.com","created_at":"2026-05-20T23:42:58Z","created_by":"dirtydishes","updated_at":"2026-05-20T23:51:43Z","closed_at":"2026-05-20T23:51:43Z","close_reason":"Follow-up fix added an explicit islandflowDesktopRuntime preload marker and taught the web runtime to recognize that marker plus IslandflowDesktop user-agent tokens, so Electron no longer falls into the browser-only fallback when the AI bridge is delayed or unavailable.
Desktop build and focused desktop/web tests pass; full web build still blocked by islandflow-c8f.","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-hj3","title":"Fix Electron preload for desktop AI bridge","description":"## Why\\nThe desktop settings page reports the native AI bridge as unavailable because Electron fails to load the preload script in local dev.\\n\\n## What\\nUpdate the desktop preload implementation/build so Electron can execute it, restore window.islandflowDesktop, and verify the Copilot settings panel detects the bridge again.\\n\\n## Acceptance Criteria\\n- Electron no longer logs a preload syntax error\\n- window.islandflowDesktop is available in the desktop renderer\\n- The settings page no longer shows bridge unavailable solely because preload failed\\n- Relevant desktop/web tests pass","status":"closed","priority":1,"issue_type":"bug","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-20T23:16:39Z","created_by":"dirtydishes","updated_at":"2026-05-20T23:20:20Z","started_at":"2026-05-20T23:16:48Z","closed_at":"2026-05-20T23:20:20Z","close_reason":"Closed","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-199","title":"fix desktop copilot fallback inside electron","description":"## Why\\nThe settings page can render the browser-only fallback even when Islandflow is running inside the Electron desktop shell.\\n\\n## What\\nSeparate desktop-shell detection from desktop AI transport state, make the provider recover if the bridge appears late or initial state loading fails, and cover the regression with tests.\\n\\n## Acceptance Criteria\\n- The desktop shell no longer shows the browser-only fallback solely because initial bridge state failed or arrived late\\n- Desktop-only actions can distinguish between missing Electron bridge and transport/auth problems\\n- Automated tests cover the recovery 
behavior","status":"closed","priority":1,"issue_type":"bug","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-20T22:30:16Z","created_by":"dirtydishes","updated_at":"2026-05-20T22:37:21Z","started_at":"2026-05-20T22:30:23Z","closed_at":"2026-05-20T22:37:21Z","close_reason":"Fixed desktop-shell Copilot fallback handling, added bridge recovery logic, updated desktop-vs-bridge UI messaging, and added regression tests. Follow-up tracked in islandflow-c8f for unrelated web build blocker.","dependency_count":0,"dependent_count":0,"comment_count":0} diff --git a/docs/turns/2026-05-22-stabilize-live-api-memory.html b/docs/turns/2026-05-22-stabilize-live-api-memory.html new file mode 100644 index 0000000..d2b48e2 --- /dev/null +++ b/docs/turns/2026-05-22-stabilize-live-api-memory.html @@ -0,0 +1,810 @@ + + + + + + Turn Record: Stabilize Live API Memory + + + +
+
+ Turn Record · May 22, 2026 +

Stabilize Live API Memory and Internal Traffic

+

+ The Islandflow live API was repeatedly getting OOM-killed on the VPS because the hot live + cache could retain oversized channel windows and rewrite whole Redis lists at high + frequency. This turn applied an immediate server-side mitigation, hardened the API cache + path in code, and rolled the changes onto the native systemd deployment. +

+
+
+ Branch + stabilize-live-api-memory +
+
+ Beads + islandflow-thp +
+
+ Deployment + Native systemd user services on the VPS +
+
+ Primary Outcome + API RSS returned to roughly 115-130 MB after rollout +
+
+
+ +
+
+

Summary

+

+ The live API is now bounded in three layers instead of trusting environment values and + reconnect behavior. First, the VPS .env was reset to safer live-window + values and the oversized Redis hot-cache keys were cleared. Second, the API now clamps + generic live cache limits per channel in code. Third, generic live feed persistence now + appends deltas into Redis instead of cloning and rewriting entire lists on every flush. +

+
+ Observed on the VPS after rollout: + the API stayed healthy through restart, minute metrics showed much smaller cache depths, + and the kernel did not log any new Bun OOM kill after the hardened restart. +
+
+ +
+

Changes Made

+
    +
  • + Added channel-specific hard caps in + services/api/src/live.ts so oversized + LIVE_LIMIT_* values are clamped before use. +
  • +
  • + Changed generic live Redis persistence from full-list rewrite behavior to append-plus-trim, + with rewrite fallback only when the in-memory ordering has to be rebuilt. +
  • +
  • + Serialized Redis flushes so only one runs at a time, and made an in-flight flush stop cleanly when the Redis client has already closed, so service restarts do not race with a closing connection.
  • +
  • + Added API minute-log visibility for live subscription counts, Redis flush deltas, + payload bytes, snapshot sizes, and process memory usage. +
  • +
  • + Tightened the browser-exposed live window caps in + apps/web/app/terminal.tsx and aligned the tracked env examples with the safer + production defaults, including LIVE_LIMIT_NEWS. +
  • +
  • + Applied the emergency mitigation directly on the VPS: + updated /home/delta/islandflow/.env, created + /home/delta/islandflow/.env.backup-2026-05-22-2131, deleted stale + live:* Redis keys, rebuilt the web app, and restarted + islandflow-api.service and islandflow-web.service. +
  • +
+
+ +
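The hard-cap clamp in the first bullet can be sketched as a per-channel lookup followed by a `Math.min`. The table name `LIVE_GENERIC_LIMIT_CAPS` comes from the diff summary, but its shape, the channel keys, the cap numbers, and the `resolveLiveLimit` helper below are illustrative assumptions, not the values or functions in `live.ts`.

```typescript
// Illustrative per-channel hard caps. The name matches the diff summary, but the
// shape, channel keys, and numbers are assumptions, not the values in live.ts.
const LIVE_GENERIC_LIMIT_CAPS: Record<string, number> = {
  options: 500,
  flow: 1000,
  alerts: 1000,
  news: 200,
};

// Cap applied to any channel missing from the table (also an assumption).
const DEFAULT_LIVE_CAP = 500;

/**
 * Resolve the effective live window for one channel (hypothetical helper).
 * Missing, non-numeric, or non-positive env values fall back to `fallback`,
 * and the result never exceeds the channel's hard cap.
 */
function resolveLiveLimit(
  channel: string,
  envValue: string | undefined,
  fallback: number
): number {
  const cap = LIVE_GENERIC_LIMIT_CAPS[channel] ?? DEFAULT_LIVE_CAP;
  const parsed = envValue === undefined ? NaN : parseInt(envValue, 10);
  const requested = isFinite(parsed) && parsed > 0 ? parsed : fallback;
  return Math.min(requested, cap);
}
```

Under these illustrative caps, a drifted `LIVE_LIMIT_OPTIONS=10000` resolves to 500, so an env typo can widen the window only up to the cap, never to an unbounded size.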
+

Context

+

+ The VPS OOM-killed islandflow-api.service several times on May 22, 2026. Kernel logs showed Bun reaching roughly 8-9 GiB RSS inside the API service cgroup before the OOM killer stepped in. The API minute logs also showed channel depths pinned at 10000 for multiple feeds, plus massive cumulative Redis rewrite churn.

+

+ Most of the “huge bandwidth” in btop was local loopback traffic: Bun talking to Redis, NATS, and ClickHouse on 127.0.0.1. That meant the problem was not a public-edge flood; it was the live cache architecture multiplying internal work on the box.

+
+ +
+

Important Implementation Details

+
+
+

API hardening

+
    +
  • + Hard caps now bound generic channel windows even if env values drift upward. +
  • +
  • + snapshot_limit is still honored, but it is clamped to the smallest of the requested value, the configured limit, and the safe channel cap.
  • +
  • + Generic feeds use incremental Redis appends; scoped candle and overlay caches still + use full rewrites because they are much smaller and keyed differently. +
  • +
+
+
+

Operational changes

+
    +
  • + The VPS now runs with a much smaller hot live footprint: + options 100, flow 500, alerts 300, + news 100. +
  • +
  • + Old Redis hot-cache keys were deleted so the API did not rehydrate oversized lists on boot. +
  • +
  • + The web app was rebuilt from the VPS checkout after that checkout was switched to the stabilize-live-api-memory branch.
  • +
+
+
+
+ +
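The `snapshot_limit` rule in the API hardening list reduces to taking the minimum of three numbers. The sketch below is a hypothetical helper, not the real handler; treating a missing or non-positive request as "serve the full allowed window" is an assumption.

```typescript
/**
 * Hypothetical helper: clamp a client-requested snapshot_limit.
 * The server never serves more than it is configured to retain,
 * and never more than the channel's hard cap, whatever the client asks for.
 */
function effectiveSnapshotLimit(
  requested: number | undefined,
  configuredLimit: number,
  channelCap: number
): number {
  const ceiling = Math.min(configuredLimit, channelCap);
  // Assumption: an absent or invalid request gets the full allowed window.
  if (requested === undefined || !isFinite(requested) || requested <= 0) {
    return ceiling;
  }
  return Math.min(Math.floor(requested), ceiling);
}
```

A reconnecting client that asks for 5000 items on a channel configured for 500 receives at most 500, which keeps reconnect storms from multiplying snapshot payload size.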
+

Relevant Diff Snippets

+

+ These snippets are rendered with the Diffs library from + diffs.com, with a plain-text fallback kept inline in the file. +

+
+
+

services/api/src/live.ts: hard caps and append-based generic Redis flushes

+
+
+ Plain-text fallback +
Added LIVE_GENERIC_LIMIT_CAPS, clamped env/configured limits, changed generic writes from
+queueRedisWrite(items:[...items]) to queueGenericRedisWrite(item, items, forceRewrite), and split
+Redis persistence into rewrite and append paths with shutdown-safe flush serialization.
+
+
+ +
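The append-plus-trim path with a rewrite fallback can be sketched as a pure planner that emits Redis-style list commands, plus an in-memory stand-in for Redis so the plan can be exercised without a server. This assumes newest items sit at the tail of the list and that a timestamp is the ordering key; `planFlush`, `applyCommands`, and the command shapes are hypothetical and are not the `queueGenericRedisWrite` or `persistListAppend` code.

```typescript
// Redis-like list commands (RPUSH / LTRIM / DEL semantics), modeled as data.
type ListCommand =
  | { op: "del"; key: string }
  | { op: "rpush"; key: string; values: string[] }
  | { op: "ltrim"; key: string; start: number; stop: number };

interface LiveItem {
  ts: number; // event timestamp in ms, assumed to be the ordering key
  payload: string; // item already serialized to JSON
}

/**
 * Plan one flush for a generic live channel, keeping the newest items at the tail.
 * Fast path: every new item is at or after the last persisted timestamp, so
 * append the deltas and trim the list back to `limit`.
 * Slow path: something arrived out of order, so rebuild the list from the
 * sorted in-memory window instead.
 */
function planFlush(
  key: string,
  newItems: LiveItem[],
  inMemory: LiveItem[],
  lastPersistedTs: number,
  limit: number
): ListCommand[] {
  let prevTs = lastPersistedTs;
  let outOfOrder = false;
  for (const item of newItems) {
    if (item.ts < prevTs) {
      outOfOrder = true;
      break;
    }
    prevTs = item.ts;
  }

  if (outOfOrder) {
    const kept = inMemory.slice(-limit).map((item) => item.payload);
    const commands: ListCommand[] = [{ op: "del", key }];
    // RPUSH needs at least one value, so skip it for an empty window.
    if (kept.length > 0) commands.push({ op: "rpush", key, values: kept });
    return commands;
  }

  if (newItems.length === 0) return [];
  return [
    { op: "rpush", key, values: newItems.map((item) => item.payload) },
    // LTRIM key -limit -1 keeps only the newest `limit` entries.
    { op: "ltrim", key, start: -limit, stop: -1 },
  ];
}

// In-memory stand-in for Redis so the plan can be exercised without a server.
function applyCommands(store: Record<string, string[]>, commands: ListCommand[]): void {
  for (const cmd of commands) {
    if (cmd.op === "del") {
      delete store[cmd.key];
    } else if (cmd.op === "rpush") {
      store[cmd.key] = (store[cmd.key] || []).concat(cmd.values);
    } else {
      const list = store[cmd.key] || [];
      const len = list.length;
      // Resolve negative indexes the way Redis LTRIM does.
      const start = cmd.start < 0 ? Math.max(len + cmd.start, 0) : cmd.start;
      const stop = cmd.stop < 0 ? len + cmd.stop : Math.min(cmd.stop, len - 1);
      store[cmd.key] = start > stop ? [] : list.slice(start, stop + 1);
    }
  }
}
```

The fast path sends only the new payloads plus one trim, so per-flush Redis traffic scales with the delta rather than with the full window; the slow path trades one full rewrite for correct ordering.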
+

services/api/src/index.ts: minute metrics now include memory and live subscription visibility

+
+
+ Plain-text fallback +
Added buildLiveSubscriptionMetrics(), previous snapshot tracking, flush delta logging,
+memory snapshots, and gauges for RSS, heap used, active sockets, and per-channel subscriptions.
+
+
+ +
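The per-channel subscription counts and memory gauges named in the `index.ts` summary could be assembled roughly as below. `LiveClient`, `buildMinuteSnapshot`, and the field names are assumptions, not the signature of `buildLiveSubscriptionMetrics()`. In Node or Bun, the memory argument would come from `process.memoryUsage()`, which returns `rss` and `heapUsed` among other fields. Here "MB" is computed as 2^20 bytes.

```typescript
// Minimal memory reading; process.memoryUsage() in Node and Bun returns these fields and more.
interface MemoryReading {
  rss: number; // bytes
  heapUsed: number; // bytes
}

// Hypothetical view of one websocket client and the live channels it follows.
interface LiveClient {
  channels: string[];
}

interface MinuteSnapshot {
  activeSockets: number;
  subscriptionsByChannel: Record<string, number>;
  rssMb: number;
  heapUsedMb: number;
}

const BYTES_PER_MB = 1024 * 1024;

// Convert bytes to MB (2^20 bytes) rounded to one decimal place for log lines.
function toMb(bytes: number): number {
  return Math.round((bytes / BYTES_PER_MB) * 10) / 10;
}

function buildMinuteSnapshot(clients: LiveClient[], memory: MemoryReading): MinuteSnapshot {
  const subscriptionsByChannel: Record<string, number> = {};
  for (const client of clients) {
    for (const channel of client.channels) {
      subscriptionsByChannel[channel] = (subscriptionsByChannel[channel] || 0) + 1;
    }
  }
  return {
    activeSockets: clients.length,
    subscriptionsByChannel,
    rssMb: toMb(memory.rss),
    heapUsedMb: toMb(memory.heapUsed),
  };
}
```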
+

.env.example and apps/web/app/terminal.tsx: safer default windows

+
+
+ Plain-text fallback +
Reduced LIVE_LIMIT_OPTIONS in tracked examples to 100, added LIVE_LIMIT_NEWS=100,
+and lowered the client-exposed maximum live hot windows from 100000 to 2000.
+
+
+
+
+ +
+

Expected Impact for End-Users

+
    +
  • + The hosted app should no longer drop out when the kernel OOM killer forces an API restart.
  • +
  • + Live feeds should still feel current, but the server will retain a tighter hot window instead of + hoarding oversized in-memory histories. +
  • +
  • + The operator experience on the VPS should improve because internal loopback churn is materially lower. +
  • +
+
+ +
+

Validation

+
    +
  • + Local API test gate passed: + bun test services/api/tests/live.test.ts +
  • +
  • + Local web production build passed: + bun --cwd=apps/web run build +
  • +
  • + VPS mitigation applied successfully. Redis reported 1524 live keys removed before restart. +
  • +
  • + After the mitigation restart, systemctl --user status islandflow-api.service showed the API at about 84 MB RSS instead of its earlier multi-GB drift.
  • +
  • + After rolling the hardened branch onto the VPS, the API minute log at + 2026-05-22 21:44:11 EDT showed: +
  • +
+
+
+ 119.6 MB + API RSS from the minute memory snapshot +
+
+ 100 + live:options depth +
+
+ 500 + live:flow, live:alerts, and live:equity-quotes caps held +
+
+ 34,559 + Redis flush items in that minute delta +
+
+ 9.18 MB + Redis flush payload bytes in that minute delta +
+
+ No new OOM + Kernel logs after the hardened restart +
+
+
+ +
+

Issues, Limitations, and Mitigations

+
    +
  • + The new minute metrics report both cumulative totals and per-minute deltas. They are much more useful than the old absolute counters, but they still reset on process restart.
  • +
  • + snapshotItemsByChannel remains empty when no live websocket clients are connected. + That is expected because snapshots are only recorded when a snapshot is actually served. +
  • +
  • + Quiet feeds such as news and inferred-dark can still show very old freshness ages in logs. + That reflects inactivity, not a broken hot path. +
  • +
  • + The append-based Redis path deliberately falls back to a rewrite when out-of-order live events + require the in-memory ordering to be rebuilt. That keeps correctness ahead of theoretical write minimization. +
  • +
+
+ +
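Because the cumulative counters reset on restart, any external consumer that computes deltas from logged values across restarts, such as the dashboard suggested under follow-up work, needs to detect a counter going backwards. The sketch below is a hypothetical helper using the counter names from the stats object in the diff; treating a drop as "the process restarted, so the current total is the delta" is an assumption.

```typescript
// Cumulative counters as they might appear in consecutive minute log lines.
interface FlushCounters {
  redisFlushItems: number;
  redisFlushPayloadBytes: number;
}

/**
 * Per-minute delta between two cumulative snapshots (hypothetical helper).
 * With no previous snapshot, or when a counter went backwards because the
 * process restarted, the current cumulative value is used as the delta.
 */
function counterDelta(previous: FlushCounters | null, current: FlushCounters): FlushCounters {
  const diff = (prev: number | undefined, curr: number): number =>
    prev === undefined || curr < prev ? curr : curr - prev;
  return {
    redisFlushItems: diff(previous?.redisFlushItems, current.redisFlushItems),
    redisFlushPayloadBytes: diff(previous?.redisFlushPayloadBytes, current.redisFlushPayloadBytes),
  };
}
```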
+

Follow-up Work

+
    +
  • + Add explicit alerting for repeated API RSS growth and for minute-level flush deltas that jump far above the new baseline. +
  • +
  • + Decide whether quiet-channel freshness logs should suppress extremely stale values for feeds like news to reduce operator noise. +
  • +
  • + Consider moving the live cache metrics into a dashboard view so operators do not need to parse journal lines manually. +
  • +
+
+
+
+ + + + -- 2.49.1 From db7370052fdd1b783585cd21f36e6f16e4ce00ce Mon Sep 17 00:00:00 2001 From: dirtydishes Date: Fri, 22 May 2026 22:25:02 -0400 Subject: [PATCH 4/5] add an anatomy page for options flow --- .beads/issues.jsonl | 3 +- docs/anatomy.html | 954 ++++++++++++++++++ docs/index.html | 222 +++- ...6-05-22-add-options-anatomy-explainer.html | 584 +++++++++++ 4 files changed, 1711 insertions(+), 52 deletions(-) create mode 100644 docs/anatomy.html create mode 100644 docs/turns/2026-05-22-add-options-anatomy-explainer.html diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl index 2b12057..fb97a9f 100644 --- a/.beads/issues.jsonl +++ b/.beads/issues.jsonl @@ -1,4 +1,4 @@ -{"_type":"issue","id":"islandflow-thp","title":"stabilize live api memory and reduce internal cache churn","description":"The native VPS deployment is repeatedly OOM-killing islandflow-api.service during live operation. The API live cache is retaining oversized channel histories and rewriting large Redis lists on every flush, which drives multi-GB Bun RSS and heavy loopback traffic between the API, Redis, NATS, and ClickHouse. Implement an emergency VPS mitigation plus repo hardening so unsafe env values, reconnect snapshots, and Redis persistence patterns cannot push the live API back into OOM.","acceptance_criteria":"1. VPS live cache env values are reduced to safe defaults and live redis state is cleared before restart. 2. services/api/src/live.ts enforces server-side live cache caps and clamps snapshot_limit accordingly. 3. Hot generic feed Redis persistence no longer rewrites entire lists on every flush. 4. Metrics/logging expose subscription counts, snapshot sizes, redis flush volume, and API memory trend. 5. Relevant tests pass and the deployment is restarted successfully.","notes":"Implemented and deployed the live-state hardening to the VPS. 
Final validation after restart showed the API around 120 MB RSS with capped live cache depths and clean systemd restarts.","status":"in_progress","priority":1,"issue_type":"bug","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-23T01:30:43Z","created_by":"dirtydishes","updated_at":"2026-05-23T01:50:29Z","started_at":"2026-05-23T01:30:52Z","dependency_count":0,"dependent_count":0,"comment_count":0} +{"_type":"issue","id":"islandflow-thp","title":"stabilize live api memory and reduce internal cache churn","description":"The native VPS deployment is repeatedly OOM-killing islandflow-api.service during live operation. The API live cache is retaining oversized channel histories and rewriting large Redis lists on every flush, which drives multi-GB Bun RSS and heavy loopback traffic between the API, Redis, NATS, and ClickHouse. Implement an emergency VPS mitigation plus repo hardening so unsafe env values, reconnect snapshots, and Redis persistence patterns cannot push the live API back into OOM.","acceptance_criteria":"1. VPS live cache env values are reduced to safe defaults and live redis state is cleared before restart. 2. services/api/src/live.ts enforces server-side live cache caps and clamps snapshot_limit accordingly. 3. Hot generic feed Redis persistence no longer rewrites entire lists on every flush. 4. Metrics/logging expose subscription counts, snapshot sizes, redis flush volume, and API memory trend. 5. Relevant tests pass and the deployment is restarted successfully.","notes":"Implemented and deployed the live-state hardening to the VPS. 
Final validation after restart showed the API around 120 MB RSS with capped live cache depths and clean systemd restarts.","status":"closed","priority":1,"issue_type":"bug","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-23T01:30:43Z","created_by":"dirtydishes","updated_at":"2026-05-23T01:50:41Z","started_at":"2026-05-23T01:30:52Z","closed_at":"2026-05-23T01:50:41Z","close_reason":"Closed","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-sc6","title":"fix electron codex bridge preload loading","description":"Electron settings showed the browser-only Desktop Required fallback because the renderer did not see the native islandflowDesktop preload bridge or an Electron user-agent marker. Fix the desktop launch path so ChatGPT/Codex subscription controls are available inside Islandflow Desktop again.","notes":"Reopened after live Electron still showed the browser-only fallback. Follow-up fix adds an explicit preload runtime marker and web runtime detection for that marker so Electron is recognized even when the bridge is not ready and the user agent lacks an Electron token.","status":"closed","priority":1,"issue_type":"bug","owner":"dishes@dpdrm.com","created_at":"2026-05-20T23:42:58Z","created_by":"dirtydishes","updated_at":"2026-05-20T23:51:43Z","closed_at":"2026-05-20T23:51:43Z","close_reason":"Follow-up fix added an explicit islandflowDesktopRuntime preload marker and taught the web runtime to recognize that marker plus IslandflowDesktop user-agent tokens, so Electron no longer falls into the browser-only fallback when the AI bridge is delayed or unavailable. 
Desktop build and focused desktop/web tests pass; full web build still blocked by islandflow-c8f.","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-hj3","title":"Fix Electron preload for desktop AI bridge","description":"## Why\\nThe desktop settings page reports the native AI bridge as unavailable because Electron fails to load the preload script in local dev.\\n\\n## What\\nUpdate the desktop preload implementation/build so Electron can execute it, restore window.islandflowDesktop, and verify the Copilot settings panel detects the bridge again.\\n\\n## Acceptance Criteria\\n- Electron no longer logs a preload syntax error\\n- window.islandflowDesktop is available in the desktop renderer\\n- The settings page no longer shows bridge unavailable solely because preload failed\\n- Relevant desktop/web tests pass","status":"closed","priority":1,"issue_type":"bug","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-20T23:16:39Z","created_by":"dirtydishes","updated_at":"2026-05-20T23:20:20Z","started_at":"2026-05-20T23:16:48Z","closed_at":"2026-05-20T23:20:20Z","close_reason":"Closed","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-199","title":"fix desktop copilot fallback inside electron","description":"## Why\\nThe settings page can render the browser-only fallback even when Islandflow is running inside the Electron desktop shell.\\n\\n## What\\nSeparate desktop-shell detection from desktop AI transport state, make the provider recover if the bridge appears late or initial state loading fails, and cover the regression with tests.\\n\\n## Acceptance Criteria\\n- The desktop shell no longer shows the browser-only fallback solely because initial bridge state failed or arrived late\\n- Desktop-only actions can distinguish between missing Electron bridge and transport/auth problems\\n- Automated tests cover the recovery 
behavior","status":"closed","priority":1,"issue_type":"bug","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-20T22:30:16Z","created_by":"dirtydishes","updated_at":"2026-05-20T22:37:21Z","started_at":"2026-05-20T22:30:23Z","closed_at":"2026-05-20T22:37:21Z","close_reason":"Fixed desktop-shell Copilot fallback handling, added bridge recovery logic, updated desktop-vs-bridge UI messaging, and added regression tests. Follow-up tracked in islandflow-c8f for unrelated web build blocker.","dependency_count":0,"dependent_count":0,"comment_count":0} @@ -72,6 +72,7 @@ {"_type":"issue","id":"islandflow-zs0","title":"Migrate terminal UI to smart-money profiles","description":"Migrate apps/web terminal rendering to consume SmartMoneyEvent directly: primary profile, probability ladder, reason codes, and suppression/abstention state, while preserving legacy alert/classifier displays during the bridge.","status":"closed","priority":2,"issue_type":"task","owner":"dishes@dpdrm.com","created_at":"2026-05-04T21:35:23Z","created_by":"dirtydishes","updated_at":"2026-05-05T05:39:58Z","closed_at":"2026-05-05T05:39:58Z","close_reason":"Completed terminal smart-money profile migration","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-igk","title":"Add plan mode","description":"Implement a user-facing plan mode in the application so users can switch into planning before taking action. 
Scope to be clarified from existing app patterns.","status":"closed","priority":2,"issue_type":"feature","owner":"dishes@dpdrm.com","created_at":"2026-05-04T04:22:37Z","created_by":"dirtydishes","updated_at":"2026-05-04T04:26:18Z","started_at":"2026-05-04T04:22:40Z","closed_at":"2026-05-04T04:26:18Z","close_reason":"Implemented as a global pi extension toggled with Shift+P","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-biq","title":"Finish raw live options delivery and filter/backpressure observability","description":"The smart-money signal path and Tape filters are in place, but the next firehose pass should finish server-side selective raw live delivery for options subscriptions and add explicit filtered-out/backpressure observability for API/web counters. This was discovered while landing islandflow-e4r.\n","status":"in_progress","priority":2,"issue_type":"task","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-04-28T20:28:58Z","created_by":"dirtydishes","updated_at":"2026-04-29T03:54:12Z","started_at":"2026-04-29T03:54:12Z","dependencies":[{"issue_id":"islandflow-biq","depends_on_id":"islandflow-e4r","type":"discovered-from","created_at":"2026-04-28T16:28:58Z","created_by":"auto-import","metadata":"{}"}],"dependency_count":0,"dependent_count":0,"comment_count":0} +{"_type":"issue","id":"islandflow-hpf","title":"add anatomy explainer for options print and smart money flow","description":"Create a standalone docs/anatomy.html reference page that explains the end-to-end lifecycle of an options print through enrichment, signal filtering, compute clustering, flow packet creation, smart-money evaluation, classifier hits, alerts, and API/live consumption. 
The page should be polished, user-readable, and visually strong enough to serve as a reusable reference artifact for both technical and non-technical readers.","notes":"Added docs/anatomy.html as a standalone reference page for the options-print to smart-money pipeline, styled in the repo product register and layered for executive, mixed technical, and operator-level readers. Regenerated docs/index.html so the page is discoverable from the docs surface.","status":"closed","priority":3,"issue_type":"task","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-23T02:18:48Z","created_by":"dirtydishes","updated_at":"2026-05-23T02:24:58Z","started_at":"2026-05-23T02:18:53Z","closed_at":"2026-05-23T02:24:58Z","close_reason":"Closed","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-4ca","title":"Publish May 21 standup git summary","description":"Create the daily standup-ready git activity summary for 2026-05-21, save the HTML artifact under docs/general, add the required turn document, and push the result so the automation leaves a durable record.","status":"closed","priority":3,"issue_type":"task","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-22T13:03:00Z","created_by":"dirtydishes","updated_at":"2026-05-22T13:05:05Z","started_at":"2026-05-22T13:03:03Z","closed_at":"2026-05-22T13:05:05Z","close_reason":"Created the 2026-05-21 standup summary in docs/general, added the required turn document, and prepared the repo for commit/push.","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-hgm","title":"Publish May 20 standup git summary","description":"Create the daily standup-ready git activity summary for 2026-05-20, save the HTML artifact under docs/general, and push the result so the automation leaves a durable 
record.","status":"closed","priority":3,"issue_type":"task","owner":"dishes@dpdrm.com","created_at":"2026-05-21T13:02:38Z","created_by":"dirtydishes","updated_at":"2026-05-21T13:05:16Z","closed_at":"2026-05-21T13:05:16Z","close_reason":"Closed","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-4q0","title":"refresh readme app description with current classification approach","description":"Update README intro content to better describe the app's current architecture and include a concise explanation of how Islandflow classifies prints, aligned with smartmoney.md and current services.","status":"closed","priority":3,"issue_type":"task","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-21T01:53:30Z","created_by":"dirtydishes","updated_at":"2026-05-21T01:55:01Z","started_at":"2026-05-21T01:53:33Z","closed_at":"2026-05-21T01:55:01Z","close_reason":"Closed","dependency_count":0,"dependent_count":0,"comment_count":0} diff --git a/docs/anatomy.html b/docs/anatomy.html new file mode 100644 index 0000000..bdd3da5 --- /dev/null +++ b/docs/anatomy.html @@ -0,0 +1,954 @@ + + + + + + The Anatomy of an Options Print and Smart Money + + + +
+
+
+ Islandflow Reference · Options Flow Pipeline + +
+ +
+
+

The Anatomy of an Options Print and Smart Money

+

+ This page explains how a single options print moves through Islandflow under normal market conditions, + how the signal gate decides whether compute should care, how a parent flow packet is assembled, and how + smart-money, classifier-hit, and alert events emerge from that packet. It is designed as one artifact + with three reading depths: executive, mixed technical, and operator-level. +

+
+ +
+
+ +
+
+
+

Legend

+

Color coding is semantic, not decorative, so you can scan the diagram without relearning the vocabulary.

+
+
+ Raw market or synthetic input + Derived compute stage + Stored or persisted state + API, websocket, or user-facing surface +
+
+ +
+
+

Main Flow Chart

+

+ The first row shows the common path every print touches. The second row shows the branch between prints + that remain tape-only and prints that become packet candidates for smart-money evaluation. +

+
+ +
+
+
+
Input
+
+ Stage 1 + Option print candidate arrives +

+ The source can be a native market adapter or the synthetic adapter. Synthetic mode can also emit a + matching NBBO update. +

+
+
+ Stage 2 + ingest-options enriches the print +

+ The service joins recent option NBBO and underlying equity quote context, derives metadata, and + computes signal_pass. +

+
+
+ Stage 3 + Raw print is written and published +
    +
  • ClickHouse: option_prints
  • +
  • NATS: options.prints
  • +
+
+
+ Stage 4 + Signal gate decides if compute should care +

+ Only signal_pass=true prints are published to options.prints.signal and + consumed by compute. +

+
+
+ Stage 5 + compute builds or updates a parent cluster +

+ Nearby signal prints for the same contract are grouped inside the cluster window while NBBO and + equity-quote caches supply context. +

+
+
+ Stage 6 + API and UI consume the resulting streams +

+ The API hydrates hot snapshots, history endpoints read ClickHouse, and the terminal surfaces tape, + flow, smart-money, classifier, and alert views. +

+
+
+ +
+
Tape-only branch
+
+
+
+ Branch A + Raw print remains visible +

+ Even if the print does not pass the signal gate, it still exists in ClickHouse and can appear in + raw tape or history views. +

+
+
+ Branch A outcome + No compute packet path +

+ No FlowPacket, no smart-money evaluation, no classifier hits, and no alert emission. +

+
+
+
+
+ +
+
Smart-money branch
+
+
+
+
+ Branch B + Signal print enters compute +

+ compute subscribes to options.prints.signal, not raw options.prints. +

+
+
+ Branch B outcome + FlowPacket is emitted +
    +
  • ClickHouse: flow_packets
  • +
  • NATS: flow.packets
  • +
+
+
+ Branch B continuation + Smart-money, classifier hits, alerts +

+ The packet is scored into a SmartMoneyEvent, which may abstain; if it does not, it can produce classifier hits and finally emit an alert.

+
+
+
+
+
+ +
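The fork between Stage 3 and Stage 4 can be sketched as one routing function. The table and subject names (`option_prints`, `options.prints`, `options.prints.signal`) come from the flow chart. The `EnrichedPrint` shape, the `Store`/`Publish` callbacks, and `routePrint` are hypothetical, and the real service presumably performs these writes asynchronously.

```typescript
// Hypothetical minimal shape of an enriched print.
interface EnrichedPrint {
  contract: string;
  ts: number;
  signal_pass: boolean;
}

type Store = (table: string, print: EnrichedPrint) => void;
type Publish = (subject: string, print: EnrichedPrint) => void;

// Every enriched print is stored and published raw; only signal_pass prints
// are also published on the subject that compute consumes.
function routePrint(print: EnrichedPrint, store: Store, publish: Publish): void {
  store("option_prints", print); // ClickHouse history, always
  publish("options.prints", print); // raw tape, always
  if (print.signal_pass) {
    publish("options.prints.signal", print); // compute input, gated
  }
}
```

This is why a print can show up on the raw tape without ever producing a FlowPacket: the gate controls only the third call.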
+
+

Executive Read

+

+ The shortest truthful version of the system: not every options print is considered meaningful, and smart + money is not detected directly from a single print. +

+
+
+
+ 1. Tape +

Every print is stored

+

+ All enriched prints are written to ClickHouse and published to the raw options subject. This preserves + evidence even when the print is uninteresting for higher-order inference. +

+
+
+ 2. Compute +

Only signal prints reach the parent-event engine

+

+ A print must pass the signal gate before compute clusters it with neighboring prints and builds a + packet that represents a possible parent order. +

+
+
+ 3. Smart money +

Smart money is a scored interpretation

+

+ The model evaluates the packet using quote quality, aggressor mix, size, structure, DTE, IV, and event + context. It can still abstain if the evidence is weak or suppressed. +

+
+
+
+ +
+
+

Mixed Technical Walkthrough

+

+ This layer is for teammates who know the product and want the exact branching logic without reading + through service code first. +

+
+
+
+

+ Step 1: a candidate print enters ingest-options. In synthetic mode this + print was manufactured by the synthetic adapter, which may also emit a synthetic NBBO update for the + same contract. +

+

+ Step 2: the print is enriched with the most recent option NBBO and underlying equity + quote at or before the print timestamp. The service derives metadata, execution-side context, and the + signal_pass decision. +

+

+ Step 3: the enriched print is persisted to ClickHouse and published to + options.prints. If signal_pass=true, the same print is also published to + options.prints.signal. +

+

+ Step 4: compute subscribes to the signal subject plus NBBO and equity-quote subjects. + It does not build packet candidates from every raw print. It only clusters signal prints. +

+

+ Step 5: compute aggregates nearby signal prints for the same option contract into a + cluster, then flushes that cluster into a FlowPacket with features such as total premium, + print count, aggressor ratios, NBBO coverage, stale-quote counts, IV context, and structure clues. +

+

+ Step 6: the packet is transformed into a SmartMoneyEvent. If suppression rules trip or the top profile probability is too weak, the event abstains. Otherwise, it can emit classifier hits and finally an alert with evidence references back to the packet and member prints. +
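Step 2 attaches the most recent quote at or before the print timestamp. That rule can be sketched as a binary search over a time-sorted quote history. This is a hedged illustration, not the ingest-options implementation. The `Quote` shape and the `latestQuoteAtOrBefore` helper are hypothetical, and the real service may simply cache the latest quote per contract.

```typescript
// Hypothetical quote shape; field names are illustrative, not the real schema.
interface Quote {
  ts: number; // epoch millis
  bid: number;
  ask: number;
}

// Returns the last quote whose timestamp is <= printTs, or undefined if none exists.
// Assumes `quotes` is sorted ascending by ts.
function latestQuoteAtOrBefore(quotes: Quote[], printTs: number): Quote | undefined {
  let lo = 0;
  let hi = quotes.length - 1;
  let found: Quote | undefined;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (quotes[mid].ts <= printTs) {
      found = quotes[mid]; // valid candidate; keep searching right for a later one
      lo = mid + 1;
    } else {
      hi = mid - 1; // quote is after the print, so it must not be used
    }
  }
  return found;
}

const quoteHistory: Quote[] = [
  { ts: 1000, bid: 1.2, ask: 1.3 },
  { ts: 2000, bid: 1.25, ask: 1.35 },
  { ts: 3000, bid: 1.3, ask: 1.4 },
];
console.log(latestQuoteAtOrBefore(quoteHistory, 2500)?.ts); // 2000
```

A print at 2500 picks up the 2000 quote rather than the later 3000 one. This "at or before" constraint keeps enrichment from leaking future quote information into execution-side context.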

+
+ +
+
+ +
+
+

Operator and Code-Level Detail

+

+ This section is for someone tracing the live pipeline, debugging a regression, or trying to understand exactly why a given print surfaced on tape but did or did not become a smart-money event. +

+
+
+

+ The first fork is the signal gate in ingest-options. The enriched print is always stored and published raw. The only thing signal_pass controls is whether compute receives that print on options.prints.signal. +
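The fork described above can be sketched as a routing function. Storage and the raw subject are unconditional, and only the signal subject depends on `signal_pass`. The `EnrichedPrint` shape, `routePrint`, and the `store`/`publish` callbacks are hypothetical stand-ins for the ClickHouse writer and the bus publisher. Only the two subject names come from this page.

```typescript
// Hypothetical minimal shape; real enriched prints carry NBBO, quote, and metadata fields.
interface EnrichedPrint {
  id: string;
  signal_pass: boolean;
}

type Store = (print: EnrichedPrint) => void;
type Publish = (subject: string, print: EnrichedPrint) => void;

function routePrint(print: EnrichedPrint, store: Store, publish: Publish): void {
  store(print);                              // always persisted
  publish("options.prints", print);          // always visible on the raw tape
  if (print.signal_pass) {
    publish("options.prints.signal", print); // compute admission gate
  }
}

const stored: string[] = [];
const published: string[] = [];
const store: Store = (p) => { stored.push(p.id); };
const publish: Publish = (subject) => { published.push(subject); };

routePrint({ id: "a", signal_pass: false }, store, publish);
routePrint({ id: "b", signal_pass: true }, store, publish);
console.log(published.join(" "));
// options.prints options.prints options.prints.signal
```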

+

+ The compute service maintains separate caches for option NBBO and underlying equity quotes. When signal prints arrive, it first flushes aged clusters. It then either extends the active cluster for that contract, if the print lands within the configured window, or emits the old cluster and starts a new one. +
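Here is a minimal sketch of that per-contract clustering loop, under stated assumptions. `ContractClusterer` and `windowMs` are hypothetical names. The window is assumed to be anchored at the cluster's first print, and a cluster counts as aged once no print has extended it for longer than the window. The real compute service may measure both differently.

```typescript
// Hypothetical shapes: only the fields the clustering decision needs.
interface SignalPrint {
  contract: string;
  ts: number; // epoch millis
}

interface Cluster {
  contract: string;
  startTs: number;
  lastTs: number;
  prints: SignalPrint[];
}

class ContractClusterer {
  private readonly active = new Map<string, Cluster>();
  private readonly windowMs: number;
  private readonly emit: (cluster: Cluster) => void;

  constructor(windowMs: number, emit: (cluster: Cluster) => void) {
    this.windowMs = windowMs;
    this.emit = emit;
  }

  // Emit and drop every cluster that has been idle for longer than the window.
  flushAged(now: number): void {
    for (const [contract, cluster] of this.active) {
      if (now - cluster.lastTs > this.windowMs) {
        this.emit(cluster);
        this.active.delete(contract);
      }
    }
  }

  onSignalPrint(print: SignalPrint): void {
    this.flushAged(print.ts);
    const current = this.active.get(print.contract);
    if (current && print.ts - current.startTs <= this.windowMs) {
      // Still inside the window anchored at the cluster's first print: extend it.
      current.prints.push(print);
      current.lastTs = print.ts;
      return;
    }
    if (current) this.emit(current); // window exceeded: close the old cluster
    this.active.set(print.contract, {
      contract: print.contract,
      startTs: print.ts,
      lastTs: print.ts,
      prints: [print],
    });
  }
}

const emitted: Cluster[] = [];
const clusterer = new ContractClusterer(1000, (c) => emitted.push(c));
clusterer.onSignalPrint({ contract: "AAPL-C", ts: 0 });
clusterer.onSignalPrint({ contract: "AAPL-C", ts: 500 });  // extends
clusterer.onSignalPrint({ contract: "AAPL-C", ts: 1500 }); // window exceeded: emits the [0, 500] cluster
clusterer.onSignalPrint({ contract: "MSFT-P", ts: 4000 }); // AAPL-C idle too long: emits the [1500] cluster
console.log(emitted.map((c) => c.prints.length)); // [ 2, 1 ]
```

Each emitted cluster is what the next stage would summarize into a FlowPacket.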

+

+ The cluster becomes a FlowPacket only after compute summarizes parent-level features. That packet then passes through smart-money scoring. The scoring layer derives a profile set drawing on profiles such as institutional directional, retail whale, event driven, vol seller, arbitrage, and hedge reactive. +
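The feature-summary step can be illustrated as a reducer over a cluster's member prints. This sketch computes a subset of the features named on this page: total premium, print count, aggressor ratios, NBBO coverage, and stale-quote count. The field names, the 100x contract multiplier, and the `staleMs` threshold are assumptions, not the FlowPacket schema.

```typescript
type Side = "buy" | "sell" | "unknown";

// Hypothetical member-print shape for the feature summary.
interface MemberPrint {
  price: number;      // per-contract option price
  size: number;       // contracts
  side: Side;         // inferred aggressor side
  hasNbbo: boolean;   // an NBBO was available at print time
  quoteAgeMs: number; // age of that NBBO when the print occurred
}

interface PacketFeatures {
  totalPremium: number;
  printCount: number;
  buyAggressorRatio: number;
  sellAggressorRatio: number;
  nbboCoverage: number;
  staleQuoteCount: number;
}

function summarizeCluster(prints: MemberPrint[], staleMs = 1000, multiplier = 100): PacketFeatures {
  let premium = 0;
  let buys = 0;
  let sells = 0;
  let covered = 0;
  let stale = 0;
  for (const p of prints) {
    premium += p.price * p.size * multiplier;
    if (p.side === "buy") buys++;
    else if (p.side === "sell") sells++;
    if (p.hasNbbo) {
      covered++;
      if (p.quoteAgeMs > staleMs) stale++; // quote existed but was too old
    }
  }
  const n = prints.length;
  const ratio = (k: number): number => (n === 0 ? 0 : k / n);
  return {
    totalPremium: premium,
    printCount: n,
    buyAggressorRatio: ratio(buys),
    sellAggressorRatio: ratio(sells),
    nbboCoverage: ratio(covered),
    staleQuoteCount: stale,
  };
}

const features = summarizeCluster([
  { price: 2.5, size: 10, side: "buy", hasNbbo: true, quoteAgeMs: 100 },
  { price: 1.5, size: 20, side: "sell", hasNbbo: true, quoteAgeMs: 5000 },
  { price: 4, size: 1, side: "unknown", hasNbbo: false, quoteAgeMs: 0 },
]);
console.log(features.totalPremium, features.staleQuoteCount); // 5900 1
```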

+

+ A packet can still fail to produce actionable downstream artifacts. Suppression rules down-rank packets with special-print context, stale or missing quote context, or cross-like execution patterns. The top profile must also clear the probability threshold. If it does not, the smart-money event is emitted in abstained form and no classifier hits are derived from it. +
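The suppression-then-threshold logic can be sketched as follows. The multiplicative penalty per suppression flag, the 0.6 threshold, and every name here are hypothetical. The page only states that suppression rules down-rank packets and that the top profile must clear a probability threshold or the event abstains.

```typescript
// Hypothetical packet view used by the scoring sketch.
interface ScoredPacket {
  profiles: Record<string, number>; // profile name -> probability
  specialPrintContext: boolean;
  staleOrMissingQuotes: boolean;
  crossLike: boolean;
}

type Evaluation =
  | { abstained: true; reason: string }
  | { abstained: false; profile: string; probability: number };

function evaluatePacket(packet: ScoredPacket, threshold = 0.6, penalty = 0.5): Evaluation {
  // Each tripped suppression rule down-ranks every profile by the same factor.
  let factor = 1;
  for (const tripped of [packet.specialPrintContext, packet.staleOrMissingQuotes, packet.crossLike]) {
    if (tripped) factor *= penalty;
  }

  let best: [string, number] | undefined;
  for (const [profile, probability] of Object.entries(packet.profiles)) {
    const adjusted = probability * factor;
    if (best === undefined || adjusted > best[1]) best = [profile, adjusted];
  }
  if (best === undefined) return { abstained: true, reason: "no profiles" };
  // The top profile must clear the threshold; otherwise the event abstains.
  if (best[1] < threshold) return { abstained: true, reason: "below threshold" };
  return { abstained: false, profile: best[0], probability: best[1] };
}

const base = { specialPrintContext: false, staleOrMissingQuotes: false, crossLike: false };
const profiles = { institutional_directional: 0.8, retail_whale: 0.3 };
console.log(evaluatePacket({ ...base, profiles }).abstained);                  // false
console.log(evaluatePacket({ ...base, crossLike: true, profiles }).abstained); // true (0.8 * 0.5 < 0.6)
```

Modeling suppression as a down-rank rather than a hard veto means a very strong packet can survive one flag, while weaker ones abstain.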

+

+ If the packet does clear those checks, compute writes and publishes the smart-money event, derives up to a few classifier hits from the top profile set, scores a final alert, and publishes all three derived streams. The API subscribes to those subjects and fans them out into live websocket channels while ClickHouse remains the history source behind /history/*. +
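Deriving a small number of classifier hits from the top profile set can be sketched as a filtered top-k. The cap of three hits, the minimum probability, the snake_case profile keys, and the `deriveClassifierHits` name are illustrative assumptions. None of them are taken from parent-events.ts.

```typescript
interface ClassifierHit {
  profile: string;
  probability: number;
}

// Keeps at most maxHits profiles at or above minProbability, strongest first.
function deriveClassifierHits(
  profiles: Record<string, number>,
  maxHits = 3,
  minProbability = 0.6,
): ClassifierHit[] {
  return Object.entries(profiles)
    .filter(([, probability]) => probability >= minProbability)
    .sort((a, b) => b[1] - a[1])
    .slice(0, maxHits)
    .map(([profile, probability]) => ({ profile, probability }));
}

const hits = deriveClassifierHits({
  institutional_directional: 0.9,
  event_driven: 0.7,
  hedge_reactive: 0.65,
  vol_seller: 0.62,
  retail_whale: 0.3,
});
console.log(hits.map((h) => h.profile).join(", "));
// institutional_directional, event_driven, hedge_reactive
```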

+
+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Subject or table | Produced by | Carries | Why it exists
options.prints | ingest-options | All enriched option prints | Preserves the full tape, even when a print is not interesting enough for compute.
options.prints.signal | ingest-options | Signal-passing option prints | Acts as the compute admission gate so packet building starts from a filtered tape.
flow.packets | compute | Parent-event candidates | Turns several child prints into one summarized event with market-structure features.
flow.smart_money | compute | Smart-money evaluations | Publishes the scored interpretation of a packet, including abstained outcomes.
flow.classifier_hits | compute | Top classifier consequences | Exposes the strongest profile-level labels that downstream UX and alerting can decorate.
flow.alerts | compute | Alert events with evidence refs | Packages the final severity and supporting evidence into a user-facing alert stream.
+
+
+ +
+
+

Normal Path Versus Smart-Money Path

+

+ These two sequences are easy to confuse, especially because both begin with the same enriched tape record. +

+
+
+
+ Normal market path +

+ Print arrives, gets enriched, gets stored, appears on the raw tape, and stops there unless it passes the signal gate. This is the dominant path for ordinary or low-signal activity. +

+
+
+ Smart-money path +

+ Print arrives, passes the signal gate, joins a cluster, becomes a packet, receives a smart-money score, and then may emit classifier hits and an alert if the packet is not suppressed or abstained. +

+
+
+
+ +
+
+

Annotated Event Sequence

+

+ The example below is the shortest operator-friendly way to think about the branch that leads to a smart-money result. +

+
+
1. Synthetic or market adapter emits OptionPrint candidate
+2. ingest-options enriches it with latest NBBO and underlying quote context
+3. Enriched print is written to ClickHouse option_prints
+4. Enriched print is published to options.prints
+5. If signal_pass=true, the same print is also published to options.prints.signal
+6. compute consumes options.prints.signal and updates the active contract cluster
+7. Cluster flush builds a FlowPacket with parent-level features
+8. FlowPacket is written to ClickHouse flow_packets and published to flow.packets
+9. compute scores the packet into a SmartMoneyEvent
+10. If suppressed or low-confidence, the SmartMoneyEvent abstains and stops there
+11. Otherwise classifier hits are emitted
+12. Alert scoring emits a final alert with evidence refs to smart-money event, flow packet, and member prints
+13. API subscribes to these streams and exposes them through live websocket channels and ClickHouse-backed history
+
+ +
+
+

What Synthetic Mode Changes

+

+ Synthetic mode replaces the upstream generator with an artificial one, but the downstream branch logic stays identical. +

+
+
+

+ The synthetic adapter constructs an OptionPrint with fields such as execution_iv_source="synthetic_pressure_model", and it may emit a synthetic NBBO for the same contract. From that point forward, the pipeline is the same one used for normal ingest. +

+

+ That means synthetic smart-money detection is not a separate subsystem. It is the standard signal-to-packet-to-smart-money pipeline running on synthetic upstream events. +
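To make the point concrete: a synthetic adapter only has to produce records with the same shape as live ones. In this hypothetical sketch, only the `execution_iv_source="synthetic_pressure_model"` value comes from this page. The `makeSyntheticPrint` helper, the contract identifier format, the half-spread parameter, and the choice to print at the ask are illustrative assumptions.

```typescript
// Hypothetical record shapes; the real OptionPrint and NBBO types carry more fields.
interface SyntheticOptionPrint {
  contract: string;
  ts: number;
  price: number;
  size: number;
  execution_iv_source: string;
}

interface SyntheticNbbo {
  contract: string;
  ts: number;
  bid: number;
  ask: number;
}

// Builds a print that lifts the offer, plus an optional matching NBBO for the same contract.
function makeSyntheticPrint(
  contract: string,
  ts: number,
  mid: number,
  size: number,
  halfSpread = 0.05,
  withNbbo = true,
): { print: SyntheticOptionPrint; nbbo: SyntheticNbbo | undefined } {
  const ask = mid + halfSpread;
  const print: SyntheticOptionPrint = {
    contract,
    ts,
    price: ask,
    size,
    execution_iv_source: "synthetic_pressure_model",
  };
  const nbbo = withNbbo ? { contract, ts, bid: mid - halfSpread, ask } : undefined;
  return { print, nbbo };
}

const sample = makeSyntheticPrint("SPY-C-500", Date.now(), 2.0, 25);
console.log(sample.print.execution_iv_source); // synthetic_pressure_model
```

Nothing downstream needs to branch on this record. Enrichment, the signal gate, clustering, and scoring treat it like any other print, and the provenance field is what lets an operator tell the two apart.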

+
+
+ +
+
+

Code Anchors

+

+ If you want to confirm this page against the code, these are the most useful entry points. +

+
+
+
    +
  • services/ingest-options/src/enrichment.ts: enriches the print and decides signal_pass.
  • +
  • services/ingest-options/src/index.ts: writes prints and publishes raw versus signal subjects.
  • +
  • services/compute/src/index.ts: subscribes to signal prints, maintains clusters, emits packets, smart money, hits, and alerts.
  • +
  • services/compute/src/parent-events.ts: builds SmartMoneyEvent, suppression rules, primary profile, abstention, and classifier derivation.
  • +
  • packages/bus/src/subjects.ts: canonical subject names for the pipeline.
  • +
+
+ +
+
+
+ + diff --git a/docs/index.html b/docs/index.html index 211c5ac..140ee55 100644 --- a/docs/index.html +++ b/docs/index.html @@ -207,36 +207,76 @@
-
35 of 35 files shown
+
47 of 47 files shown
-
-

turns 28

+

turns 37

    -
  • - turns/2026-05-19-publish-docs-pages-index.html +
  • + turns/2026-05-22-stabilize-live-api-memory.html
    html - 6.7 KB - May 19, 2026, 2:59 PM + 26 KB + May 22, 2026, 9:47 PM
  • -
  • - turns/2026-05-18-native-public-edge-cutover.html +
  • + turns/2026-05-22-publish-standup-summary-2026-05-21.html
    html - 19 KB - May 19, 2026, 2:48 PM + 5.5 KB + May 22, 2026, 9:04 AM +
    +
  • + + +
  • + turns/2026-05-21-publish-standup-summary-2026-05-20.html +
    + html + 5.0 KB + May 21, 2026, 9:05 AM +
    +
  • + + +
  • + turns/2026-05-20-refresh-readme-github-description.html +
    + html + 7.7 KB + May 20, 2026, 9:54 PM +
    +
  • + + +
  • + turns/2026-05-20-remote-backfill-sync.html +
    + html + 4.3 KB + May 20, 2026, 9:26 PM +
    +
  • + + +
  • + turns/2026-05-20-fix-alert-flow-packet-history.html +
    + html + 14 KB + May 20, 2026, 9:26 PM
  • @@ -246,37 +286,7 @@
    html 9.8 KB - May 19, 2026, 2:48 PM -
    - - - -
  • - turns/2026-05-18-native-fast-iterative-deploy.html -
    - html - 9.0 KB - May 19, 2026, 2:48 PM -
    -
  • - - -
  • - turns/2026-05-19-0805-clarify-repo-turn-doc-rules.html -
    - html - 6.4 KB - May 19, 2026, 8:05 AM -
    -
  • - - -
  • - turns/2026-05-19-0739-update-readme-current-state.html -
    - html - 9.8 KB - May 19, 2026, 7:39 AM + May 20, 2026, 9:26 PM
  • @@ -286,7 +296,87 @@
    html 9.0 KB - May 19, 2026, 7:31 AM + May 20, 2026, 9:26 PM +
    + + + +
  • + turns/2026-05-19-harden-native-ssh-deploy-checks.html +
    + html + 7.0 KB + May 20, 2026, 9:26 PM +
    +
  • + + +
  • + turns/2026-05-19-native-options-recovery-guardrails.html +
    + html + 7.7 KB + May 20, 2026, 9:26 PM +
    +
  • + + +
  • + turns/2026-05-19-publish-docs-pages-index.html +
    + html + 6.7 KB + May 20, 2026, 9:26 PM +
    +
  • + + +
  • + turns/2026-05-19-0739-update-readme-current-state.html +
    + html + 9.8 KB + May 20, 2026, 9:26 PM +
    +
  • + + +
  • + turns/2026-05-19-0805-clarify-repo-turn-doc-rules.html +
    + html + 6.4 KB + May 20, 2026, 9:26 PM +
    +
  • + + +
  • + turns/2026-05-19-fix-native-alpaca-news.html +
    + html + 12 KB + May 20, 2026, 9:26 PM +
    +
  • + + +
  • + turns/2026-05-18-native-fast-iterative-deploy.html +
    + html + 9.0 KB + May 20, 2026, 9:26 PM +
    +
  • + + +
  • + turns/2026-05-18-native-public-edge-cutover.html +
    + html + 19 KB + May 20, 2026, 9:26 PM
  • @@ -296,7 +386,7 @@
    html 7.0 KB - May 18, 2026, 4:54 PM + May 20, 2026, 9:26 PM
    @@ -513,7 +603,7 @@
    html 16 KB - May 19, 2026, 2:55 PM + May 20, 2026, 9:26 PM
    @@ -522,15 +612,35 @@
    -

    general 2

    +

    general 4

      +
    • + general/2026-05-22-standup-summary-2026-05-21.html +
      + html + 11 KB + May 22, 2026, 9:04 AM +
      +
    • + + +
    • + general/2026-05-21-standup-summary-2026-05-20.html +
      + html + 16 KB + May 21, 2026, 9:05 AM +
      +
    • + +
    • general/2026-05-18-standup-summary-2026-05-17.html
      html 19 KB - May 18, 2026, 9:05 AM + May 20, 2026, 9:26 PM
    • @@ -557,7 +667,7 @@
      html 3.8 KB - May 19, 2026, 2:48 PM + May 20, 2026, 9:26 PM
      @@ -576,9 +686,19 @@
      -

      root 2

      +

      root 3

        +
      • + anatomy.html +
        + html + 33 KB + May 22, 2026, 10:22 PM +
        +
      • + +
      • clickhouse-reset-runbook.md
        diff --git a/docs/turns/2026-05-22-add-options-anatomy-explainer.html b/docs/turns/2026-05-22-add-options-anatomy-explainer.html new file mode 100644 index 0000000..294e0ad --- /dev/null +++ b/docs/turns/2026-05-22-add-options-anatomy-explainer.html @@ -0,0 +1,584 @@ + + + + + + Turn Record: Add Options Anatomy Explainer + + + +
        +
        + Turn Record · May 22, 2026 +

        Add Options Anatomy Explainer

        +

        + Added a standalone docs/anatomy.html reference page that explains the full lifecycle of an options print, from ingest and signal gating through flow packet construction, smart-money scoring, classifier hits, alerts, and API/live consumption. The page is styled to match Islandflow’s product register and layered so executive, mixed technical, and operator-level readers can all use the same artifact. +

        +
        +
        + Beads + islandflow-hpf +
        +
        + Artifact + docs/anatomy.html +
        +
        + Register + Product, evidence-console styling +
        +
        + Secondary Change + Regenerated docs/index.html +
        +
        +
        + +
        +
        +

        Summary

        +

        + The repo now includes a reusable explainer page for one of the most important pieces of Islandflow’s mental model: how a raw or synthetic options print turns into visible tape, a flow packet, and sometimes a smart-money or alert event. Instead of scattering that explanation across chat answers and source code, the new page centralizes the pipeline in a designed HTML document that can be browsed directly under docs/. +

        +
        + Primary outcome: the new page makes the option-print pipeline legible at three reading depths without forcing someone to reconstruct the architecture from service code. +
        +
        + +
        +

        Changes Made

        +
          +
        • + Added docs/anatomy.html as a standalone explainer page titled The Anatomy of an Options Print and Smart Money. +
        • +
        • + Built a large flow-chart section that distinguishes the common tape path from the signal-to-packet-to-smart-money branch. +
        • +
        • + Layered the page into executive, mixed technical, and operator-level explanations so one artifact works for multiple audiences. +
        • +
        • + Included subject/table mapping, annotated sequence detail, synthetic-mode notes, and code anchors back into the real repo. +
        • +
        • + Regenerated docs/index.html so the new explainer is discoverable from the existing docs index. +
        • +
        +
        + +
        +

        Context

        +

        + The user asked for a true flow-chart explanation of what happens when options tape comes in under normal market scenarios and when smart-money behavior is detected, with the important caveat that the current environment is using synthetic prints. The repo already had the implementation details, but not a clear product artifact that unified ingest, compute, storage, bus subjects, and API/live consumption into one readable document. +

        +

        + Because Islandflow’s UI language is already defined as an “evidence console,” the new page needed to feel operational and precise rather than like a generic landing page or a decorative infographic. +

        +
        + +
        +

        Important Implementation Details

        +
        +
        +

        Information architecture

        +
          +
        • + The page starts with a semantic legend and a visual flow board so readers can build the correct mental model before diving into prose. +
        • +
        • + The explanation then deepens in three layers: executive read, mixed technical walkthrough, and operator/code-level detail. +
        • +
        • + The normal tape path and the smart-money path are split explicitly so readers do not confuse raw tape visibility with compute-derived inference. +
        • +
        +
        +
        +

        Design choices

        +
          +
        • + The visual treatment follows the repo’s product register: dark, stable, and evidence-first, with amber used as a sparse signal and monospace labels for pipeline semantics. +
        • +
        • + The flow chart is pure HTML and CSS, not a JavaScript diagram dependency, so the page remains portable and straightforward to keep in sync with the repo. +
        • +
        • + docs/index.html was regenerated with the existing script so the page participates in the current docs navigation surface instead of becoming a hidden one-off. +
        • +
        +
        +
        +
        + +
        +

        Relevant Diff Snippets

        +

        + These snippets are rendered with the Diffs library from diffs.com, with a plain-text fallback kept inline. +

        +
        +
        +

        docs/anatomy.html: new explainer page and flow-board structure

        +
        +
        + Plain-text fallback +
        + Added docs/anatomy.html
        ++ Product-register dark evidence-console styling
        ++ Main flow chart with common path, tape-only branch, and smart-money branch
        ++ Layered explanation sections for executive, mixed technical, and operator audiences
        ++ Subject map, annotated sequence, synthetic mode notes, and code anchors
        +
        +
        + +
        +

        docs/index.html: regenerated docs surface with new entry count

        +
        +
        + Plain-text fallback +
        - 35 files shown
        ++ 47 files shown
        +- root/general counts from prior docs set
        ++ updated counts after regenerating the index, including the new anatomy explainer entry
        +
        +
        +
        +
        + +
        +

        Expected Impact for End-Users

        +
          +
        • + Teammates and operators now have a single place to understand why a print can appear on tape without ever becoming a smart-money event. +
        • +
        • + The synthetic-print caveat is captured directly in the artifact, which should reduce confusion when debugging or demoing the current environment. +
        • +
        • + The docs surface becomes more useful as a living product reference, not just a collection of turn records and plans. +
        • +
        +
        + +
        +

        Validation

        +
          +
        • + Generated the new page at docs/anatomy.html and verified that the title and major sections are present. +
        • +
        • + Regenerated the docs index with node scripts/generate-docs-index.mjs, which completed successfully and reported 47 entries. +
        • +
        • + Confirmed that the regenerated docs/index.html lists the new explainer page in the docs surface. +
        • +
        +
        +
        + 1 + new reusable explainer page +
        +
        + 47 + docs index entries after regeneration +
        +
        + 3 + reader depth layers on the page +
        +
        +
        + +
        +

        Issues, Limitations, and Mitigations

        +
          +
        • + The page is intentionally hand-authored HTML rather than a generated diagram artifact. That keeps it portable, but it also means future pipeline changes must be mirrored in this page by hand. +
        • +
        • + The docs index regeneration reflects the full current docs/ tree, so the visible counts changed by more than one file compared with the previously committed index. +
        • +
        • + This validation pass verified structure and generation success, but did not include a browser-rendered visual QA step against multiple viewport sizes. +
        • +
        +
        + +
        +

        Follow-up Work

        +
          +
        • + Add reciprocal links from more domain-specific docs such as smartmoney.md back to docs/anatomy.html. +
        • +
        • + Consider a second reference page focused specifically on one concrete synthetic example, from a burst of prints to the final alert payload. +
        • +
        • + If the flow-packet feature set evolves, keep the anatomy page in lockstep with those changes so it remains a trustworthy operator reference. +
        • +
        +
        +
        + + +
        + + -- 2.49.1 From cbb0891ab76baacf22b2f5ab3ee0b998c9a33e7c Mon Sep 17 00:00:00 2001 From: dirtydishes Date: Sat, 23 May 2026 16:25:57 -0400 Subject: [PATCH 5/5] add turn doc for pr 8 reconciliation --- .beads/issues.jsonl | 2 +- docs/turns/2026-05-23-reconcile-pr-8.html | 93 +++++++++++++++++++++++ 2 files changed, 94 insertions(+), 1 deletion(-) create mode 100644 docs/turns/2026-05-23-reconcile-pr-8.html diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl index f1c7f22..12572eb 100644 --- a/.beads/issues.jsonl +++ b/.beads/issues.jsonl @@ -21,7 +21,7 @@ {"_type":"issue","id":"islandflow-ayo","title":"Drop stale backlog events from live fanout","description":"Follow-up to live freshness rollout: /ws/live was still fanning out stale backlog events for freshness-gated channels, which kept tape panes in Live feed behind despite active synthetic ingest. Gate fanout and cache ingest by freshness for options/nbbo/equities/flow.","status":"closed","priority":1,"issue_type":"bug","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-04-28T21:26:39Z","created_by":"dirtydishes","updated_at":"2026-04-28T21:26:44Z","started_at":"2026-04-28T21:26:44Z","closed_at":"2026-04-28T21:26:44Z","close_reason":"Completed","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-0v6","title":"Fix tape freshness, NBBO coverage, pause controls, and filter popup","description":"Implement the tape fixes requested for synthetic options notional sizing, strict live freshness, live-mode pause/resume behavior, stronger NBBO snapshot coverage, and moving flow filters behind a popup. 
Includes server-side live cache changes, web terminal state/UI changes, and tests for synthetic pricing, live snapshot freshness/NBBO retention, and live pause/filter interactions.","status":"closed","priority":1,"issue_type":"task","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-04-28T21:02:52Z","created_by":"dirtydishes","updated_at":"2026-04-28T21:13:38Z","started_at":"2026-04-28T21:02:57Z","closed_at":"2026-04-28T21:13:38Z","close_reason":"Completed","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-e4r","title":"Implement smart-money flow filtering and synthetic firehose modes","description":"Implement the approved multi-surface plan for named synthetic market profiles, options raw-vs-signal filtering, live/API filter contracts, Tape page client-side flow filters, firehose-readiness improvements, tests, and README updates.","status":"closed","priority":1,"issue_type":"feature","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-04-28T20:10:49Z","created_by":"dirtydishes","updated_at":"2026-04-28T20:29:29Z","started_at":"2026-04-28T20:10:53Z","closed_at":"2026-04-28T20:29:29Z","close_reason":"Implemented synthetic market profiles, options signal-path filtering, signal-aware API/replay contracts, Tape page filters, tests, and README updates. Follow-up tracked in islandflow-biq.","dependency_count":0,"dependent_count":0,"comment_count":0} -{"_type":"issue","id":"islandflow-kgu","title":"Reconcile PR #8 branch with current main","description":"Why this issue exists and what needs to be done: user requested reconciliation for PR #8. 
Identify the PR #8 branch, merge/rebase with current main, resolve conflicts, validate, and push the updated branch so the PR can merge cleanly.","status":"in_progress","priority":2,"issue_type":"task","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-23T20:14:36Z","created_by":"dirtydishes","updated_at":"2026-05-23T20:14:39Z","started_at":"2026-05-23T20:14:39Z","dependency_count":0,"dependent_count":0,"comment_count":0} +{"_type":"issue","id":"islandflow-kgu","title":"Reconcile PR #8 branch with current main","description":"Why this issue exists and what needs to be done: user requested reconciliation for PR #8. Identify the PR #8 branch, merge/rebase with current main, resolve conflicts, validate, and push the updated branch so the PR can merge cleanly.","status":"closed","priority":2,"issue_type":"task","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-23T20:14:36Z","created_by":"dirtydishes","updated_at":"2026-05-23T20:24:29Z","started_at":"2026-05-23T20:14:39Z","closed_at":"2026-05-23T20:24:29Z","close_reason":"Closed","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-l9h","title":"stop persisting non-signal option prints in clickhouse","description":"Why: non-signal option prints are storage noise and should not be persisted by default.\\n\\nWhat: add OPTIONS_PERSIST_SIGNAL_ONLY env flag (default true), gate option_print inserts in ingest-options, add tests for persistence behavior, update env examples, and document one-off cleanup SQL for existing non-signal rows.","status":"closed","priority":2,"issue_type":"task","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-23T03:02:32Z","created_by":"dirtydishes","updated_at":"2026-05-23T03:06:34Z","started_at":"2026-05-23T03:02:35Z","closed_at":"2026-05-23T03:06:34Z","close_reason":"Closed","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-2cj","title":"Add 
Forgejo-first agent workflow guidance to AGENTS.md","description":"Why this issue exists and what needs to be done:\\n- The repository’s canonical home is Forgejo at git.deltaisland.io, but AGENTS.md does not currently direct agents to prefer Forgejo-specific workflows.\\n- Update AGENTS.md so agents treat Forgejo as primary and use the fj CLI for pull request workflows.\\n- Keep existing Beads and completion instructions intact while clarifying remote preference and command usage.","status":"closed","priority":2,"issue_type":"task","owner":"dishes@dpdrm.com","created_at":"2026-05-23T02:51:31Z","created_by":"dirtydishes","updated_at":"2026-05-23T02:55:42Z","closed_at":"2026-05-23T02:55:42Z","close_reason":"Closed","dependency_count":0,"dependent_count":0,"comment_count":0} {"_type":"issue","id":"islandflow-xc5","title":"One-time bidirectional git remote backfill between github and forgejo","description":"Perform a one-time sync so github and forgejo contain the same branch/tag refs and historical commits, including pre-transition github history and newer forgejo commits. Document exact commands and validation results.","status":"closed","priority":2,"issue_type":"task","assignee":"dirtydishes","owner":"dishes@dpdrm.com","created_at":"2026-05-21T01:25:05Z","created_by":"dirtydishes","updated_at":"2026-05-21T01:26:19Z","started_at":"2026-05-21T01:25:16Z","closed_at":"2026-05-21T01:26:19Z","close_reason":"Closed","dependency_count":0,"dependent_count":0,"comment_count":0} diff --git a/docs/turns/2026-05-23-reconcile-pr-8.html b/docs/turns/2026-05-23-reconcile-pr-8.html new file mode 100644 index 0000000..163b911 --- /dev/null +++ b/docs/turns/2026-05-23-reconcile-pr-8.html @@ -0,0 +1,93 @@ + + + + + + Turn Doc - Reconcile PR #8 + + + +

        Reconcile PR #8 with main

        +

        Date: 2026-05-23

        + +
        +

        Summary

        +

        Reconciled PR #8 by merging the latest main into stabilize-live-api-memory, resolving the only merge conflict, and pushing the updated branch to Forgejo.

        +
        + +
        +

        Changes Made

        +
          +
        • Checked out stabilize-live-api-memory from forgejo/stabilize-live-api-memory.
        • +
        • Merged forgejo/main into the PR branch.
        • +
        • Resolved merge conflict in .beads/issues.jsonl.
        • +
        • Closed Beads issue islandflow-kgu for this reconciliation work.
        • +
        • Pushed Beads data and git branch updates to Forgejo.
        • +
        +
        + +
        +

        Context

        +

        The user requested reconciliation for PR #8. The PR #8 head branch (stabilize-live-api-memory) had fallen behind current main, so it needed an integration merge before the PR could merge cleanly.

        +
        + +
        +

        Important Implementation Details

        +
          +
        • The only merge conflict was in Beads tracker data (.beads/issues.jsonl).
        • +
        • Conflict resolution preserved both upstream tracker updates and the in-progress reconciliation issue record.
        • +
        • No service/application source files required manual code conflict resolution in this reconciliation pass.
        • +
        +
        + +
        +

        Relevant Diff Snippets

        +

        Rendered in diffs.com-compatible unified diff style:

        +
        commit 6584f7d1545019da663ab3ec9719d06e25c5244e
        +Merge: db73700 8464287
        +Author: dirtydishes
        +
        ++ merge main into stabilize-live-api-memory to reconcile pr 8
        +
        +MM .beads/issues.jsonl
        +
        + +
        +

        Expected Impact for End-Users

        +

        End-users should not see functional UI/API changes from this reconciliation itself; the impact is operational: PR #8 can now be merged cleanly against current mainline history.

        +
        + +
        +

        Validation

        +
          +
        • Verified branch push to Forgejo succeeded.
        • +
        • Verified Beads push (bd dolt push) succeeded.
        • +
        • Verified that the final git state reports the branch as aligned with forgejo/stabilize-live-api-memory.
        • +
        +
        + +
        +

        Issues, Limitations, and Mitigations

        +
          +
        • Limitation: fj authentication initially failed as unauthorized in this session.
        • +
        • Mitigation: The user completed re-authentication, which was confirmed by a successful fj pr view 8.
        • +
        • Scope note: This turn focused on branch reconciliation, not feature behavior changes.
        • +
        +
        + +
        +

        Follow-up Work

        +
          +
        • Proceed with normal review/merge flow for PR #8 in Forgejo.
        • +
        • If additional commits land on main before merge, re-run a quick reconciliation pass.
        • +
        +
        + + -- 2.49.1