- Add advisory, entrypoint, and candidate scan outputs
- Capture dependency intelligence and cross-service attack surface notes

2026-05-28 05:13:36 -04:00

33 KiB


Islandflow Phase 3 Architecture & Threat Model KB

Generated for Stage 03 /piolium-deep on 2026-05-27. Evidence: README.md, package.json, services/api/src/index.ts, packages/storage/src/clickhouse.ts, services/ingest-*, packages/bus, apps/web, apps/desktop, and deployment/docker/docker-compose.yml.

Project Classification

Project Type

Web app: apps/web is a Next.js 16 UI with public pages (/, /tape, /signals, /charts, /news, /options, /replay) and Next route handlers for synthetic-admin proxying.
API / WebSocket gateway: services/api is a Bun HTTP server exposing REST history/live/replay endpoints and many WebSocket channels.
Workers / stream processors: services/ingest-options, services/ingest-equities, services/ingest-news, services/compute, services/candles, services/replay, services/refdata.
Desktop app: apps/desktop is an Electron wrapper around the hosted/local web app.
Internal libraries: packages/types, packages/storage, packages/bus, packages/config, packages/observability.
Deployment/CI tooling: Docker Compose VPS deployment, Bun scripts, Forgejo/GitHub Actions docs/workflows.

Purpose: personal-use, event-sourced market microstructure research platform that ingests external market/news feeds, normalizes/publishes events over NATS/JetStream, persists to ClickHouse/Redis, computes derived flow/smart-money artifacts, and exposes live/replay/history through REST and WebSockets.

Architecture Model

Components

Component	Key files	Role	Security relevance
Next.js web	`apps/web/app/`, `apps/web/app/api/admin/synthetic/`	UI + admin proxy	Browser input, rendering news/market data, admin proxy token forwarding
API gateway	`services/api/src/index.ts`	Bun REST/WebSocket server	Main network boundary; auth only for synthetic admin; query params to ClickHouse; WS fanout/subscription handling
Storage	`packages/storage/src/clickhouse.ts`	ClickHouse schema, insert/fetch query builders	SQL string construction, cursor pagination, record normalization
Bus	`packages/bus/src/**`	NATS/JetStream streams, subjects, KV synthetic control	Internal message integrity boundary; subject abuse/replay risks
Ingest options	`services/ingest-options/src/*`, `py/`	Alpaca ws/rest, Databento/IBKR Python sidecars, msgpack/json parsing	Untrusted third-party feed data and child-process stdout enter system
Ingest equities/news	`services/ingest-equities/src/**`, `services/ingest-news/src/index.ts`	Alpaca feed ingestion	WebSocket/REST parsing, news HTML/content propagation
Compute/candles/replay	`services/compute/src/`, `services/candles/src/`, `services/replay/src/index.ts`	Derived events and replay	Trusts NATS/ClickHouse inputs; can amplify poisoned data
Electron shell	`apps/desktop/src/main.ts`, `apps/desktop/src/security.ts`	Hosted/local app wrapper	Origin/navigation/sandbox boundary; env-controlled start URL
Infra	`deployment/docker/docker-compose.yml`	Web, API, NATS, ClickHouse, Redis	Bind addresses, unauthenticated internal services, secrets in env

Trust Boundaries

Internet/browser -> Next.js web/API: HTTP and WebSocket requests. Public API appears largely unauthenticated except synthetic admin endpoints.
Next.js admin proxy -> API synthetic admin: apps/web/app/api/admin/synthetic/shared.ts forwards Authorization: Bearer ${SYNTHETIC_ADMIN_TOKEN} to NEXT_PUBLIC_API_URL; feature gated by NEXT_PUBLIC_SYNTHETIC_ADMIN=1.
External market/news providers -> ingest workers: Alpaca REST/WS, Databento replay, IBKR bridge; data is untrusted until parsed/validated by zod/shared schemas.
Python child processes -> TypeScript ingest: the JSON lines that the Databento/IBKR adapters read from Bun.spawn stdout are untrusted local-process output, and the spawn call itself is a command/argument construction boundary.
Services -> NATS/JetStream: internal event bus subjects determine which events reach compute/storage/API. No per-subject auth visible in compose (nats -js -sd /data).
Services -> ClickHouse/Redis: storage/cache boundary; query strings are manually built; Redis hot cache can affect live UI state.
Electron shell -> remote/local web app -> external links: trusted origins hardcoded; navigation guards route untrusted URLs to OS browser via shell.openExternal.
Deployment edge/proxy -> containers: Compose binds web/API to 127.0.0.1 by default and joins an external npm-shared network for reverse proxy. Security depends on edge routing and env overrides.

DFD/CFD Slices

DFD-1: Public API query params to ClickHouse history/replay

flowchart LR
  A[Browser/API client] -->|GET /history/* /replay/* /prints/* query params| B[services/api Bun server]
  B -->|zod/coerce parse limit/cursors/filters| C[storage fetch* functions]
  C -->|manual SQL string + quoteString/clampLimit| D[(ClickHouse)]
  D -->|JSONEachRow rows| B --> A

Risk: SQL injection if any string reaches query builder without quoteString; DoS via expensive ranges/large limits; data exposure because endpoints are unauthenticated.
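The escaping and clamping contract behind this risk can be sketched as follows. This is a minimal illustration, not the code in packages/storage/src/clickhouse.ts: the function bodies, the limit constants, and the `option_prints` table name are assumptions, and only the helper names `quoteString` and `clampLimit` come from the slice above.

```typescript
// Hypothetical stand-ins for the quoteString / clampLimit helpers named above.
// The real implementations may differ; the limits below are assumed values.
const DEFAULT_LIMIT = 100;
const MAX_LIMIT = 1000;

/** Escape a value for use inside a single-quoted ClickHouse string literal. */
function quoteString(value: string): string {
  // Escape backslashes first so the quote escape is not itself re-escaped.
  const escaped = value.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
  return `'${escaped}'`;
}

/** Coerce an untrusted limit into the range [1, max], falling back to a default. */
function clampLimit(raw: unknown, max: number = MAX_LIMIT): number {
  const n = typeof raw === "number" ? raw : Number.parseInt(String(raw), 10);
  if (!Number.isFinite(n) || n < 1) return DEFAULT_LIMIT;
  return Math.min(Math.floor(n), max);
}

/** Build a query where every attacker-influenced value passes through a helper. */
function buildPrintsQuery(symbol: string, limit: unknown): string {
  // "option_prints" is an illustrative table name, not taken from the schema.
  return (
    "SELECT * FROM option_prints WHERE symbol = " +
    quoteString(symbol) +
    " ORDER BY ts DESC LIMIT " +
    clampLimit(limit)
  );
}
```

The point of the sketch is that a query builder stays safe only while every string and number reaches the SQL text through one of these helpers; a parameterized client API, where available, removes that dependency.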

DFD-2: WebSocket live fanout/subscription filtering

flowchart LR
  A[Browser WS client] -->|GET /ws/*; /ws/live messages| B[API websocket handler]
  B -->|LiveClientMessageSchema / subscription state| C[LiveStateManager]
  D[NATS events] --> E[API subscribers]
  E -->|filter by subscription/channel| B --> A

Risk: unauthenticated streaming of potentially valuable feed/derived data; WS resource exhaustion; subscription filter bypass or malformed message DoS.
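Two of the missing controls implied by this risk, an Origin allowlist at upgrade time and a per-connection message budget, can be sketched in isolation. Everything here is an assumption for illustration: the allowed origins (including their schemes), the class name, and the capacity values are not taken from services/api.

```typescript
// Assumed allowlist; the hostnames echo origins mentioned elsewhere in this KB,
// but the schemes and the idea of reusing them for WS checks are assumptions.
const ALLOWED_WS_ORIGINS: ReadonlySet<string> = new Set([
  "https://flow.deltaisland.io",
  "http://localhost:3000",
]);

/** Accept an upgrade only when the Origin header parses to an allowlisted origin. */
function isAllowedOrigin(originHeader: string | null): boolean {
  if (!originHeader) return false;
  try {
    return ALLOWED_WS_ORIGINS.has(new URL(originHeader).origin);
  } catch {
    return false; // unparseable Origin values are rejected
  }
}

/** Token bucket: at most `capacity` messages in a burst, refilled at `refillPerSec`. */
class MessageBudget {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.last = nowMs;
  }

  /** Returns true if the message may be processed, false if it should be dropped. */
  tryConsume(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

A handler would hold one `MessageBudget` per socket and close or ignore the connection when `tryConsume` returns false; an Origin check alone does not stop non-browser clients, so it complements rather than replaces authentication.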

DFD-3: External feeds to NATS/ClickHouse/UI

flowchart LR
  A[Alpaca/Databento/IBKR/news feeds] -->|WS/REST/msgpack/JSON/child stdout| B[ingest workers]
  B -->|schema parse/normalization| C[NATS subjects]
  C --> D[compute/candles]
  C --> E[storage writers]
  E --> F[(ClickHouse)]
  F --> G[API REST/WS] --> H[Web/Electron UI]

Risk: poisoned feed messages, malformed binary/JSON DoS, HTML/script content in news, bogus symbols/traces polluting derived analytics and UI.

DFD-4: Synthetic admin control

flowchart LR
  A[Browser] -->|/api/admin/synthetic/*| B[Next route handler]
  B -->|Bearer SYNTHETIC_ADMIN_TOKEN| C[API /admin/synthetic/status/control]
  C -->|writeSyntheticControlState| D[NATS KV synthetic control]
  D --> E[synthetic ingest/backend mode]

Risk: token leakage/misconfiguration; SSRF-like proxying if NEXT_PUBLIC_API_URL is attacker-controlled; admin state changes control synthetic market behavior.
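The SSRF-like concern above comes down to whether the proxy will attach the admin token to a request for an arbitrary base URL. A minimal sketch of a strict base allowlist follows; the allowlisted origin, its port, and the function name are hypothetical, and the two backend paths are read from the DFD-4 diagram rather than from the route handler source.

```typescript
// Hypothetical allowlist: the real API base and port are not stated in this KB.
const ALLOWED_API_ORIGINS: ReadonlySet<string> = new Set(["http://127.0.0.1:4000"]);

// Fixed set of backend paths the proxy may call (assumed from the DFD-4 diagram).
type AdminPath = "/admin/synthetic/status" | "/admin/synthetic/control";

/** Resolve the proxy target, or return null if the configured base is not trusted. */
function resolveAdminTarget(base: string | undefined, path: AdminPath): URL | null {
  if (!base) return null;
  let parsed: URL;
  try {
    parsed = new URL(base);
  } catch {
    return null; // malformed base URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;
  // Reject embedded credentials so they cannot be mistaken for the host part.
  if (parsed.username !== "" || parsed.password !== "") return null;
  if (!ALLOWED_API_ORIGINS.has(parsed.origin)) return null;
  // Build from the origin only, discarding any path or query on the base.
  return new URL(path, parsed.origin);
}
```

Because the token is only sent when this function returns a URL, a misconfigured or attacker-influenced base fails closed instead of leaking the bearer token to another host.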

DFD-5: Electron navigation

flowchart LR
  A[Env ISLANDFLOW_DESKTOP_START_URL] --> B[resolveDesktopStartUrl]
  B -->|trusted origin only| C[BrowserWindow]
  C -->|will-navigate/window.open| D[Navigation guards]
  D -->|trusted: load| C
  D -->|external safe URL| E[OS browser shell.openExternal]

Risk: origin allowlist mistakes, openExternal abuse, remote content compromise; controls include sandbox, context isolation, no nodeIntegration, disabled permission requests.
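The guard decision in the flow above can be sketched as a pure function that is easy to unit test. This is an illustration under assumptions, not the code in apps/desktop/src/security.ts: the function name and the schemes attached to the trusted hosts are hypothetical, and the sketch does not model whether the navigation was initiated from a trusted page.

```typescript
// Assumed trusted origins; the hosts appear in this KB, the schemes are assumptions.
const TRUSTED_ORIGINS: ReadonlySet<string> = new Set([
  "https://flow.deltaisland.io",
  "http://127.0.0.1:3000",
  "http://localhost:3000",
]);

type NavigationDecision = "load" | "external" | "deny";

/** Decide what a navigation guard should do with a target URL. */
function decideNavigation(target: string): NavigationDecision {
  let url: URL;
  try {
    url = new URL(target);
  } catch {
    return "deny"; // relative or malformed URLs never reach the shell
  }
  // Compare full origins (scheme + host + port), never substrings or prefixes.
  if (TRUSTED_ORIGINS.has(url.origin)) return "load";
  // Only plain web links are handed to the OS browser.
  if (url.protocol === "https:" || url.protocol === "http:") return "external";
  return "deny"; // file:, javascript:, custom schemes, and so on
}
```

In a real shell, "external" would map to `shell.openExternal` and "load" to allowing the navigation inside the `BrowserWindow`; keeping the decision separate from those calls makes redirect and lookalike-host cases testable without Electron.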

CFD-1: Request routing/auth decision in API

flowchart TD
  A["Bun fetch(req)"] --> B{path/method}
  B -->|/health| Z[public ok]
  B -->|/admin/synthetic/*| C[authenticateSyntheticAdminRequest]
  C -->|fail| D[401/403]
  C -->|pass| E[status/control KV]
  B -->|all market/history/replay/ws paths| F[public handler no auth]
  F --> G[parse params -> storage/WS]

Security-critical decision: only synthetic admin is protected; all other handlers rely on deployment/network exposure for access control.
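A bearer-token gate of the kind this decision depends on can be sketched as below. The function names, the fail-closed behavior when no token is configured, and the mapping of failures to 401 versus 403 are assumptions for illustration; they are not a description of `authenticateSyntheticAdminRequest`.

```typescript
type AuthStatus = 200 | 401 | 403;

/** Compare two strings without returning early on the first mismatch. */
function constantTimeEqual(a: string, b: string): boolean {
  const len = Math.max(a.length, b.length);
  let diff = a.length ^ b.length;
  for (let i = 0; i < len; i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

/** Hypothetical gate: 401 when no usable credential is sent, 403 when it is wrong. */
function checkAdminAuth(
  authorization: string | null,
  expectedToken: string | undefined,
): AuthStatus {
  // Fail closed: an unset or empty server token must never authorize anyone.
  if (!expectedToken) return 403;
  const match = authorization ? /^Bearer\s+(\S+)$/i.exec(authorization.trim()) : null;
  if (!match) return 401;
  const presented = match[1] ?? "";
  return constantTimeEqual(presented, expectedToken) ? 200 : 403;
}
```

JavaScript gives no hard timing guarantees, so a runtime primitive such as `timingSafeEqual` from `node:crypto` is the more defensible choice in production; the loop above only shows the shape of the comparison.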

CFD-2: Ingest validation/control flow

flowchart TD
  A[adapter selected by env] --> B{synthetic/alpaca/databento/ibkr}
  B --> C[external REST/WS or Bun.spawn]
  C --> D[decode JSON/msgpack/lines]
  D --> E{schema/field checks}
  E -->|valid| F[publishJson to NATS]
  E -->|invalid| G[drop/log/continue]

Security-critical decision: schema parsing and field bounds decide whether untrusted external data becomes authoritative event stream.
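The "decode, then schema/field checks, then publish or drop" path can be illustrated with a hand-written validator for one event shape. The `EquityPrint` fields, the symbol pattern, and the size cap are hypothetical; the project is described as using zod/shared schemas, which this sketch replaces with plain checks so that it stays self-contained.

```typescript
// Hypothetical event shape; the real shared schemas are not reproduced here.
interface EquityPrint {
  symbol: string;
  price: number;
  size: number;
  ts: number; // epoch milliseconds
}

const MAX_LINE_CHARS = 64 * 1024; // assumed cap on one decoded message
const SYMBOL_RE = /^[A-Z][A-Z0-9.]{0,9}$/; // assumed ticker format

/** Parse one untrusted JSON line; return null (drop) on any violation. */
function parsePrintLine(line: string): EquityPrint | null {
  if (line.length > MAX_LINE_CHARS) return null; // bound work before parsing
  let raw: unknown;
  try {
    raw = JSON.parse(line);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const { symbol, price, size, ts } = raw as Record<string, unknown>;
  if (typeof symbol !== "string" || !SYMBOL_RE.test(symbol)) return null;
  if (typeof price !== "number" || !Number.isFinite(price) || price <= 0) return null;
  if (typeof size !== "number" || !Number.isInteger(size) || size <= 0) return null;
  if (typeof ts !== "number" || !Number.isSafeInteger(ts) || ts <= 0) return null;
  // Copy only the validated fields so unexpected properties are not forwarded.
  return { symbol, price, size, ts };
}
```

Returning a fresh object with only the checked fields matters as much as the checks themselves: anything else on the provider payload would otherwise ride along into NATS and downstream consumers.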

CFD-3: Deployment exposure

flowchart TD
  A[.env / compose vars] --> B{WEB_BIND_IP/API_BIND_IP}
  B -->|default 127.0.0.1| C[local reverse proxy boundary]
  B -->|0.0.0.0 override| D[direct public exposure]
  C --> E[external npm-shared network]
  D --> F[public unauth API/WS if firewall absent]

Security-critical decision: in production, access control depends almost entirely on bind IP, reverse proxy, and firewall settings, because the non-admin handlers perform no authentication of their own.

Attack Surface

Attacker-controlled sources

HTTP paths/query/body to services/api REST endpoints: /prints/options, /nbbo/options, /prints/equities, /prints/equities/range, /quotes/equities, /candles/equities, /joins/equities, /dark/inferred, /flow/*, /news, /history/*, /replay/*, /lookup/options-support, /*/by-*, /flow/alerts/:trace/context.
WebSocket connections/messages to /ws/options, /ws/options-nbbo, /ws/equities, /ws/equity-candles, /ws/equity-quotes, /ws/equity-joins, /ws/inferred-dark, /ws/flow, /ws/classifier-hits, /ws/smart-money, /ws/alerts, /ws/live.
Next.js route handlers /api/admin/synthetic/status and /api/admin/synthetic/control when admin feature enabled.
Market/news provider payloads from Alpaca REST/WS, Databento replay output, IBKR bridge output.
Environment variables: service URLs, bind IPs, tokens/API keys, Python binary path, adapter selection, Electron start URL.
NATS messages/KV state if any service or network peer can publish.
ClickHouse/Redis contents if storage is compromised or seeded with malicious data.
CI/deploy script inputs: branch names, PR refs, env secrets, deployment hosts.

High-value sinks

ClickHouse query execution in packages/storage/src/clickhouse.ts.
NATS publish/subscribe/KV in packages/bus/src/** and service consumers.
Redis hot cache in services/api/src/live.ts/candles.
Browser DOM rendering in apps/web, especially news content_html, headlines, URLs, explanations JSON.
Electron shell.openExternal and BrowserWindow.loadURL.
Bun.spawn in Databento/IBKR adapters and deployment scripts invoking shell/ssh/docker.
Logs/metrics containing URLs, provider errors, trace IDs, possibly secrets if not redacted.

Framework Contracts and Hidden Control Channels

Bun server routing: services/api/src/index.ts uses manual if routing. Path normalization, percent-decoding, and regex routes (/flow/packets/:id, /flow/alerts/:trace/context) are security-sensitive.
Next.js route handlers: apps/web/app/api/admin/synthetic/** are forced dynamic and proxy to the API. Security depends on feature env and server-side SYNTHETIC_ADMIN_TOKEN; NEXT_PUBLIC_API_URL is a hidden control channel for target API base.
Next.js public env: variables prefixed NEXT_PUBLIC_* are exposed to clients. Do not place secrets there. NEXT_PUBLIC_API_URL controls browser/API reachability and admin proxy target base in server code.
Proxy/bind assumptions: Compose defaults WEB_BIND_IP and API_BIND_IP to 127.0.0.1; external access likely via reverse proxy on npm-shared. If overridden to 0.0.0.0, unauthenticated API/WS become directly reachable.
Internal services unauthenticated by default: NATS, ClickHouse, Redis compose definitions do not show credentials/TLS. The Docker network is an implicit trust boundary.
Header contracts: Synthetic admin uses Authorization: Bearer; no other route-level auth headers observed. If a reverse proxy injects auth headers, handlers do not re-check them.
WebSocket contracts: Bun server.upgrade accepts based on path only; no Origin/auth check observed. /ws/live message schema is the main control.
Runtime modes: Synthetic/admin behavior depends on SYNTHETIC_CONTROL_ENABLED, SYNTHETIC_ADMIN_TOKEN, NEXT_PUBLIC_SYNTHETIC_ADMIN, adapter envs. API deliver policy and consumer reset affect stream replay behavior.
Electron contracts: Trust is origin-based (flow.deltaisland.io, 127.0.0.1:3000, localhost:3000); sandbox/contextIsolation/webSecurity are enabled; permission prompts denied; external URLs opened only when source is trusted.
Storage escaping contract: ClickHouse string safety depends on local quoteString, buildStringList, clamp*, and typed table constants. Any future query builder bypassing these helpers is high risk.

Threat Model

Assets

Alpaca/Databento/IBKR API credentials and NATS/ClickHouse/Redis URLs.
Market/news data and derived smart-money alerts/flow packets (proprietary research value).
Integrity of event stream, replay history, and classifier outputs.
Availability of live API, WS fanout, NATS JetStream, ClickHouse, Redis.
Admin synthetic-control state.
Desktop user environment (external URL opening/browser trust).
Deployment secrets and CI credentials.

Threat actors

Anonymous internet clients if web/API are exposed through reverse proxy or bind-IP override.
Malicious/compromised market data provider, websocket MITM where TLS/config is weakened, or malformed feed data.
Network peer/container on Docker shared/default networks.
Operator/local attacker who can modify env vars or Python binary paths.
Malicious webpage/content rendered in news/web UI, or compromised trusted origin in Electron.
Supply-chain attacker via npm/Bun/Python dependencies or CI workflow changes.

Abuse paths and priorities

Threat	Boundary	Impact	Likelihood	Priority	Existing controls	Review focus
Unauthenticated REST/WS data extraction or scraping	Internet -> API	Med/High	Med if exposed	High	Bind defaults to localhost	Confirm intended auth; add API auth/rate limits/Origin checks
Synthetic admin token bypass/leak/misproxy	Browser/Next -> API admin	Med	Med	High	Bearer token, feature flag	Verify `authenticateSyntheticAdminRequest`, proxy URL allowlist, no token in client bundle/logs
ClickHouse injection or expensive query DoS	HTTP params -> storage	High	Med	High	zod, clamp, `quoteString`	Custom SAST for string SQL helpers and unbounded ranges
Poisoned feed data corrupts analytics/UI	Provider -> ingest -> NATS/UI	High integrity	Med	High	schemas, field parsing	Validate schemas, size limits, HTML sanitization, anomaly handling
NATS/Redis/ClickHouse lateral abuse from network peer	Docker/shared network -> infra	High	Low/Med	High	localhost port binds for web/API only	Add service credentials/TLS/ACLs; network isolation
WebSocket resource exhaustion	Internet -> API WS	Med/High availability	Med	High	schema parse for live messages	Connection/message limits, heartbeat, per-IP quotas
Electron navigation/openExternal abuse	Web content -> desktop shell	High local user impact	Low/Med	Medium	origin allowlist, sandbox, no nodeIntegration	Verify external URL schemes, downloads, CSP
XSS via news/content or explanation rendering	Feed/API -> web DOM	High if same origin admin token/proxy	Med	High	news summary escaping fallback	Audit `dangerouslySetInnerHTML`, URL rendering, CSP
Child-process command/path misuse	Env -> Bun.spawn Python	Med/High	Low/Med	Medium	args array, script path constant	Validate `pythonBin`, avoid shell, handle stdout size
CI/deploy secret leakage or command injection	PR/env -> scripts/workflows	High	Low/Med	Medium	limited visible workflows	Audit deploy scripts and Forgejo workflow triggers

Recommended controls for later phases

Treat API/WS as public unless proven behind authenticated reverse proxy; require handler-level auth for non-public data and admin controls.
Add Origin/token checks and connection/message rate limits to WS endpoints.
Centralize ClickHouse query construction; prefer parameterized ClickHouse client support if available.
Sanitize or strip provider HTML before storage/rendering; add CSP in Next app.
Add NATS/Redis/ClickHouse credentials/ACLs/TLS or restrict network access; do not rely on Docker network trust.
Harden admin proxy with strict API base allowlist and server-only env names for secrets.

Domain Attack Research

Identified domains: HTTP/Next.js, WebSocket, Electron, NATS/JetStream message bus, ClickHouse SQL/query construction, Redis cache, external market-data ingestion/parsing (JSON/msgpack), subprocess execution, Docker/deployment/CI, browser rendering/XSS. Mode B applies (security-sensitive dependencies as consumers). Mode C applies (HTTP/WS, SQL, Redis, message queues, Electron, parsing, subprocess, containers/CI). Mode A is not primary because Islandflow is not distributed as a public library/protocol, though internal package API sharp edges matter.

Domain: HTTP API / Next.js / Bun routing

Identified via: services/api manual HTTP routing, apps/web Next.js app and route handlers, Next advisory history.

Attack	Description	Detection strategy	Relevance
Auth bypass / missing handler auth	Public routes unintentionally expose data/control	Find route handlers without auth checks; diff public route inventory	High
Path/matcher confusion	Encoded paths/trailing slashes bypass manual checks/proxy rules	Test encoded path variants and reverse proxy rewrites	Med
SSRF/open proxy via admin proxy	Server fetches attacker-controlled base/path	Track `new URL(path, NEXT_PUBLIC_API_URL)` and env controls	Med
Cache poisoning	Host/forwarded headers or Next caching leak dynamic data	Review caching headers, `dynamic`, reverse proxy config	Low/Med

Custom SAST targets: route handlers in services/api/src/index.ts and apps/web/app/api/** lacking auth; fetch(new URL(... env ...)); use of req.headers/Host/X-Forwarded-*; public route changes. Manual checklist: confirm intended public endpoints; fuzz paths; enforce auth and rate limits. Research sources: advisory summary, wooyun-legacy web methodology, last30days/web-search class knowledge.

Domain: WebSocket

Attack	Description	Detection strategy	Relevance
Unauthenticated data streaming	Any client subscribes to feed/alerts	Enumerate `/ws/*` upgrades without auth/origin checks	High
Resource exhaustion	Many connections/messages or huge frames	Look for max payload, conn limits, heartbeat	High
Subscription filter abuse	Malformed filters cause broad fanout or CPU use	Validate `LiveClientMessageSchema`, filter matching paths	Med

Custom SAST: serverRef.upgrade, websocket.message, JSON.parse, zod parse error loops, broadcast loops. Manual: origin/auth tests; slow-client behavior; payload size tests.

Domain: ClickHouse SQL / query construction

Attack	Description	Detection strategy	Relevance
SQL injection	Manual string interpolation misses escaping	Taint HTTP params to `client.query({query})`; require `quoteString/clamp*`	High
Query DoS	wide time ranges/high cardinality IN/LIKE/position	Find unbounded arrays/ranges and expensive predicates	High
Data exfiltration	unauth history/replay endpoints dump proprietary data	Route inventory + auth absence	High

Custom SAST: RemoteFlowSource query params/body -> query: template literals in packages/storage; array length to IN/OR predicates; limits > configured max. Manual: test quotes/unicode/null bytes; verify max IDs and ranges.
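The query-DoS checks listed above (bounded ranges, bounded ID lists) can be sketched as small guards applied before any SQL is built. The function names and the specific caps are assumptions; the actual limits enforced in packages/storage are not documented here.

```typescript
interface TimeRange {
  from: number; // epoch milliseconds, inclusive
  to: number; // epoch milliseconds, inclusive
}

/** Reject invalid ranges and narrow overly wide ones to the most recent window. */
function boundTimeRange(from: number, to: number, maxSpanMs: number): TimeRange | null {
  if (!Number.isFinite(from) || !Number.isFinite(to)) return null;
  if (from < 0 || to < from) return null;
  // Keep the end of the range and pull the start forward if the span is too wide.
  const boundedFrom = Math.max(from, to - maxSpanMs);
  return { from: boundedFrom, to };
}

/** De-duplicate an ID list and refuse it outright if it is still too long. */
function capIdList(ids: readonly string[], maxIds: number): string[] | null {
  const unique = Array.from(new Set(ids));
  if (unique.length === 0 || unique.length > maxIds) return null;
  return unique;
}
```

Rejecting an oversized list, rather than silently truncating it, keeps the response honest about which IDs were actually queried; narrowing a time range is a softer choice that a caller could instead turn into a 400 response.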

Domain: NATS/JetStream message bus

Attack	Description	Detection strategy	Relevance
Subject spoofing	Network peer publishes fake market/admin events	Review connect options, credentials, subject ACLs	High
Replay/consumer confusion	Durable policy reset replays stale data as live	Trace `API_DELIVER_POLICY`, replay service controls	Med
KV control tampering	Synthetic control state modified by unauthorized peer	Review KV bucket ACL and admin endpoints	High

Custom SAST: publishJson, subscribeJson, writeSyntheticControlState, unvalidated payloads. Manual: verify NATS auth/TLS in prod, subject permissions, event schemas.

Domain: External feed parsing (JSON/msgpack/news HTML)

Attack	Description	Detection strategy	Relevance
Parser/resource DoS	Large JSON/msgpack/websocket frames exhaust memory/CPU	Locate decode/JSON.parse without size/time bounds	High
Schema confusion	Partial provider payload becomes valid incorrect event	Compare zod schemas and adapter field defaults	Med
Stored XSS via news HTML	Provider `content` stored/rendered as HTML	Trace `content_html` to React render sinks	High

Custom SAST: decode, JSON.parse, new TextDecoder, content_html, dangerouslySetInnerHTML, URLs. Manual: malformed provider fixtures; max message sizes; sanitize HTML.
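One conservative alternative to regex-based HTML sanitization is to treat provider content as text and escape it, while separately checking link targets. The sketch below shows that approach; the function names are hypothetical, and it swaps in escaping rather than reproducing whatever sanitizer the web app currently uses.

```typescript
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

/** Escape text so it renders as literal characters, never as markup. */
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

/** Allow only absolute http(s) links from provider payloads. */
function isSafeHref(href: string): boolean {
  try {
    const url = new URL(href);
    return url.protocol === "https:" || url.protocol === "http:";
  } catch {
    return false; // relative or malformed URLs are treated as unsafe here
  }
}
```

Escaping loses provider formatting, which may be an acceptable trade for a personal research tool; if rich rendering is required, an allowlist-based sanitizer that works on a parsed DOM is the usual replacement for regex filtering.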

Domain: Electron desktop

Attack	Description	Detection strategy	Relevance
Navigation escape	Untrusted page loaded in privileged shell	Check `loadURL`, origin allowlists, redirects	Med
openExternal abuse	Custom schemes/file URLs launched	Verify only http/https external URLs	Med
Node integration/IPC abuse	Web content gains local code exec	Check BrowserWindow preferences/preload/IPC	Low currently

Custom SAST: shell.openExternal, loadURL, setWindowOpenHandler, will-navigate, BrowserWindow prefs. Manual: redirect chains, punycode/origin tests, CSP/download handling.

Domain: Redis/cache

Attack	Description	Detection strategy	Relevance
Cache poisoning	Malicious internal publisher/data seeds hot live state	Trace key construction and schema validation	Med
Availability DoS	huge values/keys or no TTL memory growth	Review `set`/`lpush`/TTL use	Med
Unauthorized access	Redis default no password in compose	Deployment config review	High internal

Custom SAST: Redis key builders with attacker input, missing TTL, JSON.parse of cache values.

Domain: Subprocess / Python sidecars

Attack	Description	Detection strategy	Relevance
Command injection/path hijack	Env-controlled binary/args execute attacker program	Ensure no shell; validate `pythonBin`; constant script paths	Med
stdout parsing DoS	Child emits unbounded line/JSON	Limit line length and restart loops	Med
Secret leakage	API keys in args/env/logs	Review spawned args and stderr logging	Low/Med

Custom SAST: Bun.spawn, env-derived args, stderr: inherit, readLines buffer growth.
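The buffer-growth concern for sidecar stdout can be addressed by capping line length while splitting chunks into lines. The class below is a hypothetical sketch of that idea; it is not the adapters' `readLines` implementation, and the cap value is left to the caller.

```typescript
/** Splits streamed text into lines while never buffering more than maxLineLength. */
class BoundedLineReader {
  private buffer = "";
  private discarding = false; // true while skipping the rest of an oversized line
  private readonly maxLineLength: number;
  dropped = 0; // number of oversized lines discarded so far

  constructor(maxLineLength: number) {
    this.maxLineLength = maxLineLength;
  }

  /** Feed one decoded chunk; returns the complete, in-bounds lines it finished. */
  push(chunk: string): string[] {
    const lines: string[] = [];
    let rest = chunk;
    let idx = rest.indexOf("\n");
    while (idx !== -1) {
      const piece = rest.slice(0, idx);
      rest = rest.slice(idx + 1);
      if (this.discarding) {
        this.discarding = false; // newline ends the oversized line
      } else if (this.buffer.length + piece.length > this.maxLineLength) {
        this.dropped += 1;
      } else {
        lines.push(this.buffer + piece);
      }
      this.buffer = "";
      idx = rest.indexOf("\n");
    }
    if (!this.discarding) {
      if (this.buffer.length + rest.length > this.maxLineLength) {
        // Stop buffering now instead of waiting for a newline that may never come.
        this.buffer = "";
        this.discarding = true;
        this.dropped += 1;
      } else {
        this.buffer += rest;
      }
    }
    return lines;
  }
}
```

Each returned line would then go through JSON parsing and schema checks; a rising `dropped` count is a useful signal for restarting or alerting on a misbehaving child process.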

Domain: Docker/deployment/CI supply chain

Attack	Description	Detection strategy	Relevance
Insecure bind/exposure	API/NATS/ClickHouse/Redis reachable publicly	Parse compose ports/networks/env overrides	High
Secret leakage in deploy scripts	Tokens printed or sent to PR contexts	Review workflow triggers/scripts	Med
Dependency takeover/CVE	npm/Python base images/deps vulnerable	Dependency and image scanning	Med

Custom SAST: workflows with untrusted PR + secrets, deploy scripts shell interpolation, Docker ports to 0.0.0.0, no auth configs.

Phase 4 CodeQL Extraction Targets

Slice	Source type	Source	Sink kind	Sink
DFD-1 API params -> ClickHouse	RemoteFlowSource	URL search params/path/body in `services/api/src/index.ts`	sql-execution	`client.query({ query })` in `packages/storage/src/clickhouse.ts`
DFD-2 WS messages -> subscriptions/fanout	RemoteFlowSource	WebSocket `message`, path upgrade	deserialization / resource exhaustion	`LiveClientMessageSchema.parse`, JSON parse, broadcast/send loops
DFD-3 feeds -> NATS/storage/UI	RemoteFlowSource	WebSocket/REST provider messages, child stdout	deserialization / code/data injection	`JSON.parse`, msgpack `decode`, `publishJson`, `content_html` render sinks
DFD-4 admin proxy/control	RemoteFlowSource + EnvironmentVariable	Next request body; `NEXT_PUBLIC_API_URL`, `SYNTHETIC_ADMIN_TOKEN`	http-request / authz decision	`fetch(url.toString())`, `writeSyntheticControlState`
DFD-5 Electron navigation	EnvironmentVariable + RemoteFlowSource	`ISLANDFLOW_DESKTOP_START_URL`, page navigation/window.open URL	http-request / code-execution-adjacent	`BrowserWindow.loadURL`, `shell.openExternal`
Python sidecars	EnvironmentVariable	`DATABENTO_PYTHON_BIN`/`IBKR_*` env args	command-execution	`Bun.spawn`
Redis live state	RemoteFlowSource	NATS events, API filters	cache/data poisoning	Redis client methods, JSON cache serialization

Spec Gap Candidates

No formal RFC/spec commitments are declared. De facto contracts to check in Phase 9:

HTTP/1.1 and WebSocket behavior (Bun server, ws clients).
OCC option symbol parsing and market-data provider contracts (Alpaca, Databento, IBKR).
NATS/JetStream subject and durable consumer semantics.
ClickHouse SQL escaping/string literal semantics.
Electron security model for sandbox/context isolation/navigation.

Coverage Gaps

Production reverse proxy configuration is not present; API exposure/auth assumptions must be validated from deployment host.
Full services/api/src/index.ts is large; later phases should extract route inventory mechanically and test every route.
UI rendering sinks (apps/web/app/**) require deeper review for dangerouslySetInnerHTML, external links, and CSP.
NATS/ClickHouse/Redis production credentials/TLS/ACLs are not visible in compose; if configured outside repo, collect them.
Rate limiting is not apparent for REST/WS; availability risk remains unquantified.
CI canonical path in README references .forgejo/workflows, while .github/workflows also exists; audit both.
Domain research used repository/advisory evidence and built-in playbook knowledge; live web/MCP research was not available in this runtime.

Static Analysis Summary

Stage 04 prioritized piolium/attack-surface/candidates-summary.md and candidates.jsonl, especially high-score hidden-control-channel, WebSocket, SQL/query, SSRF, and unsafe HTML candidates. codeql and semgrep were checked before scanning but were unavailable on PATH, so the run used the required fallback (grep + read) rather than fabricated scan results. Semgrep Pro could not be executed because the CLI was missing; the fallback reason is documented here, and transient piolium/semgrep-res/ was removed during cleanup.

Artifacts produced:

piolium/attack-surface/source-sink-flows-all-severities.md
Structural fallback JSON/SARIF under piolium/codeql-artifacts/
Custom placeholders/rules under piolium/codeql-queries/ and piolium/semgrep-rules/
Draft findings: p4-001, p4-002, p4-003 (cap 30 respected)

Built-in CodeQL suites run: none (codeql unavailable). Built-in Semgrep rulesets run: none (semgrep unavailable). Custom Semgrep rule file was authored but not executed by Semgrep; manual grep/read validation matched the risky instances.

CodeQL Structural Analysis

CodeQL database build/extraction was skipped because the codeql binary was not installed on PATH. Fallback structural extraction still populated the mandatory files for downstream phases:

Entry points: 7 (piolium/codeql-artifacts/entry-points.json)
Sinks: 8 (piolium/codeql-artifacts/sinks.json)
Reachable slices: 5 of 7 (piolium/codeql-artifacts/call-graph-slices.json)

Machine-Generated DFD Diagram

flowchart LR
  A[HTTP req/query params] --> B[services/api routes]
  B --> C[ClickHouse query sinks]
  W[WS upgrade/message] --> X[JSON.parse + Zod]
  X --> Y[live subscriptions/socket.send]
  N[Provider news content_html] --> S[regex sanitizeNewsHtml]
  S --> H[dangerouslySetInnerHTML]
  P[Next admin proxy routes/env] --> F[fetch API base]
  E[Env Python bin/args] --> R[Bun.spawn]
  D[Electron navigation] -. no path in fallback .-> Z[loadURL/openExternal]

Machine-Generated CFD Diagram

flowchart TD
  Q[Request arrives] --> R{Admin route?}
  R -- yes --> T{Synthetic enabled + token matches?}
  T -- pass --> U[writeSyntheticControlState]
  T -- fail --> V[401/404/409]
  R -- no/data route --> K[No app auth]
  K --> L[ClickHouse fetch JSON]
  W[WS upgrade] --> O{Origin/auth checked?}
  O -- no --> P[Accept socket/fanout]
  N[News HTML] --> G{Regex sanitizer passes?}
  G -- yes --> H[Render HTML]

Notable entry points not fully represented in Phase 3 DFD slices: client-side window.location.host API/WS selection and response content-type robustness checks. Notable sinks mapping to high-risk flows: dangerouslySetInnerHTML, WebSocket socket.send, and ClickHouse client.query.

SAST Enrichment

Finding	Classification	Attacker Control	Boundary	CodeQL Reachability	Verdict
p4-001 stored-xss-news-html-regex-sanitizer	security	upstream news provider / bus publisher controls `content_html`	external feed -> browser DOM	reachable (fallback slice DFD-3)	keep
p4-002 unauthenticated-websocket-market-data-streams	security	remote client controls WS upgrade/messages	internet/proxy -> API live streams	reachable (fallback slice DFD-2)	keep
p4-003 public-api-exposes-queryable-market-history	security	remote client controls HTTP params if API exposed	internet/proxy -> ClickHouse-backed data API	reachable (fallback slice DFD-1)	keep
admin-proxy-env-base-url-fetch	env/tooling/admin-only	deployment env controls `NEXT_PUBLIC_API_URL`; route path fixed	server env -> outbound fetch	reachable (fallback slice DFD-4)	drop as draft; monitor config
Python sidecar Bun.spawn	env/tooling/admin-only	env/config controls python binary/args	local service config -> subprocess	reachable (fallback Python sidecars)	drop
test secret literals	correctness/env	source-controlled tests	none	no-slice	drop
static redirects	correctness	no user-controlled URL	none	no-slice	drop

Spec Gap Analysis

Gap: Root Docker Compose publishes unauthenticated ClickHouse, Redis, and NATS control planes

Contract: Docker deployment/internal-service contract for infrastructure dependencies (ClickHouse, Redis, NATS/JetStream) should keep data/control planes internal unless credentials/TLS/ACLs are configured.
Security Assumption: Application services treat ClickHouse, Redis, and NATS as trusted internal dependencies; API-layer validation/auth is not re-applied to direct database, cache, or message-bus clients.
Code Path: docker-compose.yml:1 — root compose publishes infrastructure ports; deployment/docker/docker-compose.yml:120 — production compose keeps those services internal-only by omitting host ports.
Gap Type: framework-contract | hidden-control-channel | proxy-trust | runtime-mode
Attack Vector: A network attacker reaches the host-published service ports, publishes forged NATS messages, tampers with Redis state, or queries/modifies ClickHouse directly.
Exploit Conditions: Root compose is used on a network-reachable host and host firewall does not block 8123, 9000, 6379, 4222, or 8222.
Impact: Data confidentiality/integrity compromise and bypass of API-layer controls for market history, live state, and event streams.
Severity: HIGH
Evidence: Root compose maps 8123:8123, 9000:9000, 6379:6379, 4222:4222, and 8222:8222; production compose defines the same services without host ports.

Authorization Audit

Public routes matrix: piolium/attack-surface/public-routes-authz-matrix.md
Public/network operations reviewed: 17 matrix rows covering API REST groups, API WebSocket groups, Next public pages, and Next synthetic-admin proxy routes.
Frameworks covered: Bun manual routing/WebSocket upgrade, Next.js route handlers/file routes.
Middleware/proxy-derived identity reviewed: backend synthetic bearer token, x-synthetic-admin-token, Next admin proxy token injection, bind/reverse-proxy exposure assumptions, WebSocket path-only upgrades.
Drafts filed: 1 (authz-missing-guard): piolium/findings-draft/p5-001-public-next-admin-proxy-confers-synthetic-admin.md.
Remaining review targets: unauthenticated market-data REST/history/replay/WebSocket surfaces are currently treated as intended-public/read-only, but should be chamber-reviewed against product policy because exposure depends on reverse proxy/bind settings and data may have proprietary value.

State & Concurrency Audit

State-holding entities catalogued: 8
Concurrency primitives observed: JetStream manual ack/explicit ack; NATS KV for synthetic control. No language locks, DB transactions, SELECT FOR UPDATE, advisory locks, or Redis/distributed locks observed.
Idempotency infrastructure: partial/in-memory only (recentStructureEmits, live/UI dedupe); no durable processed-event/idempotency store for JetStream consumers.
Drafts filed: 2 (idempotency: 1, stale-read: 1)
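For context on what "in-memory only" idempotency looks like, the following is a generic bounded dedupe set with a TTL. It is a hypothetical sketch, not the `recentStructureEmits` structure, and it shares the limitation noted above: its state is lost on restart, so redelivered JetStream messages would be processed again.

```typescript
/** In-memory record of recently seen event ids, bounded by TTL and entry count. */
class RecentEventSet {
  private readonly seen = new Map<string, number>(); // id -> expiry (epoch ms)
  private readonly ttlMs: number;
  private readonly maxEntries: number;

  constructor(ttlMs: number, maxEntries: number) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
  }

  get size(): number {
    return this.seen.size;
  }

  /** Returns true the first time an id is seen inside its TTL window. */
  markIfNew(id: string, nowMs: number): boolean {
    const expiry = this.seen.get(id);
    if (expiry !== undefined && expiry > nowMs) return false; // duplicate
    this.seen.delete(id); // re-insert so Map order stays oldest-first
    this.seen.set(id, nowMs + this.ttlMs);
    this.evict(nowMs);
    return true;
  }

  /** Drop expired entries, then the oldest entries while over capacity. */
  private evict(nowMs: number): void {
    for (const [key, expiry] of this.seen) {
      if (expiry > nowMs && this.seen.size <= this.maxEntries) break;
      this.seen.delete(key);
    }
  }
}
```

A durable equivalent would persist processed ids (or rely on a deduplicating storage engine) so that consumer restarts and policy resets do not replay stale events as new.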

Cross-Service Taint Propagation

Services analysed: 8
Edges stitched: 15 (1 http, 0 grpc, 13 queue, 1 db-write, 0 file)
Coverage gaps: provider-only HTTP calls excluded; raw options.prints has no in-repo consumer identified; NATS subject identity depends on deployment controls — see piolium/attack-surface/cross-service-edges.md
Drafts filed: 1 (queue-source-auth: 1)
