Islandflow Turn Document
Native Public Edge Cutover
Completed the VPS native-first cutover for Islandflow infrastructure and app services while keeping Nginx
Proxy Manager (NPM) as the outer edge and Docker as the rollback path. The final state now serves
flow.deltaisland.io and api.flow.deltaisland.io from the native web and API
processes, with verified public routing and a documented follow-up for the long-term API Cloudflare posture.
Summary
The repository now contains the native infra units, native cutover scripts, Docker fallback adjustments, and
public-edge retargeting logic required to run Islandflow natively on the VPS. During validation, the live NPM
edge was switched from Docker container-name upstreams to native host ports, the host firewall was adjusted so
the NPM bridge could reach the native API, and the separate public API TLS problem was resolved by correcting
the Cloudflare DNS state for api.flow.deltaisland.io.
Changes Made
- Added checked-in native infra operations under deployment/native/, including bootstrap-infra.sh,
  check-native-infra.sh, cutover.sh, full-rollback.sh, start-infra.sh, and the native system units for NATS,
  Redis, and ClickHouse.
- Extended native app runtime units so the web and API bind on host-reachable interfaces, and forced the
  native options ingest service to use the synthetic adapter during the cutover.
- Updated services/api to support explicit host binding through API_HOST, and fixed JetStream retention
  conversion in packages/bus so native services can start cleanly with the configured max-age values.
- Updated the Docker fallback assets to publish loopback web/API ports, share durable host data under
  /var/lib/islandflow, and document the native-to-Docker rollback path.
- Reworked deployment/native/switch-npm-edge.sh so it targets the NPM bridge gateway IP instead of
  host.docker.internal, handles the root-owned NPM SQLite database, synchronizes generated proxy_host
  configs, and reloads NPM deterministically after the edge switch.
- Created Beads follow-up issue islandflow-fl5 for the remaining decision about whether
  api.flow.deltaisland.io should remain DNS-only or be re-proxied through Cloudflare.
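The API_HOST change in the list above can be illustrated with a minimal sketch. The helper name resolveBind, the Env type, and the fallback defaults are assumptions made for illustration; the source only states that services/api reads API_HOST and that the native unit sets API_HOST=0.0.0.0 and API_PORT=4000.

```typescript
// Hypothetical helper, not the repository's code: resolves the bind address
// from environment variables before the server is started.
type Env = Record<string, string | undefined>;

interface BindOptions {
  hostname: string;
  port: number;
}

// Assumed defaults for illustration only.
const DEFAULT_HOST = "127.0.0.1";
const DEFAULT_PORT = 4000;

function resolveBind(env: Env): BindOptions {
  // Empty or whitespace-only values fall back to the defaults.
  const hostname = env.API_HOST?.trim() || DEFAULT_HOST;
  const rawPort = env.API_PORT?.trim();
  const port = rawPort ? Number(rawPort) : DEFAULT_PORT;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid API_PORT: ${rawPort}`);
  }
  return { hostname, port };
}

// Under Bun, the result could be spread into the server options, e.g.
//   Bun.serve({ ...resolveBind(process.env), fetch: handler });
// where handler is whatever request handler the service defines.
```

Binding on 0.0.0.0 rather than loopback is what makes the process reachable from the NPM container through the bridge network, which is why the host has to be configurable at all.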
Context
The migration started from a Docker-owned production baseline where NATS, Redis, ClickHouse, API, workers, and
web all ran in Compose, while NPM routed Islandflow traffic to Docker service names. That setup blocked a safe
native cutover for two reasons: the native services could not reach Docker-only infra reliably, and NPM could
not send public traffic to host-native processes without a deliberate upstream retarget.
The runtime model for this work is exclusive ownership. Native and Docker are not allowed to run the same API
or worker scopes in parallel because JetStream durable consumers would conflict. The objective was therefore a
phased handoff, not a mixed soak for the same queues.
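The exclusive-ownership rule can be made checkable before each handoff phase. The following is a minimal sketch, assuming each runtime's active scopes are available as plain lists; the function name and the scope names in the usage note are hypothetical and do not come from the repository.

```typescript
// Hypothetical pre-flight check: a scope is a unit of work (for example the API
// or one worker group) that must be owned by exactly one runtime at a time.
type Scope = string;

// Returns the scopes claimed by both runtimes. An empty result means the
// exclusive-ownership rule holds for this assignment.
function ownershipConflicts(nativeScopes: Scope[], dockerScopes: Scope[]): Scope[] {
  const dockerOwned = new Set(dockerScopes);
  return Array.from(new Set(nativeScopes)).filter((scope) => dockerOwned.has(scope));
}
```

For example, ownershipConflicts(["api", "workers"], ["workers", "web"]) returns ["workers"], which would indicate that a phase must stop the Docker workers before the native ones start.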
Important Implementation Details
NPM edge targeting
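NPM generates proxy_pass from a runtime-resolved $server variable. When proxy_pass contains a variable,
nginx resolves the hostname through its configured DNS resolver rather than /etc/hosts, so the Docker
/etc/hosts alias for host.docker.internal was not sufficient. The switch helper now detects the NPM bridge
gateway and uses that IP for native upstreams.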
Firewall path
The host UFW policy already allowed port 3000 but not 4000. The live fix was a
source-scoped allow for the NPM bridge subnet so the containerized edge could reach the native API.
Cloudflare API hostname
The API hostname failure was separate from the native cutover. The hostname is now a DNS-only
A record pointing at the VPS, which restored public TLS and health responses.
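The bridge-gateway detection described under NPM edge targeting could be approached as follows. This is a sketch only: it assumes the helper has the JSON output of docker network inspect for the NPM network available, and the function and type names are hypothetical. How switch-npm-edge.sh actually performs the detection is not shown in this document.

```typescript
// Subset of the `docker network inspect` JSON shape that this sketch relies on.
interface IpamConfig {
  Subnet?: string;
  Gateway?: string;
}

interface InspectedNetwork {
  Name?: string;
  IPAM?: { Config?: IpamConfig[] | null };
}

interface BridgeAddressing {
  gateway: string; // address the containerized edge uses to reach the host
  subnet: string; // source range a firewall rule would need to admit
}

// Returns the first IPv4 gateway/subnet pair found, or undefined if none exists.
function bridgeAddressing(inspectJson: string): BridgeAddressing | undefined {
  const networks = JSON.parse(inspectJson) as InspectedNetwork[];
  for (const network of networks) {
    for (const config of network.IPAM?.Config ?? []) {
      // Skip IPv6 entries; the upstream targets in this document are IPv4.
      if (config.Gateway && config.Subnet && !config.Gateway.includes(":")) {
        return { gateway: config.Gateway, subnet: config.Subnet };
      }
    }
  }
  return undefined;
}
```

Returning the subnet alongside the gateway keeps the upstream address and the firewall source range derived from the same inspection result.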
| Area | Implementation detail |
| --- | --- |
| Native API | services/api/src/index.ts now accepts API_HOST and passes it to Bun.serve. The native unit sets API_HOST=0.0.0.0 and API_PORT=4000. |
| Native web | The native web unit now starts from apps/web with bun x next start -H "$WEB_HOST" -p "$WEB_PORT", avoiding the earlier repo-root startup failure and binding the service on 0.0.0.0:3000. |
| JetStream retention | Native startup exposed a retention-unit bug. The shared bus layer now converts stream max-age values with nanos(...) and formats them back with millis(...). |
| Docker fallback | Docker Compose now uses ISLANDFLOW_DATA_ROOT=/var/lib/islandflow, publishes loopback ports, and keeps the fallback runtime compatible with the same durable data directories as the native services. |
| NPM switch helper | The helper now updates both the NPM database and the generated /data/nginx/proxy_host/*.conf files, because a DB-only restart did not reliably rewrite the live configs for Islandflow. |
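The JetStream retention row above comes down to a unit conversion: JetStream expresses stream max_age in nanoseconds, while application configuration commonly uses milliseconds. The sketch below uses local stand-in functions instead of the nanos(...) and millis(...) helpers named in the table so that it stays self-contained; the 24-hour value is an assumed example, not the configured retention.

```typescript
// Local stand-ins for the millisecond/nanosecond helpers; names are hypothetical.
const NANOS_PER_MILLI = 1000000;

// Milliseconds (configuration side) to nanoseconds (stream max_age side).
function msToNanos(ms: number): number {
  return ms * NANOS_PER_MILLI;
}

// Nanoseconds back to whole milliseconds, for logging or comparison.
function nanosToMs(ns: number): number {
  return Math.floor(ns / NANOS_PER_MILLI);
}

// Assumed example: a 24-hour retention window configured in milliseconds.
const maxAgeMs = 24 * 60 * 60 * 1000;
const maxAgeNanos = msToNanos(maxAgeMs); // value a stream config's max_age would carry
```

Passing a millisecond value where nanoseconds are expected shortens the retention window by a factor of one million, which is the kind of mismatch a retention-unit bug typically involves.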
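Firewall rule applied for the firewall path fix described above (source-scoped to the NPM bridge subnet):
sudo ufw allow proto tcp from 172.18.0.0/16 to any port 4000 comment 'npm bridge to native api'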
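To show what "source-scoped" means for this rule, the sketch below checks whether an IPv4 address falls inside a CIDR range such as 172.18.0.0/16. It is illustrative only: the function names are hypothetical, and UFW performs the real match in the kernel packet filter, not in application code.

```typescript
// Converts dotted-quad IPv4 text into an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  const [a, b, c, d] = parts.map(Number);
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

// True when ip lies inside the range described by cidr (for example "172.18.0.0/16").
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`Invalid CIDR prefix: ${cidr}`);
  }
  // A /0 prefix matches everything; otherwise keep only the leading `bits` bits.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}
```

Under this rule, a container address such as 172.18.0.5 is admitted to port 4000, while a source such as 172.17.0.1 on a different bridge is not matched by it.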
Expected Impact for End-Users
- Public web and API traffic now reaches the native Islandflow services, which removes Docker from the primary
  live request path while keeping the outer edge unchanged.
- Same-origin public API routes such as /prints, /history, /replay, /nbbo, and /ws/live continue to resolve
  correctly through the main app hostname.
- Rollback remains fast and explicit: NPM can be pointed back at Docker service names and the Docker runtime
  can reclaim the same durable data directories if native operation needs to be abandoned.
Validation
Static checks
bun run check:docker-workspace
docker compose -f deployment/docker/docker-compose.yml config --quiet
docker compose -f /home/delta/nginx-proxy-manager/docker-compose.yml config --quiet
bash -n deployment/native/*.sh
systemd-analyze verify deployment/native/systemd/user/*.service deployment/native/systemd/system/*.service
bun build services/api/src/index.ts --target=bun
bun build scripts/deploy.ts --target=bun
Native runtime
./deployment/native/check-native-health.sh full
curl http://127.0.0.1:4000/health
curl -I http://127.0.0.1:3000/
Public edge
curl -I -fksS https://flow.deltaisland.io
curl -fksS https://api.flow.deltaisland.io/health
bun run scripts/check-public-api-routes.ts https://flow.deltaisland.io
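A public-route check of the kind run in the last command needs a list of targets derived from the base origin. The sketch below builds that list from the same-origin routes named in this document. It is not the content of scripts/check-public-api-routes.ts, which is not shown here; the function name and the split into HTTP and websocket routes are assumptions.

```typescript
// Routes taken from this document; the grouping by protocol is an assumption.
const HTTP_ROUTES = ["/prints", "/history", "/replay", "/nbbo"];
const WS_ROUTES = ["/ws/live"];

// Builds absolute probe targets from a base origin such as https://flow.deltaisland.io.
function publicRouteTargets(baseUrl: string): string[] {
  // Drop trailing slashes so the joined paths do not contain "//".
  const origin = baseUrl.replace(/\/+$/, "");
  // http -> ws and https -> wss for websocket routes.
  const wsOrigin = origin.replace(/^http/, "ws");
  return [
    ...HTTP_ROUTES.map((path) => origin + path),
    ...WS_ROUTES.map((path) => wsOrigin + path),
  ];
}
```

A checker could then request each HTTP target and attempt a websocket upgrade on the wss target, failing on the first route that does not resolve through the main app hostname.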
Issues, Limitations, and Mitigations
- The native ingest-options service required an explicit synthetic-adapter override because the environment
  file still pointed at an Alpaca adapter that was returning 401 responses. The service now starts cleanly for
  native cutover, but production adapter selection remains an operational decision.
- The NPM helper still relies on direct config synchronization because NPM did not reliably regenerate the
  Islandflow proxy files from SQLite changes alone. This is mitigated by keeping the synchronization logic
  checked in and by reloading NPM as part of the helper itself.
- The final public API recovery currently leaves api.flow.deltaisland.io as a DNS-only hostname. That restored
  service, but it changes the edge posture relative to the web hostname and should be reviewed deliberately.
- A temporary Cloudflare API token was used to inspect and correct zone state during validation. That token
  should be rotated outside this repository workflow.
Follow-up Work
- islandflow-fl5: decide whether api.flow.deltaisland.io should remain DNS-only or be re-proxied through
  Cloudflare, then re-validate TLS, websocket, and operational behavior for the chosen posture.
- After operational soak, decide whether native should become the default production runtime or remain a
  supported alternative with Docker as the preferred steady-state runtime.