Server Load Tuning
Completed on 2026-05-22 02:10:10 EDT on branch server-load.
Summary
Addressed three configured sources of steady server load: stretched Docker healthcheck intervals to 300 seconds across the active external stacks, raised the Islandflow API Redis live-cache flush defaults so flushes run less often, and disabled append-only persistence in the checked-in Islandflow Redis configs because this Redis usage is cache-oriented.
Changes Made
- Changed Islandflow API live-cache flush defaults from 250ms / 100 updates to 1000ms / 500 updates in services/api/src/live.ts.
- Updated the documented and example environment values in .env.example, deployment/docker/.env.example, and README.md.
- Disabled append-only persistence in checked-in Islandflow Redis deploy configs: deployment/native/config/redis.conf and deployment/docker/docker-compose.yml.
- Changed external Docker healthcheck intervals to 300s in /home/delta/apps/freedomtracker/deployment/docker/compose.prod.yml, /home/delta/apps/drucquerdotcom/compose.prod.yml, and /home/delta/forgejo/docker-compose.yml.
- Added an explicit 300s healthcheck override for /home/delta/netdata/docker-compose.yml to replace the more frequent image default.
Context
Observed runtime behavior showed high containerd and dockerd CPU with a constant stream of Docker exec_create, exec_start, and exec_die events generated by healthchecks. The host Redis instance was also hot, with roughly 19k ops/sec and command stats dominated by LPUSH and LTRIM from the API live-cache rewrite path.
Important Implementation Details
- The API live-state manager rewrites Redis lists on flush, so increasing the flush interval and burst size reduces write amplification without changing the live data model.
- The checked-in Redis persistence change keeps RDB snapshots but drops AOF rewrite overhead, which fits this repository's cache-heavy Redis use described in the README.
- The external stack edits were applied directly in sibling directories. freedomtracker and drucquerdotcom are Git repos; forgejo and netdata are plain directories here.
- No service restarts or compose redeploys were run in this turn, so the running system will not pick up these config changes until those stacks are restarted.
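The effect of the flush tuning described in the first bullet above can be illustrated with a small simulation. This is a minimal sketch, not the code in services/api/src/live.ts: the class `BatchedFlusher`, the helper `countFlushes`, and the rule "flush when the interval has elapsed or the batch is full" are assumptions made for illustration, and the update rate of one update every 10 ms is hypothetical.

```typescript
// Hypothetical sketch: names and flush rule are illustrative, not taken from live.ts.
class BatchedFlusher<T> {
  private pending: T[] = [];
  private lastFlushMs = 0;
  private readonly intervalMs: number;
  private readonly maxItems: number;
  private readonly flush: (batch: T[]) => void;
  flushCount = 0;

  constructor(intervalMs: number, maxItems: number, flush: (batch: T[]) => void) {
    this.intervalMs = intervalMs;
    this.maxItems = maxItems;
    this.flush = flush;
  }

  // Record one update at time nowMs; flush when either threshold is reached.
  push(item: T, nowMs: number): void {
    this.pending.push(item);
    const intervalElapsed = nowMs - this.lastFlushMs >= this.intervalMs;
    const batchFull = this.pending.length >= this.maxItems;
    if (intervalElapsed || batchFull) {
      this.flush(this.pending);
      this.pending = [];
      this.lastFlushMs = nowMs;
      this.flushCount += 1;
    }
  }
}

// Simulate a steady stream: one update every 10 ms for 10 seconds (assumed rate).
function countFlushes(intervalMs: number, maxItems: number): number {
  const flusher = new BatchedFlusher<number>(intervalMs, maxItems, () => {});
  for (let t = 10; t <= 10_000; t += 10) {
    flusher.push(t, t);
  }
  return flusher.flushCount;
}

const oldFlushes = countFlushes(250, 100);  // previous defaults
const newFlushes = countFlushes(1000, 500); // new defaults
```

Under this assumed load the old defaults produce 40 flushes in ten seconds and the new defaults produce 10. If each flush rewrites a Redis list, as the first bullet states, fewer flushes mean fewer list rewrites for the same stream of updates.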
Relevant Diff Snippets
services/api/src/live.ts
    const DEFAULT_REDIS_FLUSH_INTERVAL_MS = 1000;
    const DEFAULT_REDIS_FLUSH_MAX_ITEMS = 500;
deployment/native/config/redis.conf
    appendonly no
/home/delta/apps/freedomtracker/deployment/docker/compose.prod.yml
    healthcheck:
      interval: 300s
/home/delta/netdata/docker-compose.yml
    healthcheck:
      test: ["CMD-SHELL", "/usr/sbin/health.sh"]
      interval: 300s
Expected Impact for End-Users
- Lower steady CPU consumption from Docker runtime housekeeping once the external stacks are reloaded.
- Lower Redis CPU and persistence churn once the updated Islandflow API code and Redis config are deployed.
- Slightly slower healthcheck-based failure detection because stack healthchecks now run every five minutes instead of every 10-30 seconds.
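The healthcheck trade-off above comes down to simple arithmetic, sketched here. The function name `healthchecksPerHour` is hypothetical; the intervals (10 s, 30 s, 300 s) are the ones named in this report.

```typescript
// Healthcheck runs per container per hour for a given interval in seconds.
function healthchecksPerHour(intervalSeconds: number): number {
  if (intervalSeconds <= 0) throw new RangeError("interval must be positive");
  return 3600 / intervalSeconds;
}

const before10s = healthchecksPerHour(10);
const before30s = healthchecksPerHour(30);
const after300s = healthchecksPerHour(300);

// How many times fewer healthcheck runs the new interval produces.
const reductionVs10s = before10s / after300s;
const reductionVs30s = before30s / after300s;
```

A container that was checked every 10 seconds ran 360 healthchecks per hour, and one checked every 30 seconds ran 120; at 300 seconds each runs 12, a 30x and 10x reduction respectively. Each run accounts for one exec_create, one exec_start, and one exec_die event, so the Docker event stream should shrink by the same factor.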
Validation
- docker compose -f deployment/docker/docker-compose.yml config passed.
- docker compose -f deployment/docker/compose.prod.yml config passed in /home/delta/apps/freedomtracker.
- docker compose -f compose.prod.yml config passed in /home/delta/apps/drucquerdotcom.
- docker compose -f docker-compose.yml config passed in /home/delta/forgejo.
- docker compose -f docker-compose.yml config passed in /home/delta/netdata.
- bun test services/api/tests/live.test.ts failed on an existing hot-head-size expectation unrelated to this tuning work; follow-up issue islandflow-6ub was created to track it.
Issues, Limitations, and Mitigations
- The load-reduction changes are configured but not live until the affected services are restarted or redeployed.
- The two non-repo external stack directories (/home/delta/forgejo and /home/delta/netdata) do not have local Git metadata in this workspace, so their edits exist only as untracked file changes on disk.
- Disabling AOF reduces Redis durability and increases potential cache loss on restart, but the repository already describes Redis as backing rolling stats and caches rather than primary system-of-record data.
Follow-up Work
- Investigate and fix islandflow-6ub, the current services/api/tests/live.test.ts hot-head expectation failure.
- Restart or redeploy the Islandflow API/Redis runtime and the edited Docker stacks so these tuning changes take effect.
- Re-measure containerd, dockerd, and host redis-server CPU after rollout to confirm the expected drop.
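Alongside CPU, the Redis command rate quoted in the Context section could be re-measured by sampling the `total_commands_processed` counter from Redis `INFO stats` twice and dividing the difference by the elapsed time. The sketch below assumes the two INFO outputs have already been captured as strings. The helper names `parseRedisInfo` and `averageOpsPerSec` are hypothetical, and the sample values are illustrative inputs, not measurements from this host.

```typescript
// Parse the "key:value" lines of Redis INFO output, skipping blank lines and "# Section" headers.
function parseRedisInfo(info: string): Map<string, string> {
  const fields = new Map<string, string>();
  for (const line of info.split(/\r?\n/)) {
    if (line === "" || line.startsWith("#")) continue;
    const sep = line.indexOf(":");
    if (sep > 0) fields.set(line.slice(0, sep), line.slice(sep + 1));
  }
  return fields;
}

// Average commands per second between two INFO samples taken elapsedSeconds apart.
function averageOpsPerSec(before: string, after: string, elapsedSeconds: number): number {
  const read = (info: string): number => {
    const raw = parseRedisInfo(info).get("total_commands_processed");
    if (raw === undefined) throw new Error("total_commands_processed missing");
    return Number(raw);
  };
  return (read(after) - read(before)) / elapsedSeconds;
}

// Illustrative samples only (not real measurements).
const sampleBefore = "# Stats\r\ntotal_commands_processed:1000\r\n";
const sampleAfter = "# Stats\r\ntotal_commands_processed:1500\r\n";
const sampleRate = averageOpsPerSec(sampleBefore, sampleAfter, 5);
```

Comparing the rate before and after rollout, measured the same way both times, gives a number that can be set against the roughly 19k ops/sec observed earlier.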