<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>2026-05-22 Server Load Tuning</title>
<style>
:root {
color-scheme: light dark;
--bg: #0b1020;
--panel: #121933;
--text: #e8ecf8;
--muted: #aab4d6;
--accent: #7cc4ff;
--border: #2b355f;
--code: #0f1530;
}
body {
margin: 0;
font-family: Inter, ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
background: var(--bg);
color: var(--text);
line-height: 1.5;
}
main {
max-width: 980px;
margin: 0 auto;
padding: 32px 20px 64px;
}
h1, h2 {
line-height: 1.2;
}
section {
background: var(--panel);
border: 1px solid var(--border);
border-radius: 14px;
padding: 20px;
margin: 18px 0;
}
p, li {
color: var(--text);
}
.muted {
color: var(--muted);
}
code, pre {
font-family: "SFMono-Regular", ui-monospace, Menlo, Consolas, monospace;
}
pre {
background: var(--code);
border: 1px solid var(--border);
border-radius: 10px;
padding: 14px;
overflow-x: auto;
white-space: pre-wrap;
}
ul {
padding-left: 20px;
}
</style>
</head>
<body>
<main>
<h1>Server Load Tuning</h1>
<p class="muted">Completed on 2026-05-22 02:10:10 EDT on branch <code>server-load</code>.</p>
<section>
<h2>Summary</h2>
<p>Reduced three configured sources of steady server load: Docker healthcheck intervals across the active external stacks were stretched to 300 seconds, the Islandflow API's Redis live-cache flush defaults were slowed, and append-only persistence was disabled in the checked-in Islandflow Redis configs, since this Redis usage is cache-oriented.</p>
</section>
<section>
<h2>Changes Made</h2>
<ul>
<li>Changed Islandflow API live-cache flush defaults from <code>250ms / 100 updates</code> to <code>1000ms / 500 updates</code> in <code>services/api/src/live.ts</code>.</li>
<li>Updated the documented and example environment values in <code>.env.example</code>, <code>deployment/docker/.env.example</code>, and <code>README.md</code>.</li>
<li>Disabled append-only persistence in checked-in Islandflow Redis deploy configs: <code>deployment/native/config/redis.conf</code> and <code>deployment/docker/docker-compose.yml</code>.</li>
<li>Changed external Docker healthcheck intervals to <code>300s</code> in <code>/home/delta/apps/freedomtracker/deployment/docker/compose.prod.yml</code>, <code>/home/delta/apps/drucquerdotcom/compose.prod.yml</code>, and <code>/home/delta/forgejo/docker-compose.yml</code>.</li>
<li>Added an explicit <code>300s</code> healthcheck override for <code>/home/delta/netdata/docker-compose.yml</code> to replace the more frequent image default.</li>
</ul>
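<p>As a rough illustration of why the larger flush defaults reduce flush frequency, the sketch below batches updates and flushes when either threshold is reached. <code>FlushBuffer</code> and <code>countFlushes</code> are hypothetical names, not the code in <code>services/api/src/live.ts</code>, and the sketch only checks the thresholds when an update arrives (no background timer), an assumption made to keep it short.</p>
<pre><code>// Hypothetical sketch; not the actual live-state manager.
type FlushFn = (items: string[]) => void;

class FlushBuffer {
  private pending: string[] = [];
  private lastFlushMs: number;
  private readonly intervalMs: number;
  private readonly maxItems: number;
  private readonly flush: FlushFn;

  constructor(intervalMs: number, maxItems: number, flush: FlushFn, nowMs: number) {
    this.intervalMs = intervalMs;
    this.maxItems = maxItems;
    this.flush = flush;
    this.lastFlushMs = nowMs;
  }

  // Queue one update, then flush if the interval has elapsed or the batch is full.
  push(item: string, nowMs: number): void {
    this.pending.push(item);
    const intervalElapsed = nowMs - this.lastFlushMs >= this.intervalMs;
    const batchFull = this.pending.length >= this.maxItems;
    if (intervalElapsed || batchFull) {
      this.flush(this.pending);
      this.pending = [];
      this.lastFlushMs = nowMs;
    }
  }
}

// Count flushes for a stream of evenly spaced updates starting at time 0.
function countFlushes(intervalMs: number, maxItems: number, updates: number, spacingMs: number): number {
  let flushes = 0;
  const buffer = new FlushBuffer(intervalMs, maxItems, () => { flushes += 1; }, 0);
  for (let i = 1; updates >= i; i++) {
    buffer.push(`update-${i}`, i * spacingMs);
  }
  return flushes;
}

// 1000 updates arriving 1 ms apart: 10 flushes with 250ms / 100, 2 with 1000ms / 500.
console.log(countFlushes(250, 100, 1000, 1), countFlushes(1000, 500, 1000, 1));</code></pre>
<p>Under this simplified model, the same one-second burst triggers one fifth as many flushes with the new defaults. The real reduction depends on actual traffic shape and on how the manager schedules flushes.</p>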
</section>
<section>
<h2>Context</h2>
<p>Observed runtime behavior showed high <code>containerd</code> and <code>dockerd</code> CPU with a constant stream of Docker <code>exec_create</code>, <code>exec_start</code>, and <code>exec_die</code> events generated by healthchecks. The host Redis instance was also hot, with roughly 19k ops/sec and command stats dominated by <code>LPUSH</code> and <code>LTRIM</code> from the API live-cache rewrite path.</p>
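<p>The scale of the healthcheck event stream can be estimated with simple arithmetic. The sketch below assumes each healthcheck run produces exactly one <code>exec_create</code>, one <code>exec_start</code>, and one <code>exec_die</code> event; the container count is a made-up figure for illustration, not a measurement from this host.</p>
<pre><code>// Assumption: one healthcheck run emits three Docker events
// (exec_create, exec_start, exec_die).
const EVENTS_PER_CHECK = 3;

// Docker exec events per hour for a number of containers sharing one interval.
function execEventsPerHour(intervalSeconds: number, containers: number): number {
  const checksPerHour = 3600 / intervalSeconds;
  return checksPerHour * EVENTS_PER_CHECK * containers;
}

// Hypothetical container count, for illustration only.
const containerCount = 10;
for (const interval of [10, 30, 300]) {
  console.log(`${interval}s interval: ${execEventsPerHour(interval, containerCount)} events/hour`);
}
// 10s: 10800, 30s: 3600, 300s: 360</code></pre>
<p>Moving from a 10 to 30 second interval to 300 seconds cuts the event rate by a factor of 10 to 30 under these assumptions.</p>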
</section>
<section>
<h2>Important Implementation Details</h2>
<ul>
<li>The API live-state manager rewrites Redis lists on flush, so increasing the flush interval and burst size reduces write amplification without changing the live data model.</li>
<li>The checked-in Redis persistence change keeps RDB snapshots but drops AOF write and rewrite overhead, which fits this repository's cache-heavy Redis use as described in the README.</li>
<li>The external stack edits were applied directly in sibling directories. <code>freedomtracker</code> and <code>drucquerdotcom</code> are Git repos; <code>forgejo</code> and <code>netdata</code> are plain directories here.</li>
<li>No service restarts or compose redeploys were run in this turn, so the running system will not pick up these config changes until those stacks are restarted.</li>
</ul>
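<p>To make the write-amplification point concrete, the sketch below models one flush of one list as an <code>LPUSH</code> followed by an <code>LTRIM</code>. This is an assumption: the exact command sequence issued by the live-state manager is not shown in this report, and <code>flushCommands</code>, <code>commandsPerSecond</code>, and the list count are hypothetical.</p>
<pre><code>type RedisCommand = string[];

// Assumption: one flush of one list costs two commands.
const COMMANDS_PER_FLUSH = 2;

// Build the commands for one flush: push the batch, then trim the list
// so it keeps only its first maxLength entries.
function flushCommands(key: string, batch: string[], maxLength: number): RedisCommand[] {
  return [
    ["LPUSH", key, ...batch],
    ["LTRIM", key, "0", String(maxLength - 1)],
  ];
}

// Steady-state command rate if every list is flushed once per interval.
function commandsPerSecond(listCount: number, intervalMs: number): number {
  const flushesPerSecond = 1000 / intervalMs;
  return listCount * COMMANDS_PER_FLUSH * flushesPerSecond;
}

console.log(flushCommands("live:demo", ["a", "b"], 500));
// With 100 hypothetical lists: 800 commands/s at 250 ms, 200 commands/s at 1000 ms.
console.log(commandsPerSecond(100, 250), commandsPerSecond(100, 1000));</code></pre>
<p>The batch size changes how many items each <code>LPUSH</code> carries, not how many commands are sent, so in this model the command rate scales with flush frequency alone.</p>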
</section>
<section>
<h2>Relevant Diff Snippets</h2>
<pre><code>services/api/src/live.ts
const DEFAULT_REDIS_FLUSH_INTERVAL_MS = 1000;
const DEFAULT_REDIS_FLUSH_MAX_ITEMS = 500;</code></pre>
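<p>Because the same values are documented in the <code>.env.example</code> files, these defaults are presumably overridable from the environment. A minimal sketch of that pattern follows; the variable names <code>EXAMPLE_FLUSH_INTERVAL_MS</code> and <code>EXAMPLE_FLUSH_MAX_ITEMS</code> and the helper <code>positiveIntOr</code> are hypothetical, since the real names are not shown here.</p>
<pre><code>const DEFAULT_REDIS_FLUSH_INTERVAL_MS = 1000;
const DEFAULT_REDIS_FLUSH_MAX_ITEMS = 500;

// Parse a positive integer from an environment-style string, falling back
// to the default when the value is missing or not a positive integer.
function positiveIntOr(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed)) return fallback;
  return parsed > 0 ? parsed : fallback;
}

// Hypothetical environment; a real service would read process.env instead.
const env: { [name: string]: string | undefined } = {
  EXAMPLE_FLUSH_INTERVAL_MS: "2000",
};

const flushIntervalMs = positiveIntOr(env.EXAMPLE_FLUSH_INTERVAL_MS, DEFAULT_REDIS_FLUSH_INTERVAL_MS);
const flushMaxItems = positiveIntOr(env.EXAMPLE_FLUSH_MAX_ITEMS, DEFAULT_REDIS_FLUSH_MAX_ITEMS);
console.log(flushIntervalMs, flushMaxItems); // 2000 500</code></pre>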
<pre><code>deployment/native/config/redis.conf
appendonly no</code></pre>
<pre><code>/home/delta/apps/freedomtracker/deployment/docker/compose.prod.yml
healthcheck:
  interval: 300s</code></pre>
<pre><code>/home/delta/netdata/docker-compose.yml
healthcheck:
  test: ["CMD-SHELL", "/usr/sbin/health.sh"]
  interval: 300s</code></pre>
</section>
<section>
<h2>Expected Impact for End-Users</h2>
<ul>
<li>Lower steady CPU consumption from Docker runtime housekeeping once the external stacks are reloaded.</li>
<li>Lower Redis CPU and persistence churn once the updated Islandflow API code and Redis config are deployed.</li>
<li>Slower healthcheck-based failure detection: stack healthchecks now run every five minutes instead of every 10-30 seconds, so a failing container can take several minutes longer to be marked unhealthy.</li>
</ul>
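<p>The detection delay can be approximated as the interval multiplied by the number of consecutive failures Docker requires before marking a container unhealthy. The sketch below uses Docker's default of 3 retries as an assumption, since the stacks' actual <code>retries</code> values are not shown here, and it ignores check run time and timeouts.</p>
<pre><code>// Approximate worst-case seconds until a container is marked unhealthy:
// the failure begins just after a passing check, and Docker needs
// `retries` consecutive failed checks before changing the status.
function worstCaseSecondsToUnhealthy(intervalSeconds: number, retries: number): number {
  return intervalSeconds * retries;
}

// Assumption: Docker's default retry count; the edited stacks may differ.
const ASSUMED_RETRIES = 3;
for (const interval of [10, 30, 300]) {
  const seconds = worstCaseSecondsToUnhealthy(interval, ASSUMED_RETRIES);
  console.log(`${interval}s interval: about ${seconds}s to unhealthy`);
}
// 10s: 30, 30s: 90, 300s: 900 (15 minutes)</code></pre>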
</section>
<section>
<h2>Validation</h2>
<ul>
<li><code>docker compose -f deployment/docker/docker-compose.yml config</code> passed.</li>
<li><code>docker compose -f deployment/docker/compose.prod.yml config</code> passed in <code>/home/delta/apps/freedomtracker</code>.</li>
<li><code>docker compose -f compose.prod.yml config</code> passed in <code>/home/delta/apps/drucquerdotcom</code>.</li>
<li><code>docker compose -f docker-compose.yml config</code> passed in <code>/home/delta/forgejo</code>.</li>
<li><code>docker compose -f docker-compose.yml config</code> passed in <code>/home/delta/netdata</code>.</li>
<li><code>bun test services/api/tests/live.test.ts</code> failed on an existing hot-head-size expectation unrelated to this tuning work; follow-up issue <code>islandflow-6ub</code> was created to track it.</li>
</ul>
</section>
<section>
<h2>Issues, Limitations, and Mitigations</h2>
<ul>
<li>The load-reduction changes are configured but not live until the affected services are restarted or redeployed.</li>
<li>The two non-repo external stack directories (<code>/home/delta/forgejo</code> and <code>/home/delta/netdata</code>) have no local Git metadata in this workspace, so their edits exist only as untracked file changes on disk.</li>
<li>Disabling AOF reduces Redis durability: writes made since the last RDB snapshot are lost on restart. The impact is limited because the repository already describes Redis as backing rolling stats and caches rather than primary system-of-record data.</li>
</ul>
</section>
<section>
<h2>Follow-up Work</h2>
<ul>
<li>Investigate and fix <code>islandflow-6ub</code>, the current <code>services/api/tests/live.test.ts</code> hot-head expectation failure.</li>
<li>Restart or redeploy the Islandflow API/Redis runtime and the edited Docker stacks so these tuning changes take effect.</li>
<li>Re-measure <code>containerd</code>, <code>dockerd</code>, and host <code>redis-server</code> CPU after rollout to confirm the expected drop.</li>
</ul>
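<p>For the Redis side of the re-measurement, one option is to compare the share of <code>LPUSH</code> and <code>LTRIM</code> calls in <code>INFO commandstats</code> output before and after rollout. The sketch below parses that output; <code>parseCommandCalls</code> and <code>callShare</code> are hypothetical helpers, and the sample counters are invented for illustration rather than taken from this host.</p>
<pre><code>// Extract the `calls=` counter from each cmdstat line of INFO commandstats.
function parseCommandCalls(info: string): { [command: string]: number } {
  const calls: { [command: string]: number } = {};
  for (const line of info.split("\n")) {
    const match = /^cmdstat_([^:]+):calls=(\d+)/.exec(line.trim());
    if (match) {
      calls[match[1]] = Number(match[2]);
    }
  }
  return calls;
}

// Fraction of all recorded calls that belong to the named commands.
function callShare(calls: { [command: string]: number }, commands: string[]): number {
  let total = 0;
  for (const count of Object.values(calls)) total += count;
  let selected = 0;
  for (const name of commands) selected += calls[name] ?? 0;
  return total === 0 ? 0 : selected / total;
}

// Invented sample in the commandstats line format.
const sample = [
  "# Commandstats",
  "cmdstat_lpush:calls=4000,usec=12000,usec_per_call=3.00",
  "cmdstat_ltrim:calls=4000,usec=8000,usec_per_call=2.00",
  "cmdstat_get:calls=1500,usec=1500,usec_per_call=1.00",
  "cmdstat_ping:calls=500,usec=250,usec_per_call=0.50",
].join("\n");

console.log(callShare(parseCommandCalls(sample), ["lpush", "ltrim"])); // 0.8</code></pre>
<p>Because the counters are cumulative since the last reset or restart, comparing two readings taken a fixed time apart gives a rate that is easier to compare across rollouts.</p>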
</section>
</main>
</body>
</html>