fix: harden SSR server with keepalive fix, error handling, and health check #13

Merged
jeanpaulsio merged 1 commit into main from fix/ssr-keepalive-cloudflare-502 on Apr 18, 2026

Conversation

@jeanpaulsio
Owner

Summary

Ports the SSR hardening from step-thru (PR #125) to this template.

Root cause of overnight 502s on Render: Node's default `keepAliveTimeout` (5s) is shorter than Cloudflare's idle timeout (~100-900s). After 5s of idle time, Node closes the TCP connection. Cloudflare, unaware of the close, sends the next request on the dead socket, receives a TCP RST, and returns a 502. Because the failure happens at the TCP layer, it is invisible to Express, Sentry, and application logging.

Fix: set `server.keepAliveTimeout = 120000` (120s) and `server.headersTimeout = 125000` (125s). `headersTimeout` must be greater than `keepAliveTimeout` per Node.js requirement. See the Render docs and nodejs/node#59193.
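
For reference, a minimal sketch of the two timeout settings on a plain `node:http` server. The request handler and the constant names are placeholders, not the code from this PR. With Express, the same properties can be set on the `http.Server` returned by `app.listen()`.

```javascript
const http = require('node:http');

// Values taken from the PR description; the constant names are illustrative.
const KEEP_ALIVE_TIMEOUT_MS = 120_000;
const HEADERS_TIMEOUT_MS = 125_000;

// Placeholder handler standing in for the real SSR app.
const server = http.createServer((req, res) => {
  res.end('ok');
});

// Hold idle keep-alive sockets open for 120s instead of Node's 5s default,
// so the proxy in front is less likely to reuse a socket Node already closed.
server.keepAliveTimeout = KEEP_ALIVE_TIMEOUT_MS;

// Keep headersTimeout above keepAliveTimeout, as the PR description asks.
server.headersTimeout = HEADERS_TIMEOUT_MS;

// server.listen(process.env.PORT || 3000); // omitted so the sketch does not bind a port
```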

Additional hardening

  • try/catch around `renderPage()` with 500 fallback HTML page
  • 30s request timeout middleware (kills stalled SSR renders)
  • `GET /health` endpoint + `healthCheckPath` in `render.yaml`
  • Global Express error handler as safety net
  • Slow request logging (>5s) and 5xx error logging (structured JSON)
  • Process lifecycle logging (SIGTERM, uncaughtException, unhandledRejection)
  • Memory monitoring (warns at 400MB heap, checked every 60s)
  • Startup log with app name, port, and Node version
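
The request-path items in the list above could be sketched roughly as follows. This is not the code from the PR: the handlers are written as plain Express-style functions so the block runs without dependencies, `renderPage` is passed in as a hypothetical stand-in for the app's SSR entry point, and the 503 status and the fallback markup are assumptions.

```javascript
const REQUEST_TIMEOUT_MS = 30_000; // 30s, per the list above

// Assumed fallback markup; the real page is not shown in the PR description.
const FALLBACK_HTML =
  '<!doctype html><html><body><h1>Something went wrong</h1></body></html>';

// GET /health: cheap liveness response for the platform health check.
function healthHandler(req, res) {
  res.status(200).json({ status: 'ok' });
}

// Middleware factory: answers 503 (an assumed status) if no response
// has started after timeoutMs.
function requestTimeout(timeoutMs = REQUEST_TIMEOUT_MS) {
  return function (req, res, next) {
    const timer = setTimeout(() => {
      if (!res.headersSent) res.status(503).send('Request timed out');
    }, timeoutMs);
    // Clear the timer once the response finishes or the client disconnects.
    res.on('finish', () => clearTimeout(timer));
    res.on('close', () => clearTimeout(timer));
    next();
  };
}

// Wraps an async render function so a thrown error becomes a 500 fallback page.
function ssrHandler(renderPage) {
  return async function (req, res) {
    try {
      const html = await renderPage(req);
      if (!res.headersSent) res.status(200).send(html);
    } catch (err) {
      console.error(
        JSON.stringify({ level: 'error', msg: 'ssr_render_failed', error: String(err) })
      );
      if (!res.headersSent) res.status(500).send(FALLBACK_HTML);
    }
  };
}

// The four-argument signature marks this as an Express error handler; register it last.
function errorHandler(err, req, res, next) {
  if (res.headersSent) return next(err);
  res.status(500).send(FALLBACK_HTML);
}
```

In an Express app these would be wired up along the lines of `app.use(requestTimeout())`, `app.get('/health', healthHandler)`, a catch-all route using `ssrHandler(renderPage)`, and `app.use(errorHandler)` last. Note that this timeout only ends the response; it does not cancel the render itself.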

Test plan

  • `cd web && npm ci && npm run build` succeeds
  • `cd web && node server.js` starts and logs structured JSON
  • `curl localhost:3000/health` returns `{"status":"ok"}`
  • App-generated repo deploys to Render without 502s overnight

… check

Root cause of overnight 502s on Render: Node's default keepAliveTimeout
(5s) is shorter than Cloudflare's idle timeout (~100-900s). After 5s
idle, Node closes the TCP connection. Cloudflare doesn't know, sends a
request on the dead socket, gets a TCP RST, returns 502. This happens at
the TCP layer, invisible to Express/Sentry/application logging.

Fix: set keepAliveTimeout to 120s and headersTimeout to 125s per Render
docs and Node.js issue #59193.

Additional hardening (ported from step-thru #125):
- try/catch around renderPage() with 500 fallback HTML page
- 30s request timeout middleware (kills stalled SSR renders)
- GET /health endpoint + healthCheckPath in render.yaml
- Global Express error handler as safety net
- Slow request logging (>5s) and 5xx error logging (structured JSON)
- Process lifecycle logging (SIGTERM, uncaughtException, unhandledRejection)
- Memory monitoring (warns at 400MB heap, checked every 60s)
- Startup log with port and Node version
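
The process-level items in the list above (lifecycle logging and the heap warning) might look roughly like this. It is a sketch under assumptions: the log field names, the `log` helper, and the exit behavior in the signal and exception handlers are illustrative and not taken from the commit.

```javascript
const HEAP_WARN_BYTES = 400 * 1024 * 1024; // 400MB threshold from the list above
const CHECK_INTERVAL_MS = 60_000; // checked every 60s

// Writes one JSON object per line to stdout and returns the line.
function log(level, msg, fields = {}) {
  const line = JSON.stringify({ level, msg, time: new Date().toISOString(), ...fields });
  console.log(line);
  return line;
}

// Returns true (and logs a warning) when heap usage is at or above the limit.
function checkMemory(heapUsed = process.memoryUsage().heapUsed, limit = HEAP_WARN_BYTES) {
  if (heapUsed < limit) return false;
  log('warn', 'high_heap_usage', { heapUsedMb: Math.round(heapUsed / 1024 / 1024) });
  return true;
}

// Call once at startup. Not invoked here so the sketch has no side effects.
function installProcessLogging() {
  // unref() keeps the interval from holding the process open on its own.
  setInterval(checkMemory, CHECK_INTERVAL_MS).unref();

  // Registering a SIGTERM listener removes Node's default exit, so exit explicitly.
  process.on('SIGTERM', () => {
    log('info', 'sigterm_received');
    process.exit(0);
  });

  // Assumed policy: log, then exit non-zero so the platform restarts the service.
  process.on('uncaughtException', (err) => {
    log('error', 'uncaught_exception', { error: String(err) });
    process.exit(1);
  });

  process.on('unhandledRejection', (reason) => {
    log('error', 'unhandled_rejection', { reason: String(reason) });
  });
}
```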
@jeanpaulsio jeanpaulsio merged commit 4054fa4 into main Apr 18, 2026
3 checks passed
@jeanpaulsio jeanpaulsio deleted the fix/ssr-keepalive-cloudflare-502 branch April 18, 2026 04:51