Version
`paperclipai@2026.416.0` with the default `embedded-postgres` database mode.
Severity
High — the main dashboard becomes unreachable whenever a single corrupt row exists. The UI shows no data and errors out entirely.
Repro
- Start a fresh Paperclip instance with the default embedded Postgres.
- Emit a heartbeat whose `result` string is longer than 500 bytes and has a 3-byte UTF-8 character (e.g. an ellipsis `…`, em-dash `—`, or smart quote) straddling byte offset 500, so a byte-wise 500-byte cut lands mid-sequence. Natural text emitted by `claude_local` adapters produces this regularly.
- `GET /api/companies/{id}/heartbeat-runs?limit=200` → 500. The server log shows `invalid byte sequence for encoding "UTF8": 0x<xx>` coming from Postgres.
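The trigger condition can be sketched in Node (the string construction here is illustrative, not taken from an actual adapter payload):

```javascript
// Build a `result` string whose 3-byte ellipsis (E2 80 A6) straddles byte 500,
// so a byte-wise 500-byte cut splits the sequence.
const result = 'x'.repeat(499) + '\u2026' + ' more text after the boundary';

const bytes = Buffer.from(result, 'utf8');
console.log(bytes.length > 500);                       // true: longer than 500 bytes
console.log(bytes.subarray(499, 502).toString('hex')); // "e280a6": ellipsis occupies bytes 500-502
```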
Root cause
Two interacting facts:
- The embedded cluster is `SQL_ASCII`. The embedded-postgres init doesn't force `--encoding=UTF8`, so `server_encoding` is `SQL_ASCII`. In that mode Postgres does not validate UTF-8 on write or on extraction via `jsonb ->> text`, and `left(text, N)` cuts at byte N rather than character N.
- The query slices with `left(..., 500)` in `@paperclipai/server` `services/heartbeat.js` (around lines 263-314 of the built `dist/services/heartbeat.js`):
```js
left(result_json ->> 'summary', HEARTBEAT_RUN_RESULT_SUMMARY_MAX_CHARS)
left(result_json ->> 'result', HEARTBEAT_RUN_RESULT_SUMMARY_MAX_CHARS)
```
`HEARTBEAT_RUN_RESULT_SUMMARY_MAX_CHARS` is 500. When the slice lands mid-sequence (e.g. keeping only `E2 80` of the 3-byte `E2 80 A6` ellipsis), the returned bytes are not valid UTF-8. postgres.js's row decoder then throws and the whole response 500s.
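The decode failure can be reproduced outside Postgres with Node's strict `TextDecoder` (a sketch of the failure mode, not the postgres.js internals):

```javascript
// 499 ASCII bytes followed by a 3-byte ellipsis; a byte-wise cut at 500
// keeps only the first byte (0xE2) of the sequence.
const full = Buffer.from('x'.repeat(499) + '\u2026', 'utf8');
const cut = full.subarray(0, 500); // ends ... 78 E2 — mid-sequence

const strict = new TextDecoder('utf-8', { fatal: true });
try {
  strict.decode(cut); // throws: the truncated bytes are not valid UTF-8
} catch (err) {
  console.log(err.name); // TypeError
}
```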
Fix options (any one is sufficient)
1. Slice by characters, not bytes. Replace `left(text, 500)` with `substring(text from 1 for 500)` and rely on `jsonb ->> text` already decoding into the session client encoding. Requires `client_encoding = UTF8` in the session, which is already the default for postgres.js.
2. Truncate in JS after fetch. Drop the `left(...)` wrapper, select the full text, and slice in JS using code-point-aware logic.
3. Initialize the embedded cluster with UTF-8. Pass `--encoding=UTF8 --locale=C` (or equivalent) when running `initdb` for a new cluster. Cleanest long-term fix, but requires a dump/restore for existing installs.

Recommended: (1) — smallest diff, no cluster migration, defense-in-depth regardless of encoding.
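If option (2) is chosen, the code-point-aware truncation might look like this (a sketch; the function name and the ellipsis-on-truncation behavior are assumptions, not the shipped implementation):

```javascript
// Truncate to at most `max` Unicode code points without splitting a
// multi-byte character or surrogate pair; append an ellipsis when
// anything was dropped.
function truncateByCodePoints(text, max) {
  const points = Array.from(text); // iterates by code point, not UTF-16 unit
  if (points.length <= max) return text;
  return points.slice(0, max - 1).join('') + '\u2026';
}

console.log(truncateByCodePoints('abc', 5));          // "abc" — unchanged
console.log(truncateByCodePoints('a'.repeat(10), 5)); // "aaaa…"
```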
Local mitigation (for anyone hitting this)
A launchd-scheduled cleanup script that scans `heartbeat_runs`, identifies rows whose first 500 bytes of `result` are invalid UTF-8, and rewrites them to a clean ≤500-byte truncation ending with a valid ellipsis. Safe to run repeatedly.
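The detection step of such a script can be sketched as follows (assumptions: Node, the 500-byte constant; the database scan and rewrite are omitted):

```javascript
// Returns true when the first `maxBytes` bytes of `text`, taken on their
// own, are not valid UTF-8 — i.e. a byte-wise left(text, 500) on the
// server would split a multi-byte sequence.
function firstBytesInvalid(text, maxBytes = 500) {
  const head = Buffer.from(text, 'utf8').subarray(0, maxBytes);
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(head);
    return false;
  } catch {
    return true;
  }
}

console.log(firstBytesInvalid('x'.repeat(499) + '\u2026' + 'tail')); // true
console.log(firstBytesInvalid('short string'));                       // false
```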
Related
Part of a group of 3 issues from the same diagnosis session:
- Bug 2: `GET /issues/:id/comments?after=...` 500 (Date passed as postgres.js bind param).
- Bug 3: `PATCH /agents/me` always 404.