Version
`paperclipai@2026.416.0` with the default `embedded-postgres` database mode.
Severity
High — the main dashboard becomes unreachable whenever a single corrupt row exists. The UI shows no data and errors out entirely.
Repro
- Start a fresh Paperclip instance with the default embedded Postgres.
- Emit a heartbeat whose `result` string is longer than 500 bytes and has a 3-byte UTF-8 character (e.g. an ellipsis `…`, em-dash `—`, or smart quote) straddling byte offset 500, so a byte-wise 500-byte cut lands mid-sequence. Natural text emitted by `claude_local` adapters produces this regularly.
- `GET /api/companies/{id}/heartbeat-runs?limit=200` → 500. The server log shows `invalid byte sequence for encoding "UTF8": 0x<xx>` coming from Postgres.
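The trigger condition can be sketched in Node (the string construction here is illustrative, not taken from an actual adapter payload):

```javascript
// Build a `result` string whose 3-byte ellipsis (E2 80 A6) straddles byte 500,
// so a byte-wise 500-byte cut splits the sequence.
const result = 'x'.repeat(499) + '\u2026' + ' more text after the boundary';

const bytes = Buffer.from(result, 'utf8');
console.log(bytes.length > 500);                       // true: longer than 500 bytes
console.log(bytes.subarray(499, 502).toString('hex')); // "e280a6": ellipsis occupies bytes 500-502
```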
Root cause
Two interacting facts:
- The embedded cluster is `SQL_ASCII`. The embedded-postgres init doesn't force `--encoding=UTF8`, so `server_encoding` is `SQL_ASCII`. In that mode Postgres does not validate UTF-8 on write or on extraction via `jsonb ->> text`, and `left(text, N)` cuts at byte N rather than character N.
- The query slices with `left(..., 500)` in `@paperclipai/server` `services/heartbeat.js` (around lines 263-314 of the built `dist/services/heartbeat.js`):
```js
left(result_json ->> 'summary', HEARTBEAT_RUN_RESULT_SUMMARY_MAX_CHARS)
left(result_json ->> 'result', HEARTBEAT_RUN_RESULT_SUMMARY_MAX_CHARS)
```
`HEARTBEAT_RUN_RESULT_SUMMARY_MAX_CHARS` is 500. When the slice lands mid-sequence (e.g. keeping only `E2 80` of the 3-byte `E2 80 A6` ellipsis), the returned bytes are not valid UTF-8. postgres.js's row decoder then throws and the whole response 500s.
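The decode failure can be reproduced outside Postgres with Node's strict `TextDecoder` (a sketch of the failure mode, not the postgres.js internals):

```javascript
// 499 ASCII bytes followed by a 3-byte ellipsis; a byte-wise cut at 500
// keeps only the first byte (0xE2) of the sequence.
const full = Buffer.from('x'.repeat(499) + '\u2026', 'utf8');
const cut = full.subarray(0, 500); // ends ... 78 E2 — mid-sequence

const strict = new TextDecoder('utf-8', { fatal: true });
try {
  strict.decode(cut); // throws: the truncated bytes are not valid UTF-8
} catch (err) {
  console.log(err.name); // TypeError
}
```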
Fix options (any one is sufficient)
1. Slice by characters, not bytes. Replace `left(text, 500)` with `substring(text from 1 for 500)` and rely on `jsonb ->> text` already decoding into the session client encoding. Requires `client_encoding = UTF8` in the session, which is already the default for postgres.js.
2. Truncate in JS after fetch. Drop the `left(...)` wrapper, select the full text, and slice in JS using code-point-aware logic.
3. Initialize the embedded cluster with UTF-8. Pass `--encoding=UTF8 --locale=C` (or equivalent) when running `initdb` for a new cluster. Cleanest long-term fix, but requires a dump/restore for existing installs.

Recommended: (1) — smallest diff, no cluster migration, defense-in-depth regardless of encoding.
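If option (2) is chosen, the code-point-aware truncation might look like this (a sketch; the function name and the ellipsis-on-truncation behavior are assumptions, not the shipped implementation):

```javascript
// Truncate to at most `max` Unicode code points without splitting a
// multi-byte character or surrogate pair; append an ellipsis when
// anything was dropped.
function truncateByCodePoints(text, max) {
  const points = Array.from(text); // iterates by code point, not UTF-16 unit
  if (points.length <= max) return text;
  return points.slice(0, max - 1).join('') + '\u2026';
}

console.log(truncateByCodePoints('abc', 5));          // "abc" — unchanged
console.log(truncateByCodePoints('a'.repeat(10), 5)); // "aaaa…"
```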
Local mitigation (for anyone hitting this)
A launchd-scheduled cleanup script that scans `heartbeat_runs`, identifies rows whose first 500 bytes of `result` are invalid UTF-8, and rewrites them to a clean ≤500-byte truncation ending with a valid ellipsis. Safe to run repeatedly.
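The detection step of such a script can be sketched as follows (assumptions: Node, the 500-byte constant; the database scan and rewrite are omitted):

```javascript
// Returns true when the first `maxBytes` bytes of `text`, taken on their
// own, are not valid UTF-8 — i.e. a byte-wise left(text, 500) on the
// server would split a multi-byte sequence.
function firstBytesInvalid(text, maxBytes = 500) {
  const head = Buffer.from(text, 'utf8').subarray(0, maxBytes);
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(head);
    return false;
  } catch {
    return true;
  }
}

console.log(firstBytesInvalid('x'.repeat(499) + '\u2026' + 'tail')); // true
console.log(firstBytesInvalid('short string'));                       // false
```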
Related
Part of a group of 3 issues from the same diagnosis session:
- Bug 2: `GET /issues/:id/comments?after=...` 500 (Date passed as postgres.js bind param).
- Bug 3: `PATCH /agents/me` always 404.