Skip to content

heartbeat-runs endpoint 500s when result_json->>'result' has multibyte UTF-8 at byte-500 boundary (SQL_ASCII cluster + left(text, 500)) #3829

@bscottdavis

Description

@bscottdavis

Version

paperclipai@2026.416.0 with the default embedded-postgres database mode.

Severity

High — the main dashboard becomes unreachable whenever a single corrupt row exists. The UI shows no data and errors out entirely.

Repro

  1. Start a fresh Paperclip instance with the default embedded Postgres.
  2. Emit a heartbeat whose result string is longer than 500 bytes and whose character at byte position 498–500 is a 3-byte UTF-8 sequence (e.g. an ellipsis , em-dash , or smart quote). Natural text emitted by claude_local adapters produces this regularly.
  3. GET /api/companies/{id}/heartbeat-runs?limit=200 → 500. The server log shows invalid byte sequence for encoding "UTF8": 0x<xx> coming from postgres.

Root cause

Two interacting facts:

  • The embedded cluster is SQL_ASCII. The embedded-postgres init doesn't force --encoding=UTF8, so server_encoding is SQL_ASCII. In that mode PG does not validate UTF-8 on write or on extraction via jsonb ->> text, and left(text, N) cuts at byte N rather than char N.

  • The query slices with left(..., 500) in @paperclipai/server services/heartbeat.js (around lines 263-314 of the built dist/services/heartbeat.js):

    ```js
    left(result_json ->> 'summary', HEARTBEAT_RUN_RESULT_SUMMARY_MAX_CHARS)
    left(result_json ->> 'result', HEARTBEAT_RUN_RESULT_SUMMARY_MAX_CHARS)
    ```

    HEARTBEAT_RUN_RESULT_SUMMARY_MAX_CHARS = 500. When the slice lands mid-sequence (e.g. E2 80 of a 3-byte E2 80 A6 ellipsis) the returned bytes are not valid UTF-8. postgres.js's row decoder then throws and the whole response 500s.

Fix options (any one is sufficient)

  1. Slice by characters, not bytes. Replace left(text, 500) with substring(text from 1 for 500) and rely on jsonb ->> text already decoding into the session client encoding. Requires client_encoding = UTF8 in the session, which is already the default for postgres.js.
  2. Truncate in JS after fetch. Drop the left(...) wrapper, select the full text, slice in JS using code-point-aware logic.
  3. Initialize the embedded cluster with UTF-8. Pass --encoding=UTF8 --locale=C (or equivalent) when running initdb for a new cluster. Cleanest long-term fix, but requires a dump/restore for existing installs.

Recommended: (1) — smallest diff, no cluster migration, defense-in-depth regardless of encoding.

Local mitigation (for anyone hitting this)

A launchd-scheduled cleanup script that scans heartbeat_runs, identifies rows whose first 500 bytes of result are invalid UTF-8, and rewrites them to a clean ≤500-byte truncation ending with a valid ellipsis. Safe to run repeatedly.

Related

Part of a group of 3 issues from the same diagnosis session:

  • Bug 2: GET /issues/:id/comments?after=... 500 (Date passed as postgres.js bind param).
  • Bug 3: PATCH /agents/me always 404.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions