
Wrap-up flow leakage: Chinese verb form '存檔完成' bypasses PROCESS_REPORT_RES anchor #27

@tznthou


Context

After the v0.3.0 P1 release (PR #26, closes #21) split persistence into a trust-graded pipeline, session_journal (low-trust) → manual ccmem promote → memories (high-trust), the KNOWLEDGE_THRESHOLD persistence gate was removed. Entries scoring 0 or 1 now reach session_journal as designed. To compensate, outcome-scorer.ts keeps a hard-floor isProcessReport() filter (defense in depth) that rejects session wrap-up process reports before they enter the journal.

Finding

In the 24h after the v0.3.0 daemon upgrade (5/6 evening to 5/7 09:18), 11 entries reached session_journal. Three of them (id=2, id=6, id=8; 27% of intake) are session wrap-up process reports that bypassed isProcessReport().

Root cause

src/core/outcome-scorer.ts:28-30:

const PROCESS_REPORT_RES: ReadonlyArray<RegExp> = [
  /^[#\s💾🟢_*-]*\/?save[\s-]?t(?:\s*[:]\s*|\s+)(?:|||done|finished|completed?)/iu,
]

The anchor requires the slash-command literal at the start of the text. In CJK sessions, however, the wrap-up flow's assistant output opens with the Chinese verb form 存檔完成 ("save complete") instead of the slash form, so the regex never matches.
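To make the root cause concrete, here is a minimal sketch that uses a simplified, hypothetical stand-in for the slash-form entry (SLASH_FORM_RE is an illustrative name, not the repo's regex). It assumes only what the issue states: the anchor requires the slash literal at the start of the text.

```typescript
// Hypothetical, simplified stand-in for the slash-form anchor. It is NOT the
// repo's PROCESS_REPORT_RES entry; it only keeps the same shape: an optional
// decoration prefix, then the slash-command literal at the start of the text.
const SLASH_FORM_RE: RegExp = /^[#\s💾🟢_*-]*\/?save\b/iu;

// Openings of journal ids 2, 6 and 8 from the evidence table.
const verbFormOpeners: string[] = [
  "存檔完成 — 雙寫同步、diff 為空。",
  "存檔完成,可以安全 `/compact` 或結束 session。",
  "存檔完成。\n\n## 更新摘要",
];

// A slash-form opener is caught by the start anchor...
console.log(SLASH_FORM_RE.test("/save done"));

// ...but every verb-form opener slips through: the first character, 存, is
// neither in the prefix class nor the start of the literal "save".
for (const text of verbFormOpeners) {
  console.log(SLASH_FORM_RE.test(text));
}
```

This logs true for the slash opener and false for all three verb-form openers.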

Evidence — 3 corroborating reports (24h)

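| journal id | created_at | content opening | English gloss | score | reasons |
|---|---|---|---|---|---|
| 2 | 2026-05-06 07:20:38 | 存檔完成 — 雙寫同步、diff 為空。 | Save complete: dual-write synced, diff is empty. | 0 | [] |
| 6 | 2026-05-07 04:34:27 | 存檔完成,可以安全 `/compact` 或結束 session。 | Save complete; safe to `/compact` or end the session. | 0 | [] |
| 8 | 2026-05-07 04:48:29 | 存檔完成。\n\n## 更新摘要\n\n\| 檔案 \| 動作 \| | Save complete.\n\n## Update summary\n\n\| File \| Action \| | 0 | [] |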

All 3 share the same shape: a short verb-form opener (存檔完成), an optional clause after a dash or comma, then a table or bullet list. Each scored 0 (no signal category hit), so they got through only because the persistence gate is gone. That is the intended trust-grade design, but isProcessReport() was supposed to short-circuit them first.

Manual cleanup tested: ccmem reject 2 6 8 marked all three rejected, with expiry 2026-05-14 (the sweep clears them then).

Why this matters

  • A 27% noise rate in the journal directly degrades the journalPendingCount signal: COUNT(*) counts every entry, so wrap-up noise inflates progress toward the 5/20 14-day observation gate (COUNT(*) >= 50) and erodes its signal-to-noise resolution.
  • Distinct from #25 (Outcome scorer: CJK implementation-summary pattern coverage gap): that issue is about real outcomes scoring below threshold, while this one is about process meta-reports bypassing the hard-floor filter.
  • Distinct from the original v0.2.7 fix on the morning of 5/6, which added PROCESS_REPORT_RES to anchor the slash-command literal: that fix targeted assistant text starting with the slash form and missed the Chinese verb form.

Proposed fix

Extend PROCESS_REPORT_RES with verb-form anchors that are narrow enough to avoid false positives on real text containing 存檔完成 mid-sentence:

const PROCESS_REPORT_RES: ReadonlyArray<RegExp> = [
  /^[#\s💾🟢_*-]*\/?save[\s-]?t(?:\s*[:]\s*|\s+)(?:|||done|finished|completed?)/iu,
  /^[#\s💾🟢_*-]*(?:|)\s*[,.:!]/u,
]

Risks:

  • False positive on a legitimate opener such as 存檔完成。然後我修了 X bug ("Save complete. Then I fixed bug X"): accepted, since real implementation-fact content rarely opens with this phrase
  • Pattern over-broadening: keep it narrow by requiring punctuation or an em-dash right after 存檔完成
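A minimal sketch of such a verb-form anchor, under stated assumptions: the required punctuation set (em-dash plus ASCII and full-width comma, full stop, colon, exclamation mark) is inferred from the three journal openers and the Risks notes, VERB_FORM_RE is a hypothetical name, and the exact alternatives of the proposed pattern are not reproduced here.

```typescript
// Assumed verb-form anchor, a sketch rather than the issue's exact pattern:
// - optional decoration prefix (same class as the slash-form anchor),
// - the literal 存檔完成 ("save complete") at the very start of the text,
// - then, after optional whitespace, a required em-dash or ASCII/full-width
//   comma, full stop, colon, or exclamation mark.
const VERB_FORM_RE: RegExp = /^[#\s💾🟢_*-]*存檔完成\s*[—,，。.:：!！]/u;

const samples: Array<[string, string]> = [
  // [text, rough English gloss]
  ["💾 存檔完成:已同步", "Save complete: synced"],
  ["存檔完成。然後我修了 X bug", "Save complete. Then I fixed bug X"],
  ["存檔完成的判斷邏輯需要重寫", "The save-complete detection logic needs a rewrite"],
  ["今天的工作存檔完成。", "Today's work: save complete."],
];

for (const [text, gloss] of samples) {
  console.log(VERB_FORM_RE.test(text), gloss);
}
```

The first two samples log true (the second is the accepted false positive from Risks). The last two log false: one because no punctuation follows 存檔完成, the other because the phrase is not at the start of the text.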

Acceptance criteria

  • Add the regex plus 3 ground-truth should_reject cases (content of id=2/6/8) to tests/outcome-scorer.test.ts
  • Add 1 false-positive guard test: long-form text containing 存檔完成 mid-sentence should NOT match
  • After the fix, verify that the next 24h dogfood run adds 0 wrap-up verb-form entries to journalPendingCount
  • No regression on the existing slash-command literal anchor
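The acceptance criteria map onto a table of should-reject / should-accept cases. Here is a framework-free sketch, assuming simplified stand-in regexes (the slash-form entry in particular is a hypothetical simplification, not the real pattern) and a hypothetical isProcessReport that rejects when any anchor matches. In the repo these cases would live in tests/outcome-scorer.test.ts under whatever runner it already uses.

```typescript
// Simplified stand-ins, NOT the repo's PROCESS_REPORT_RES entries.
const PROCESS_REPORT_RES: ReadonlyArray<RegExp> = [
  /^[#\s💾🟢_*-]*\/?save\b/iu, // stand-in for the slash-form anchor
  /^[#\s💾🟢_*-]*存檔完成\s*[—,，。.:：!！]/u, // assumed verb-form anchor
];

// Hypothetical hard-floor check: reject if any anchor matches.
function isProcessReport(text: string): boolean {
  return PROCESS_REPORT_RES.some((re) => re.test(text));
}

interface Case {
  name: string;
  text: string;
  shouldReject: boolean;
}

const cases: Case[] = [
  // Ground truth: openings of journal ids 2, 6, 8.
  { name: "id=2", text: "存檔完成 — 雙寫同步、diff 為空。", shouldReject: true },
  { name: "id=6", text: "存檔完成,可以安全 `/compact` 或結束 session。", shouldReject: true },
  { name: "id=8", text: "存檔完成。\n\n## 更新摘要\n\n| 檔案 | 動作 |", shouldReject: true },
  // False-positive guard, roughly "After fixing bug X, confirm the save
  // completed, then rerun the tests": 存檔完成 mid-sentence must not match.
  { name: "mid-sentence", text: "修好 X bug 之後確認存檔完成,再跑一次測試。", shouldReject: false },
  // Regression guard for the slash-command form.
  { name: "slash form", text: "💾 /save done", shouldReject: true },
];

const failures = cases.filter((c) => isProcessReport(c.text) !== c.shouldReject);
for (const f of failures) {
  console.error(`FAIL ${f.name}: expected shouldReject=${f.shouldReject}`);
}
console.log(`${cases.length - failures.length}/${cases.length} cases passed`);
```

With these stand-ins the run logs 5/5 cases passed.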

Related

The threshold of 3+ corroborating reports converging on the 存檔完成 opener was already met within 24h, so the fix can proceed without waiting for additional samples.

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
