feat(hooks): wire Claude Code PreCompact snapshot #225

Open
Gradata wants to merge 1 commit into main from gra-1210-precompact

Gradata (Owner) commented May 26, 2026

Implements GRA-1210 / Paperclip issue d87b77e9-d0ff-4876-a3a0-19cf7c711c73.

Changes:

  • Adds bounded Claude Code PreCompact snapshot writing to <brain>/.precompact-snapshots/<session-id>.json.
  • Wires gradata install --agent claude-code to install PreCompact alongside PreToolUse.
  • Updates Claude Code uninstall to remove both lifecycle entries.
  • Adds tests for snapshot writing, session-id sanitization, no-op behavior, module/run_hook path, and adapter install entry.
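
As an illustration of the session-id sanitization and snapshot path described above, here is a minimal sketch. The helper names, the allowed character set, and the 64-character cap are assumptions made for the example, not the code in this PR.

```python
import re
from pathlib import Path

# Hypothetical names and limits; the PR's actual helpers are not shown in this thread.
_UNSAFE_RUN = re.compile(r"[^A-Za-z0-9._-]+")
_MAX_NAME_LEN = 64  # assumed cap


def safe_session_id(raw: str, fallback: str = "unknown") -> str:
    """Reduce an arbitrary session id to a filename-safe token."""
    # Replace every run of unsafe characters (including path separators) with "-",
    # then drop leading/trailing dots and dashes so ".." can never survive.
    cleaned = _UNSAFE_RUN.sub("-", raw).strip(".-")
    return cleaned[:_MAX_NAME_LEN] or fallback


def snapshot_path(brain_dir: Path, session_id: str) -> Path:
    """Build <brain>/.precompact-snapshots/<session-id>.json."""
    return brain_dir / ".precompact-snapshots" / f"{safe_session_id(session_id)}.json"
```

With this sketch, a hostile id such as `../../etc/passwd` collapses to `etc-passwd`, so the resulting file always stays inside the snapshot directory.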

Verification:

  • pytest -q tests/test_pre_compact_hook.py tests/test_hook_adapters.py::test_claude_code_install_writes_pre_compact_entry → 5 passed
  • python3 -m gradata.hooks.pre_compact smoke wrote .precompact-snapshots/module-smoke.json

Note: isolated clean worktree from origin/main; original Paperclip workspace had unrelated dirty files, so this PR branch avoids mixing them.



coderabbitai Bot commented May 26, 2026

📝 Walkthrough

Summary

  • New helper function: Added pre_compact_command(brain_dir: Path) -> str to _base.py to build shell commands for running the pre_compact hook with proper path quoting
  • Claude Code adapter enhanced: install() now registers hooks for both PreToolUse and PreCompact lifecycles with idempotent behavior; uninstall() removes entries from both
  • PreCompact hook reimplemented: Now writes bounded JSON snapshots to <brain>/.precompact-snapshots/<session-id>.json with deterministic session IDs, safe file naming, and atomic writes
  • Snapshot structure updated: Captures schema_version, created_at, event, trigger, cwd, transcript_path, custom_instructions, brain_dir, and relevant context with configurable size limits
  • BREAKING: pre_compact.main() return type changed from dict | None to None
  • BREAKING: HOOK_META profile changed from Profile.STANDARD to Profile.MINIMAL
  • Comprehensive test coverage: Added tests for snapshot writing, session ID sanitization, no-op behavior when brain dir is missing, and hook invocation via shared run_hook wrapper
  • Verified: All 5 tests passing; smoke run confirms snapshot file generation works as expected
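
The command helper in the first bullet could look roughly like the sketch below. It assumes a POSIX shell and uses `shlex.quote` for the quoting; the function name `build_pre_compact_command` and its `python` parameter are hypothetical, and only the `BRAIN_DIR` variable and the `gradata.hooks.pre_compact` module name come from this PR.

```python
import shlex
import sys
from pathlib import Path


def build_pre_compact_command(brain_dir: Path, python: str = sys.executable) -> str:
    """Return a POSIX-shell command that sets BRAIN_DIR and runs the hook module.

    Sketch only: the `python` parameter and the exact command layout are assumptions.
    """
    return (
        f"BRAIN_DIR={shlex.quote(str(brain_dir))} "
        f"{shlex.quote(python)} -m gradata.hooks.pre_compact"
    )
```

Quoting both the brain directory and the interpreter path keeps the command intact when either contains spaces. A Windows shell would need a different quoting strategy.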

Walkthrough

This PR implements a new PreCompact hook that captures bounded JSON snapshots of brain state before context compaction. It adds a command helper to the base adapter, rewrites the PreCompact hook with snapshot generation logic, integrates the hook into the Claude Code adapter for both PreToolUse and PreCompact lifecycles, and includes comprehensive testing of the snapshot generation and adapter registration.

Changes

PreCompact Hook and Adapter Integration

| Layer / File(s) | Summary |
| --- | --- |
| Hook command helper in base adapter (Gradata/src/gradata/hooks/adapters/_base.py) | New pre_compact_command(brain_dir: Path) -> str constructs a quoted shell invocation that sets BRAIN_DIR and runs the pre_compact module. |
| PreCompact hook snapshot capture (Gradata/src/gradata/hooks/pre_compact.py) | Rewrites the hook to generate bounded JSON snapshots stored in .precompact-snapshots/. Adds helpers for safe filename generation, session ID derivation, bounded file reading, and snapshot schema assembly. Updates HOOK_META profile to MINIMAL and changes main() to always return None. |
| Claude Code adapter hook lifecycle registration (Gradata/src/gradata/hooks/adapters/claude_code.py) | Extends install() to register hooks for both PreToolUse and PreCompact lifecycles with idempotency checks. Reworks uninstall() to remove entries from both lifecycles and prune empty blocks. Adds pre_compact_command import. |
| Tests for adapter integration and hook behavior (Gradata/tests/test_hook_adapters.py, Gradata/tests/test_pre_compact_hook.py) | Adds test for Claude Code adapter idempotency verifying PreCompact entry creation, and comprehensive hook test suite covering snapshot file generation, session ID sanitization, missing brain directory handling, and run_hook invocation. |
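
To make the idempotent install and the pruning uninstall concrete, here is a minimal sketch operating on an in-memory settings dict. It assumes each lifecycle name maps to a list of blocks, each holding a `hooks` list of command entries; the function signatures and the omission of any `matcher` field are simplifications for the example, not the adapter's real code.

```python
from typing import Any

LIFECYCLES = ("PreToolUse", "PreCompact")


def install(settings: dict[str, Any], commands: dict[str, str]) -> dict[str, Any]:
    """Add one command entry per lifecycle, skipping entries already present."""
    hooks = settings.setdefault("hooks", {})
    for lifecycle in LIFECYCLES:
        command = commands[lifecycle]
        blocks = hooks.setdefault(lifecycle, [])
        already = any(
            entry.get("command") == command
            for block in blocks
            for entry in block.get("hooks", [])
        )
        if not already:
            blocks.append({"hooks": [{"type": "command", "command": command}]})
    return settings


def uninstall(settings: dict[str, Any], commands: dict[str, str]) -> dict[str, Any]:
    """Remove our entries from both lifecycles and prune blocks left empty."""
    hooks = settings.get("hooks", {})
    for lifecycle in LIFECYCLES:
        kept = []
        for block in hooks.get(lifecycle, []):
            remaining = [
                e for e in block.get("hooks", []) if e.get("command") != commands[lifecycle]
            ]
            if remaining:
                kept.append({**block, "hooks": remaining})
        if kept:
            hooks[lifecycle] = kept
        else:
            hooks.pop(lifecycle, None)
    if not hooks:
        settings.pop("hooks", None)
    return settings
```

Calling `install` twice leaves a single entry per lifecycle, and `uninstall` removes only the matching commands, so hooks registered by other tools are left in place.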

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

feature, breaking-change

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.26% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: wiring Claude Code PreCompact snapshot functionality, which aligns with the core objective of implementing GRA-1210 and adding snapshot writing to the PreCompact lifecycle. |
| Description check | ✅ Passed | The description is directly related to the changeset, clearly explaining the implementation of GRA-1210 with specific details about snapshot writing, lifecycle wiring, tests, and verification steps. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.22.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

 Loading rules from local config...
[00.46][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Gradata/src/gradata/hooks/pre_compact.py`:
- Around line 45-50: The function _read_bounded currently calls
path.read_bytes()[:limit] which reads the entire file into memory then
truncates; change it to perform a truly bounded read by opening the file in
binary mode and calling file.read(limit) (e.g., with path.open("rb") as f: data
= f.read(limit)) and then decode with errors="replace"; keep the existing
is_file check and return None for non-files and return the decoded string (or
None) as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ca494a96-e5ad-456e-b72c-960c93a3f7a2

📥 Commits

Reviewing files that changed from the base of the PR and between a197bff and 9b0a11c.

📒 Files selected for processing (5)
  • Gradata/src/gradata/hooks/adapters/_base.py
  • Gradata/src/gradata/hooks/adapters/claude_code.py
  • Gradata/src/gradata/hooks/pre_compact.py
  • Gradata/tests/test_hook_adapters.py
  • Gradata/tests/test_pre_compact_hook.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.12
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.11
  • GitHub Check: pytest windows-latest / py3.12
  • GitHub Check: pytest (py3.11)
  • GitHub Check: pytest (py3.12)
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

  • Gradata/tests/test_hook_adapters.py
  • Gradata/tests/test_pre_compact_hook.py
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

  • Gradata/src/gradata/hooks/adapters/claude_code.py
  • Gradata/src/gradata/hooks/adapters/_base.py
  • Gradata/src/gradata/hooks/pre_compact.py
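
The repository's own atomic-write helper is not shown in this thread. As a generic illustration of the atomic-write guideline above, the following stdlib-only sketch writes to a temporary file in the target directory and renames it into place; the name `write_json_atomic` is hypothetical.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def write_json_atomic(path: Path, payload: dict[str, Any]) -> None:
    """Write JSON to a temp file next to `path`, then rename it over `path`."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the rename
        # os.replace is atomic when source and target are on the same filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if it still exists, then re-raise the original error.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

A reader of the snapshot directory therefore sees either the previous file or the complete new one, never a half-written JSON document.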
🔇 Additional comments (5)
Gradata/src/gradata/hooks/adapters/_base.py (1)

139-143: LGTM!

Gradata/src/gradata/hooks/pre_compact.py (1)

1-43: LGTM!

Also applies to: 55-123

Gradata/src/gradata/hooks/adapters/claude_code.py (1)

15-15: LGTM!

Also applies to: 62-97, 120-137

Gradata/tests/test_hook_adapters.py (1)

3-3: LGTM!

Also applies to: 69-89

Gradata/tests/test_pre_compact_hook.py (1)

1-73: LGTM!

Comment on lines +45 to +50
+def _read_bounded(path: Path, *, limit: int = _MAX_TEXT_BYTES) -> str | None:
     try:
-        brain_dir_str = resolve_brain_dir()
-        if not brain_dir_str:
+        if not path.is_file():
             return None
-        brain_dir = Path(brain_dir_str)
-
-        compact_type = data.get("type", "unknown") if data else "unknown"
-
-        snapshot = {
-            "timestamp": datetime.now(UTC).isoformat(),
-            "compact_type": compact_type,
-            "brain_dir": str(brain_dir),
-        }
-
-        # Include lesson count if available
-        lessons_path = brain_dir / "lessons.md"
-        if lessons_path.is_file():
-            text = lessons_path.read_text(encoding="utf-8")
-            snapshot["lesson_count"] = len(
-                [
-                    line
-                    for line in text.splitlines()
-                    if (stripped := line.strip()) and not stripped.startswith("#")
-                ]
-            )
-
-        if hasattr(os, "getuid"):
-            uid = os.getuid()
-        else:
-            try:
-                uid = os.getlogin()
-            except OSError:
-                uid = f"pid{os.getpid()}"
-        user_tmp = Path(tempfile.gettempdir()) / f"gradata-{uid}"
-        user_tmp.mkdir(parents=True, exist_ok=True)
-        dir_hash = hashlib.md5(str(brain_dir).encode()).hexdigest()[:8]
-        snapshot_path = user_tmp / f"compact-snapshot-{dir_hash}.json"
-        snapshot_path.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")
-
-        return {"result": "State saved before compaction"}
-    except Exception:
+        data = path.read_bytes()[:limit]
+        return data.decode("utf-8", errors="replace")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use truly bounded reads in _read_bounded.

Line 49 uses path.read_bytes()[:limit], which loads the full file before truncating. That defeats bounded-read behavior and can spike memory on large files.

Suggested fix
 def _read_bounded(path: Path, *, limit: int = _MAX_TEXT_BYTES) -> str | None:
     try:
         if not path.is_file():
             return None
-        data = path.read_bytes()[:limit]
+        with path.open("rb") as fh:
+            data = fh.read(limit)
         return data.decode("utf-8", errors="replace")
     except OSError:
         return None
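
For anyone who wants to try the suggested behavior in isolation, here is a self-contained sketch of the same bounded read. The public name `read_bounded` and the tiny default limit are stand-ins for the example, since the value of `_MAX_TEXT_BYTES` is not shown in this thread.

```python
from __future__ import annotations

from pathlib import Path

_LIMIT = 16  # illustrative only; the PR's _MAX_TEXT_BYTES value is not shown


def read_bounded(path: Path, *, limit: int = _LIMIT) -> str | None:
    """Return at most `limit` bytes of the file, decoded as UTF-8."""
    try:
        if not path.is_file():
            return None
        with path.open("rb") as fh:
            data = fh.read(limit)  # requests at most `limit` bytes from the file
        # A cut in the middle of a multi-byte character becomes U+FFFD.
        return data.decode("utf-8", errors="replace")
    except OSError:
        return None
```

One consequence of cutting at a byte boundary is that the last character may be split; `errors="replace"` turns that trailing fragment into a replacement character instead of raising.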
