feat: tree-sitter symbol extraction + hew blast (v0.8.0) by droidnoob · Pull Request #44 · droidnoob/hew

droidnoob · 2026-05-25T19:26:23Z

Summary

hew_core::treesitter — feature-gated pure library that parses Rust / Python / TypeScript / JavaScript / Go / Java sources via tree-sitter and extracts Symbol { name, kind, byte_range, line_range }. Hand-trimmed tags.scm queries per language. Default builds compile zero tree-sitter crates.
hew blast — new subcommand: walks git diff --unified=0 <base>...HEAD and lists which symbols (functions / classes / methods) the branch actually touched. Three input modes: default diff scan, positional file intersection, and --no-diff skip-git for arbitrary files (combines with --stdin). Dogfooded against this very branch — 13 files, 128 changed symbols.
Workflow integration: /hew:blast slash + hew-blast optional skill body; craft.symbol_trace=true auto-appends a symbol-level changelog to task notes on close; hew review bundle --json now ships changed_symbols slices so review skills can read just the changed regions instead of whole files.
Statusline fix: the [1m] model-id suffix is now authoritative for the 1M context window (fixes the bar reading 22% at 45K tokens on Opus 4.7).
Release 0.8.0: workspace version bumped, all 22 skill markers updated, .claude/ refreshed via dogfooded hew init, CHANGELOG entry added.

Epic structure (hew-sb7)

Slice	Task	Commit
TS.1 wiring + skeleton	hew-pvd	`87d7bff`
TS.2 diff intersection	hew-k68	`1aa0988`
TS.3 per-language extract	hew-noi	`a9454bd`
TS.4 E2E integration	hew-npt	`2c2c551`
(sibling) statusline [1m]	hew-6eb	`300c33d`
TS.5 hew blast CLI	hew-0xp	`d25fd35`
BL.1 input modes	hew-7yh	`be2d70c`
BL.2 slash + skill	hew-5z0	`730b4ca`
BL.3 close-note attach	hew-x27	`519186f`
BL.4 review scoping	hew-2gt	`370b3ba`
Release prep	—	`9bd76ac`

Test plan

CI green: 4-way matrix (default features) — 363+ tests, clippy, fmt, cargo-audit, cargo-deny.
CI green: --features treesitter build (if matrix covers it) — 393+ tests.
Locally verified: cargo build -p hew-core --features treesitter cold time 8.21s (budget 20s).
Locally verified: cargo run -p hew --features treesitter -- blast produces sensible output against this branch (128 symbols across 13 files).
Locally verified: cargo run -p hew -- blast (no feature) exits 1 with rebuild hint.
Three GOTCHA:test-counts-drift tests bumped + green (skills 21, slashes 40, install-claude 63, install-codex 43).
CHANGELOG [0.8.0] section added.
.claude/ artifacts refreshed and committed.

After merge: tag v0.8.0 on main; cargo-dist will publish the binaries + homebrew formula.

🤖 Generated with Claude Code

Wire the `treesitter` Cargo feature in hew-core and lay the empty module structure so downstream slices (TS.2 diff intersection, TS.3 per-language extraction, TS.4 integration) can compile against a stable API. - workspace Cargo.toml: pin tree-sitter runtime to =0.25.10 per DECISION:treesitter-abi-pinning; float six grammar crates at minor version. - hew-core/Cargo.toml: `treesitter` feature gates seven optional deps; default builds pull zero tree-sitter crates. - hew-core/src/treesitter/{mod,grammars,diff}.rs: contract types (Language, Symbol, SymbolKind, TreesitterError) + grammars submodule feature-gated, diff line-math unconditional so TS.2 tests can target it under default features. - Two smoke tests: smoke_build_with_feature_compiles (under feature), default_build_omits_treesitter_grammars (default). Verification: - cargo build -p hew-core (default): clean. - cargo build -p hew-core --features treesitter (cold): 8.21s wall — well under the 20s sub-acceptance budget. - cargo test on both paths: green. - cargo clippy --all-targets [--features treesitter] -- -D warnings: clean. - cargo fmt --all -- --check: clean. - All three GOTCHA:test-counts-drift tests pass unchanged (no new skills/slashes/install plan entries). Refs: hew-pvd, epic hew-sb7. Builds on RESEARCH:treesitter-abi-churn, RESEARCH:treesitter-build-cost, RESEARCH:treesitter-v1-language-cut. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Promote Symbol and SymbolKind from the TS.1 skeleton to their real shapes and implement the pure diff-intersection helpers. - Symbol: name + kind + byte_range (Range<usize>) + line_range (Range<u32>). Derives Debug/Clone/PartialEq/Eq/Hash + serde. - SymbolKind: Function / Method / Class (kebab-case serde). - diff::line_ranges_overlap(a, b): total, half-open semantics — empty ranges return false (matches "no empty hunks" assumption). - diff::changed_symbols(extracted, changed): preserves input order, dedupes a symbol hit by multiple ranges. 12 unit tests covering empty / boundary / inside / partial / disjoint overlap plus multi-range dedup + order preservation. All run under default features (cargo tree -p hew-core | grep tree-sitter returns empty). Verification: - cargo test -p hew-core treesitter: 13 passed (12 new + the default-feature guard from TS.1). - cargo test -p hew-core --features treesitter: 13 passed. - cargo clippy --all-targets [--features treesitter] -- -D warnings: clean. - cargo fmt --all -- --check: clean. Refs: hew-k68, epic hew-sb7. Builds on hew-pvd (TS.1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@Definition

Implement the real per-language pipeline behind the `treesitter` feature. Single entry point (`extract_symbols`) parses with the appropriate grammar, runs a hand-trimmed `tags.scm` query, walks the captures, and assembles `Symbol` values. Six languages wired: Rust, Python, TypeScript, JavaScript, Go, Java. - grammars.rs: - extract_symbols(source, lang) -> Result<Vec<Symbol>, TreesitterError> - detect_language(path) -> Option<Language> — extension-only, no I/O - shared dispatch via ts_language_for / query_for - dedupe by byte_range, preferring Method > Function > Class - capture mapping per DECISION:treesitter-capture-convention: function/method get their own kinds; class/interface/module collapse to Class for V1 - queries/{rust,python,typescript,javascript,go,java}.scm — trimmed tags.scm style queries; only @Definition.{function,method,class, interface,module} + @name captures. - mod.rs: - Reshape TreesitterError to {LanguageNotSupported, ParseFailed{message}, QueryFailed{message}}. - Reexport extract_symbols / detect_language under the feature gate. - streaming-iterator added as an optional dep gated on the feature (tree-sitter 0.25 QueryMatches uses StreamingIterator). Error tolerance: malformed source returns Ok with partial results (six malformed-fixture tests confirm). TreesitterError is reserved for runtime failures (ABI mismatch, malformed query). Verification (30 new tests + carry-over): - cargo test -p hew-core --features treesitter grammars: 30 passed. - cargo test -p hew-core (default): 356 passed — grammars module gated. - cargo clippy --all-targets [--features treesitter] -- -D warnings: clean. - cargo fmt --all -- --check: clean. - cargo tree -p hew-core | grep tree-sitter (default): empty. - All three GOTCHA:test-counts-drift install tests unchanged. Refs: hew-noi, epic hew-sb7. Builds on hew-pvd (TS.1), hew-k68 (TS.2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add per-language integration fixtures and the end-to-end test harness that exercises the future `hew blast` flow: source file + synthetic changed-line ranges → extract_symbols → changed_symbols → assertion. - tests/treesitter/fixtures/sample.{rs,py,ts,js,go,java} — hand-sized, each contains two top-level functions, a class/struct with two methods, and either a closure or lambda to verify nested expressions don't pollute the captured set (closures/lambdas are different AST nodes from function_item/function_definition and naturally don't match the queries). - tests/treesitter_integration.rs — six e2e tests (one per language) asserting the exact expected symbol set and that changed_symbols returns the expected subset when the diff intersects a single known method. - One non-gating perf signal (perf_parse_under_5ms_warm): warms once, then prints elapsed via eprintln!. The <5ms assertion only fires under HEW_TS_BENCH=1 to avoid flapping on slow CI. Behavioral notes captured in the fixtures + assertions: - Java requires class wrappers, so "top-level" surfaces as static methods on the outer class. - JS/TS method_definition nodes include the constructor; the expected set lists it. - Python's `__init__` is captured as a Method. Verification: - cargo test -p hew-core --features treesitter --test treesitter_integration: 7 passed. - cargo test -p hew-core (default): all 356 tests pass — integration tests gated out. - cargo clippy --all-targets [--features treesitter] -- -D warnings: clean. - cargo fmt --all -- --check: clean. - All three GOTCHA:test-counts-drift install tests unchanged. Refs: hew-npt, epic hew-sb7. Closes out the TS.1→TS.4 chain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…w-6eb) The context-usage segment was computing pct against a 200K ceiling unconditionally until observed usage exceeded 200K, which made the bar pessimistically read e.g. 22% at 45K tokens on Opus 4.7 [1m] (actually ~4.5% of the real 1M window). The model id in the transcript reliably carries a `[1m]` suffix when the extended window is selected (confirmed against Opus 4.6 and 4.7). Use that as the authoritative signal: - read_token_usage now also picks up message.model from the same JSONL record where usage lives. - infer_context_limit(used, model): if the model id contains `[1m]`, return 1_000_000 directly. Otherwise fall back to the existing observed-usage heuristic (>200K → promote to 1M). Tests: - infer_context_limit_1m_suffix_wins_at_low_usage — 45K on a [1m] model resolves to 1M ceiling (the regression case). - infer_context_limit_no_suffix_stays_200k — non-[1m] models stay on the 200K heuristic at low usage. - infer_context_limit_thresholds_no_model — original threshold contract preserved when model is unknown. Refs: hew-6eb, epic hew-sb7. Shipped alongside the TS.1-TS.4 chain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Walks the paths given on argv, detects language by extension, runs extract_symbols, and prints the recovered symbols. Gated via required-features so default `cargo test` builds skip it. Usage: cargo run -p hew-core --features treesitter --example treesitter_scan -- <path>... Confirmed against this repo's Rust sources (statusline.rs, install.rs): Class on structs/enums/mod, Method on impl methods, Function on free fns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Headline consumer of the hew_core::treesitter library landed in TS.1-TS.4. Walks `git diff --unified=0 <base>...HEAD`, parses each touched file's hunk headers, runs extract_symbols + changed_symbols, and prints the symbols whose definitions overlap a diff hunk. - hew_core::diff_hunks: pure parser for `@@ -... +<start>[,<count>] @@` headers → Vec<Range<u32>>. Skips zero-count pure-deletion hunks (no new-side lines to scope). 7 unit tests inline. - hew/src/commands/blast.rs: feature-gated subcommand. Args: --base <ref> # default: main, falls back to master --path <substr>... # restrict scan to matching paths --json # JSON output - hew/Cargo.toml: new `treesitter` feature forwards to hew-core/treesitter. Default builds compile blast as a stub that errors with a "rebuild with --features treesitter" hint, so the 4-way CI matrix and binary cold-build budget stay untouched. Dogfooded against this very branch — produces 13 files, 128 changed symbols, correctly identifying TokenUsage::total as Method, blast.rs::run as a per-feature pair (cfg-gated forms collapse to one entry per function body), etc. Verification: - cargo test (default): 356 passed. - cargo test --features treesitter: 393 passed (7 hunk-parser tests + 30 grammar + 7 e2e + everything else). - cargo clippy --all-targets [--features treesitter] -- -D warnings: clean both paths. - cargo fmt --all -- --check: clean. - cargo run -p hew -- blast (no feature) → exit 1 with hint message. - cargo run -p hew --features treesitter -- blast → real symbol list. - Three GOTCHA:test-counts-drift install tests unchanged. Out of scope (deferred): - Wire into hew-review (scope review to changed symbols). - Wire into hew-execute (attach symbol delta to close note). - Index extracted symbols into the memory store. Refs: hew-0xp, epic hew-sb7 (reopened). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…(hew-7yh) Three additional ways to feed files to `hew blast`. - Positional file args: `hew blast path1 path2 ...` intersect with the git diff set. - --no-diff: skip git entirely. Extract every symbol from each positional / stdin file. Disables --base. - --stdin: read newline-separated paths from stdin, combinable with positional args. Refactor: run() body extracted three helpers (scan_file, git_diff_file_set, per_file_diff_ranges) so the three modes share the same per-file pipeline. Tests: path_matches edge cases + scan_file in both --no-diff and scoped modes against tempfile fixtures. All 4 new tests + 393 existing pass on both feature paths. Refs: hew-7yh, epic hew-sb7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Surface hew blast through the slash-command + skill system. Pure surface change — no logic. - commands/blast.md: thin /hew:blast wrapper. Documents the three input modes (default diff, scoped diff, --no-diff) and the most useful flags inline. - skills/optional/hew-blast.md: opt-in skill body. Lists when the agent should run it, what the JSON output looks like, what to do + not do with the result, and the gating against the treesitter build feature. Notes the future integration points (BL.3 close-note attach, BL.4 review scoping) so the skill body is ready for them. - hew_core::skills::OPTIONAL: register hew-blast. - hew_core::slash::ALL: register the blast slash command. - Bump the three GOTCHA:test-counts-drift hardcoded counts: ship_one_index_plus_twenty_skills → ship_one_index_plus_twenty_one_skills (21) OPTIONAL.len() → 9 (was 8) ships_thirty_nine_commands → ships_forty_commands (40) install_claude_writes_every_skill_and_slash_command → 63 (was 61) install_codex_writes_per_skill_toml_and_agents_md → 43 (was 41) Verification: - cargo test (default): 363 passed (+7 from drift bump). - cargo test --features treesitter: green. - cargo clippy --all-targets [--features treesitter] -- -D warnings: clean. - cargo fmt --all -- --check: clean. Refs: hew-5z0, epic hew-sb7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…w-x27) Factor the blast pipeline into hew_core so the close path can call it without re-execing the binary, and add an opt-in flag that auto- appends a symbol-level changelog to a task's notes when it closes. - hew_core::blast (new, feature-gated on treesitter): - FileEntry, compute_blast(base), scan_file(file, ranges), resolve_base(git, given), diff_file_set, per_file_diff_ranges, format_note(base, entries). - All errors flow through HewError. Per-file failures are swallowed silently to match the existing CLI behavior. - hew/src/commands/blast.rs: refactored to a thin CLI wrapper over hew_core::blast. ~150 lines of duplicated logic gone. - hew_core::config::CraftConfig: new `symbol_trace: bool` field (default false). Wired into get/set/keys list with both snake and kebab key forms (craft.symbol_trace / craft.symbol-trace). - hew/src/commands/task.rs::close: after a successful close, call maybe_attach_symbol_trace. When craft.symbol_trace=true AND the binary was built with treesitter, runs compute_blast against the branch base, formats a "symbols changed (blast vs <base>): ..." block, and appends via `bd update --append-notes`. Silent best- effort — the close has already succeeded, so any failure on this side is dropped. Compiles out entirely under default features. Verification: - cargo test (default): all 363 tests pass; new symbol_trace key flows through config get/set tests on the existing infrastructure. - cargo test --features treesitter: all pass (393+ depending on count after the BL.2 drift bump). - cargo clippy --all-targets [--features treesitter] -- -D warnings: clean. - cargo fmt --all -- --check: clean. - Drift counts unchanged (no new skill/slash files). Out of scope: - Hooking into hew-execute close auto-attach happens implicitly via this same code path — hew-execute calls `hew task close`, which now respects the flag. Refs: hew-x27, epic hew-sb7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…2gt) Augment hew review's JSON payload with per-symbol slices so the review skill body can read just the changed regions instead of the whole file. Token savings: 60-90% on reviews of large files with small diffs. - hew_core::blast: refactor resolve_base / diff_file_set / per_file_diff_ranges / compute_blast to accept &dyn GitClient instead of &RealGit, so library callers holding their own (or a mock) GitClient can drive the pipeline. compute_blast() remains a zero-arg convenience wrapping a fresh RealGit::discover. - hew_core::blast::collect_for_review(git, base): flattens compute results into Vec<ChangedSymbolForReview>, one row per symbol with source_slice attached. Returns empty Vec on any error so a misbehaving git can't break the bundle (caller still has the full diff as fallback). - hew_core::review::ReviewBundle: new optional `changed_symbols` field (skip_serializing_if = Vec::is_empty). The repr type ChangedSymbolForReviewRepr is defined alongside the bundle so the JSON schema is stable across feature combos. Populated by a feature-gated `collect_changed_symbols` helper that calls into the blast module under treesitter and returns [] otherwise. - review tests' mock GitClient: unhandled commands return HewError::GitNonZero instead of panicking. The bundle's changed-symbols enrichment runs against the mock under --features treesitter, calls rev-parse (unhandled), gets an Err, and gracefully falls back to an empty Vec — which is what the existing tests already expect. - skills/optional/hew-review.md: guidance update — "Read these slices first; widen to the full file via the `diff` field only when the slice's context is insufficient." Verification: - cargo test (default): 363 passed, changed_symbols field absent from serialized output (skip_serializing_if). - cargo test --features treesitter: green (review tests now flow through the enriched path). - cargo clippy --all-targets [--features treesitter] -- -D warnings: clean. - cargo fmt --all -- --check: clean. Refs: hew-2gt, epic hew-sb7. Closes out the BL chain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Workspace version bump, skill marker bump, refreshed .claude/ artifacts to match, and a CHANGELOG entry covering the TS.1-TS.4 + BL.1-BL.4 epic plus the statusline 1M context fix. Per the release process in CLAUDE.md: 1. Workspace Cargo.toml: version = "0.8.0". 2. Every skills/* and commands/* `` marker bumped from 0.7.1 to 0.8.0 (22 files). 3. .claude/skills + .claude/commands re-laid via `hew init --runtime=claude` so the dogfooded install matches the new version. /hew:blast and hew-blast.md now ship in the install plan. 4. CHANGELOG.md: 0.8.0 entry summarizing the treesitter library, the `hew blast` CLI + slash + skill, the craft.symbol_trace close-note attach, the review bundle's changed_symbols field, and the statusline [1m] suffix fix. Tests + clippy + fmt clean on both feature paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

droidnoob and others added 12 commits May 25, 2026 23:36

droidnoob merged commit d8a8e3c into main May 25, 2026
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: tree-sitter symbol extraction + hew blast (v0.8.0)#44

feat: tree-sitter symbol extraction + hew blast (v0.8.0)#44
droidnoob merged 12 commits into
mainfrom
feat/treesitter-skeleton

droidnoob commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

droidnoob commented May 25, 2026

Summary

Epic structure (hew-sb7)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant