feat: tree-sitter symbol extraction + hew blast (v0.8.0)#44
Merged
Conversation
Wire the `treesitter` Cargo feature in hew-core and lay the empty module
structure so downstream slices (TS.2 diff intersection, TS.3 per-language
extraction, TS.4 integration) can compile against a stable API.
- workspace Cargo.toml: pin tree-sitter runtime to =0.25.10 per
DECISION:treesitter-abi-pinning; float six grammar crates at minor
version.
- hew-core/Cargo.toml: `treesitter` feature gates seven optional deps;
default builds pull zero tree-sitter crates.
- hew-core/src/treesitter/{mod,grammars,diff}.rs: contract types
(Language, Symbol, SymbolKind, TreesitterError) + grammars submodule
feature-gated, diff line-math unconditional so TS.2 tests can target
it under default features.
- Two smoke tests: smoke_build_with_feature_compiles (under feature),
default_build_omits_treesitter_grammars (default).
Verification:
- cargo build -p hew-core (default): clean.
- cargo build -p hew-core --features treesitter (cold): 8.21s wall —
well under the 20s sub-acceptance budget.
- cargo test on both paths: green.
- cargo clippy --all-targets [--features treesitter] -- -D warnings: clean.
- cargo fmt --all -- --check: clean.
- All three GOTCHA:test-counts-drift tests pass unchanged (no new
skills/slashes/install plan entries).
Refs: hew-pvd, epic hew-sb7. Builds on RESEARCH:treesitter-abi-churn,
RESEARCH:treesitter-build-cost, RESEARCH:treesitter-v1-language-cut.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promote Symbol and SymbolKind from the TS.1 skeleton to their real shapes and implement the pure diff-intersection helpers. - Symbol: name + kind + byte_range (Range<usize>) + line_range (Range<u32>). Derives Debug/Clone/PartialEq/Eq/Hash + serde. - SymbolKind: Function / Method / Class (kebab-case serde). - diff::line_ranges_overlap(a, b): total, half-open semantics — empty ranges return false (matches "no empty hunks" assumption). - diff::changed_symbols(extracted, changed): preserves input order, dedupes a symbol hit by multiple ranges. 12 unit tests covering empty / boundary / inside / partial / disjoint overlap plus multi-range dedup + order preservation. All run under default features (cargo tree -p hew-core | grep tree-sitter returns empty). Verification: - cargo test -p hew-core treesitter: 13 passed (12 new + the default-feature guard from TS.1). - cargo test -p hew-core --features treesitter: 13 passed. - cargo clippy --all-targets [--features treesitter] -- -D warnings: clean. - cargo fmt --all -- --check: clean. Refs: hew-k68, epic hew-sb7. Builds on hew-pvd (TS.1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implement the real per-language pipeline behind the `treesitter` feature.
Single entry point (`extract_symbols`) parses with the appropriate
grammar, runs a hand-trimmed `tags.scm` query, walks the captures, and
assembles `Symbol` values. Six languages wired: Rust, Python, TypeScript,
JavaScript, Go, Java.
- grammars.rs:
- extract_symbols(source, lang) -> Result<Vec<Symbol>, TreesitterError>
- detect_language(path) -> Option<Language> — extension-only, no I/O
- shared dispatch via ts_language_for / query_for
- dedupe by byte_range, preferring Method > Function > Class
- capture mapping per DECISION:treesitter-capture-convention:
function/method get their own kinds; class/interface/module collapse
to Class for V1
- queries/{rust,python,typescript,javascript,go,java}.scm — trimmed
tags.scm style queries; only @Definition.{function,method,class,
interface,module} + @name captures.
- mod.rs:
- Reshape TreesitterError to {LanguageNotSupported, ParseFailed{message},
QueryFailed{message}}.
- Reexport extract_symbols / detect_language under the feature gate.
- streaming-iterator added as an optional dep gated on the feature
(tree-sitter 0.25 QueryMatches uses StreamingIterator).
Error tolerance: malformed source returns Ok with partial results (six
malformed-fixture tests confirm). TreesitterError is reserved for
runtime failures (ABI mismatch, malformed query).
Verification (30 new tests + carry-over):
- cargo test -p hew-core --features treesitter grammars: 30 passed.
- cargo test -p hew-core (default): 356 passed — grammars module gated.
- cargo clippy --all-targets [--features treesitter] -- -D warnings: clean.
- cargo fmt --all -- --check: clean.
- cargo tree -p hew-core | grep tree-sitter (default): empty.
- All three GOTCHA:test-counts-drift install tests unchanged.
Refs: hew-noi, epic hew-sb7. Builds on hew-pvd (TS.1), hew-k68 (TS.2).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add per-language integration fixtures and the end-to-end test harness
that exercises the future `hew blast` flow: source file + synthetic
changed-line ranges → extract_symbols → changed_symbols → assertion.
- tests/treesitter/fixtures/sample.{rs,py,ts,js,go,java} — hand-sized,
each contains two top-level functions, a class/struct with two
methods, and either a closure or lambda to verify nested expressions
don't pollute the captured set (closures/lambdas are different AST
nodes from function_item/function_definition and naturally don't
match the queries).
- tests/treesitter_integration.rs — six e2e tests (one per language)
asserting the exact expected symbol set and that changed_symbols
returns the expected subset when the diff intersects a single known
method.
- One non-gating perf signal (perf_parse_under_5ms_warm): warms once,
then prints elapsed via eprintln!. The <5ms assertion only fires
under HEW_TS_BENCH=1 to avoid flapping on slow CI.
Behavioral notes captured in the fixtures + assertions:
- Java requires class wrappers, so "top-level" surfaces as static
methods on the outer class.
- JS/TS method_definition nodes include the constructor; the expected
set lists it.
- Python's `__init__` is captured as a Method.
Verification:
- cargo test -p hew-core --features treesitter --test treesitter_integration:
7 passed.
- cargo test -p hew-core (default): all 356 tests pass — integration
tests gated out.
- cargo clippy --all-targets [--features treesitter] -- -D warnings: clean.
- cargo fmt --all -- --check: clean.
- All three GOTCHA:test-counts-drift install tests unchanged.
Refs: hew-npt, epic hew-sb7. Closes out the TS.1→TS.4 chain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…w-6eb) The context-usage segment was computing pct against a 200K ceiling unconditionally until observed usage exceeded 200K, which made the bar pessimistically read e.g. 22% at 45K tokens on Opus 4.7 [1m] (actually ~4.5% of the real 1M window). The model id in the transcript reliably carries a `[1m]` suffix when the extended window is selected (confirmed against Opus 4.6 and 4.7). Use that as the authoritative signal: - read_token_usage now also picks up message.model from the same JSONL record where usage lives. - infer_context_limit(used, model): if the model id contains `[1m]`, return 1_000_000 directly. Otherwise fall back to the existing observed-usage heuristic (>200K → promote to 1M). Tests: - infer_context_limit_1m_suffix_wins_at_low_usage — 45K on a [1m] model resolves to 1M ceiling (the regression case). - infer_context_limit_no_suffix_stays_200k — non-[1m] models stay on the 200K heuristic at low usage. - infer_context_limit_thresholds_no_model — original threshold contract preserved when model is unknown. Refs: hew-6eb, epic hew-sb7. Shipped alongside the TS.1-TS.4 chain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walks the paths given on argv, detects language by extension, runs extract_symbols, and prints the recovered symbols. Gated via required-features so default `cargo test` builds skip it. Usage: cargo run -p hew-core --features treesitter --example treesitter_scan -- <path>... Confirmed against this repo's Rust sources (statusline.rs, install.rs): Class on structs/enums/mod, Method on impl methods, Function on free fns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Headline consumer of the hew_core::treesitter library landed in TS.1-TS.4.
Walks `git diff --unified=0 <base>...HEAD`, parses each touched file's
hunk headers, runs extract_symbols + changed_symbols, and prints the
symbols whose definitions overlap a diff hunk.
- hew_core::diff_hunks: pure parser for `@@ -... +<start>[,<count>] @@`
headers → Vec<Range<u32>>. Skips zero-count pure-deletion hunks
(no new-side lines to scope). 7 unit tests inline.
- hew/src/commands/blast.rs: feature-gated subcommand. Args:
--base <ref> # default: main, falls back to master
--path <substr>... # restrict scan to matching paths
--json # JSON output
- hew/Cargo.toml: new `treesitter` feature forwards to
hew-core/treesitter. Default builds compile blast as a stub that
errors with a "rebuild with --features treesitter" hint, so the
4-way CI matrix and binary cold-build budget stay untouched.
Dogfooded against this very branch — produces 13 files, 128 changed
symbols, correctly identifying TokenUsage::total as Method, blast.rs::run
as a per-feature pair (cfg-gated forms collapse to one entry per
function body), etc.
Verification:
- cargo test (default): 356 passed.
- cargo test --features treesitter: 393 passed (7 hunk-parser tests
+ 30 grammar + 7 e2e + everything else).
- cargo clippy --all-targets [--features treesitter] -- -D warnings:
clean both paths.
- cargo fmt --all -- --check: clean.
- cargo run -p hew -- blast (no feature) → exit 1 with hint message.
- cargo run -p hew --features treesitter -- blast → real symbol list.
- Three GOTCHA:test-counts-drift install tests unchanged.
Out of scope (deferred):
- Wire into hew-review (scope review to changed symbols).
- Wire into hew-execute (attach symbol delta to close note).
- Index extracted symbols into the memory store.
Refs: hew-0xp, epic hew-sb7 (reopened).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(hew-7yh) Three additional ways to feed files to `hew blast`. - Positional file args: `hew blast path1 path2 ...` intersect with the git diff set. - --no-diff: skip git entirely. Extract every symbol from each positional / stdin file. Disables --base. - --stdin: read newline-separated paths from stdin, combinable with positional args. Refactor: run() body extracted three helpers (scan_file, git_diff_file_set, per_file_diff_ranges) so the three modes share the same per-file pipeline. Tests: path_matches edge cases + scan_file in both --no-diff and scoped modes against tempfile fixtures. All 4 new tests + 393 existing pass on both feature paths. Refs: hew-7yh, epic hew-sb7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surface hew blast through the slash-command + skill system. Pure
surface change — no logic.
- commands/blast.md: thin /hew:blast wrapper. Documents the three input
modes (default diff, scoped diff, --no-diff) and the most useful
flags inline.
- skills/optional/hew-blast.md: opt-in skill body. Lists when the
agent should run it, what the JSON output looks like, what to do +
not do with the result, and the gating against the treesitter
build feature. Notes the future integration points (BL.3 close-note
attach, BL.4 review scoping) so the skill body is ready for them.
- hew_core::skills::OPTIONAL: register hew-blast.
- hew_core::slash::ALL: register the blast slash command.
- Bump the three GOTCHA:test-counts-drift hardcoded counts:
ship_one_index_plus_twenty_skills → ship_one_index_plus_twenty_one_skills (21)
OPTIONAL.len() → 9 (was 8)
ships_thirty_nine_commands → ships_forty_commands (40)
install_claude_writes_every_skill_and_slash_command → 63 (was 61)
install_codex_writes_per_skill_toml_and_agents_md → 43 (was 41)
Verification:
- cargo test (default): 363 passed (+7 from drift bump).
- cargo test --features treesitter: green.
- cargo clippy --all-targets [--features treesitter] -- -D warnings: clean.
- cargo fmt --all -- --check: clean.
Refs: hew-5z0, epic hew-sb7.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…w-x27)
Factor the blast pipeline into hew_core so the close path can call it
without re-execing the binary, and add an opt-in flag that auto-
appends a symbol-level changelog to a task's notes when it closes.
- hew_core::blast (new, feature-gated on treesitter):
- FileEntry, compute_blast(base), scan_file(file, ranges),
resolve_base(git, given), diff_file_set, per_file_diff_ranges,
format_note(base, entries).
- All errors flow through HewError. Per-file failures are swallowed
silently to match the existing CLI behavior.
- hew/src/commands/blast.rs: refactored to a thin CLI wrapper over
hew_core::blast. ~150 lines of duplicated logic gone.
- hew_core::config::CraftConfig: new `symbol_trace: bool` field
(default false). Wired into get/set/keys list with both snake and
kebab key forms (craft.symbol_trace / craft.symbol-trace).
- hew/src/commands/task.rs::close: after a successful close, call
maybe_attach_symbol_trace. When craft.symbol_trace=true AND the
binary was built with treesitter, runs compute_blast against the
branch base, formats a "symbols changed (blast vs <base>): ..."
block, and appends via `bd update --append-notes`. Silent best-
effort — the close has already succeeded, so any failure on this
side is dropped. Compiles out entirely under default features.
Verification:
- cargo test (default): all 363 tests pass; new symbol_trace key
flows through config get/set tests on the existing infrastructure.
- cargo test --features treesitter: all pass (393+ depending on
count after the BL.2 drift bump).
- cargo clippy --all-targets [--features treesitter] -- -D warnings:
clean.
- cargo fmt --all -- --check: clean.
- Drift counts unchanged (no new skill/slash files).
Out of scope:
- Hooking into hew-execute close auto-attach happens implicitly
via this same code path — hew-execute calls `hew task close`,
which now respects the flag.
Refs: hew-x27, epic hew-sb7.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…2gt) Augment hew review's JSON payload with per-symbol slices so the review skill body can read just the changed regions instead of the whole file. Token savings: 60-90% on reviews of large files with small diffs. - hew_core::blast: refactor resolve_base / diff_file_set / per_file_diff_ranges / compute_blast to accept &dyn GitClient instead of &RealGit, so library callers holding their own (or a mock) GitClient can drive the pipeline. compute_blast() remains a zero-arg convenience wrapping a fresh RealGit::discover. - hew_core::blast::collect_for_review(git, base): flattens compute results into Vec<ChangedSymbolForReview>, one row per symbol with source_slice attached. Returns empty Vec on any error so a misbehaving git can't break the bundle (caller still has the full diff as fallback). - hew_core::review::ReviewBundle: new optional `changed_symbols` field (skip_serializing_if = Vec::is_empty). The repr type ChangedSymbolForReviewRepr is defined alongside the bundle so the JSON schema is stable across feature combos. Populated by a feature-gated `collect_changed_symbols` helper that calls into the blast module under treesitter and returns [] otherwise. - review tests' mock GitClient: unhandled commands return HewError::GitNonZero instead of panicking. The bundle's changed-symbols enrichment runs against the mock under --features treesitter, calls rev-parse (unhandled), gets an Err, and gracefully falls back to an empty Vec — which is what the existing tests already expect. - skills/optional/hew-review.md: guidance update — "Read these slices first; widen to the full file via the `diff` field only when the slice's context is insufficient." Verification: - cargo test (default): 363 passed, changed_symbols field absent from serialized output (skip_serializing_if). - cargo test --features treesitter: green (review tests now flow through the enriched path). - cargo clippy --all-targets [--features treesitter] -- -D warnings: clean. - cargo fmt --all -- --check: clean. Refs: hew-2gt, epic hew-sb7. Closes out the BL chain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workspace version bump, skill marker bump, refreshed .claude/ artifacts to match, and a CHANGELOG entry covering the TS.1-TS.4 + BL.1-BL.4 epic plus the statusline 1M context fix. Per the release process in CLAUDE.md: 1. Workspace Cargo.toml: version = "0.8.0". 2. Every skills/* and commands/* `<!-- hew:version=… -->` marker bumped from 0.7.1 to 0.8.0 (22 files). 3. .claude/skills + .claude/commands re-laid via `hew init --runtime=claude` so the dogfooded install matches the new version. /hew:blast and hew-blast.md now ship in the install plan. 4. CHANGELOG.md: 0.8.0 entry summarizing the treesitter library, the `hew blast` CLI + slash + skill, the craft.symbol_trace close-note attach, the review bundle's changed_symbols field, and the statusline [1m] suffix fix. Tests + clippy + fmt clean on both feature paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
hew_core::treesitter— feature-gated pure library that parses Rust / Python / TypeScript / JavaScript / Go / Java sources via tree-sitter and extractsSymbol { name, kind, byte_range, line_range }. Hand-trimmedtags.scmqueries per language. Default builds compile zero tree-sitter crates.hew blast— new subcommand: walksgit diff --unified=0 <base>...HEADand lists which symbols (functions / classes / methods) the branch actually touched. Three input modes: default diff scan, positional file intersection, and--no-diffskip-git for arbitrary files (combines with--stdin). Dogfooded against this very branch — 13 files, 128 changed symbols./hew:blastslash +hew-blastoptional skill body;craft.symbol_trace=trueauto-appends a symbol-level changelog to task notes on close;hew review bundle --jsonnow shipschanged_symbolsslices so review skills can read just the changed regions instead of whole files.[1m]model-id suffix is now authoritative for the 1M context window (fixes the bar reading 22% at 45K tokens on Opus 4.7)..claude/refreshed via dogfoodedhew init, CHANGELOG entry added.Epic structure (hew-sb7)
87d7bff1aa0988a9454bd2c2c551300c33dd25fd35be2d70c730b4ca519186f370b3ba9bd76acTest plan
--features treesitterbuild (if matrix covers it) — 393+ tests.cargo build -p hew-core --features treesittercold time 8.21s (budget 20s).cargo run -p hew --features treesitter -- blastproduces sensible output against this branch (128 symbols across 13 files).cargo run -p hew -- blast(no feature) exits 1 with rebuild hint.[0.8.0]section added..claude/artifacts refreshed and committed.After merge: tag
v0.8.0on main; cargo-dist will publish the binaries + homebrew formula.🤖 Generated with Claude Code