spec: generic syntax-highlight definition mechanism (#9955) #10129
lonexreb wants to merge 1 commit into warpdotdev:master
Adds product.md + tech.md for issue warpdotdev#9955: a contributor-friendly mechanism for adding new languages to Warp's syntax highlighting without modifying compiled Rust code and without releasing Warp.

Investigation: today, adding a language requires changes in 5+ places in crates/languages/src/lib.rs (SUPPORTED_LANGUAGES array, language_by_filename match, to_arborium_name match, get_arborium_highlight_query match, plus a grammars/<lang>/ folder). The closed registry blocks the most-requested kind of community contribution: "I use $LANG and would happily contribute the highlighting definition." That contribution today requires touching the internal arborium crate dependency and shipping a Warp release.

Spec proposes:
- Three-source discovery (compile-time hardcoded, bundled directory, user-local directory) with explicit priority order. Hardcoded > bundled > user-local. Staged migration: V1 adds the discovery layer beside the hardcoded path; existing 32 languages keep working unchanged. No flag day.
- Schema-driven language.toml (display_name, internal_name, comment_prefix, indent_unit, file_associations [extensions, filenames, shebangs, aliases], brackets, parser, ts_abi). One contract a contributor learns; everything else is standard tree-sitter files.
- WASM-only for runtime-loaded user grammars; native dynamic libraries (.so/.dylib/.dll) explicitly rejected with a clear error message and no dlopen call exists on the loader path.
- Validation with clear failure modes: a malformed grammar does NOT break Warp startup; surfaces via log + Settings page notification. Other grammars load normally.
- Settings > Editor > Languages page lists all loaded grammars, their source, and any failures.
- Tree-sitter substrate preserved (rejects switching to TextMate/Sublime regex grammars referenced in the issue as community-distribution exemplars only, not as recommended tech).
- Per-language migration template for the 32 existing languages, each as an independently revertable PR.

Test plan covers five schema-validation unit tests, four loader unit tests (including ABI mismatch and collision dedup), and three integration tests with a real tree-sitter grammar fixture.

Six risks called out (WASM perf cost, schema versioning, capture-name standard set, ABI mismatch detection, sub-language injection, theme integration) with concrete TECH decisions or recommended follow-ups for each.

Four open questions for maintainer review on tree-sitter version prerequisites, XDG path fallback, per-language disable in V1, and inventory-style auto-registration for bundled parsers.
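The priority order and collision dedup described above can be sketched roughly as follows. This is a minimal illustration, not the spec's loader: `Candidate` and `dedup_by_priority` are hypothetical names, and the rule that an equal-priority duplicate loses to the first one seen is an assumption.

```rust
use std::collections::HashMap;

/// Where a grammar definition was discovered. Variant order encodes the
/// stated priority: hardcoded > bundled > user-local.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum LanguageSource {
    UserLocal, // lowest priority
    Bundled,
    Hardcoded, // highest priority
}

/// Hypothetical candidate record; the real type in the spec may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub internal_name: String,
    pub source: LanguageSource,
}

/// Keeps one candidate per `internal_name`, preferring the higher-priority
/// source. Returns the winners (sorted by name) and the shadowed candidates.
pub fn dedup_by_priority(candidates: Vec<Candidate>) -> (Vec<Candidate>, Vec<Candidate>) {
    let mut winners: HashMap<String, Candidate> = HashMap::new();
    let mut shadowed = Vec::new();
    for cand in candidates {
        // Assumption: on equal priority the first candidate seen is kept.
        let keep_existing = winners
            .get(&cand.internal_name)
            .map_or(false, |existing| existing.source >= cand.source);
        if keep_existing {
            shadowed.push(cand);
        } else if let Some(prev) = winners.insert(cand.internal_name.clone(), cand) {
            shadowed.push(prev);
        }
    }
    let mut kept: Vec<Candidate> = winners.into_values().collect();
    kept.sort_by(|a, b| a.internal_name.cmp(&b.internal_name));
    (kept, shadowed)
}
```

Returning the shadowed candidates alongside the winners would let a settings page report which definitions were overridden, rather than dropping them silently.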
I'm starting a first review of this spec-only pull request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request.
Powered by Oz
Overview
The spec defines a tree-sitter-based registry for bundled and user-local syntax highlighting definitions, with a product contract, loader architecture, migration plan, and tests. The direction addresses the linked issue, but several requirements contradict each other or leave security-critical implementation details unresolved.
Concerns
- The goal promises no compiled Rust changes/no release for both bundled and user-local paths, while the bundled path still ships in Warp and the tech spec requires Cargo and parser-map changes.
- The migration plan alternates between a single mechanical migration PR and independently revertable per-language migrations.
- The loader/API model does not cleanly represent failed grammar loads, yet the product requires failed grammars to appear in diagnostics.
- The product allows missing highlights.scm, but the tech loader rejects highlight-query load failures.
Security
- User-local WASM is treated as sufficient sandboxing without specifying resource limits or host capabilities for untrusted parsers loaded at startup.
- Failure diagnostics log and display full grammar directory paths even though the telemetry section identifies paths as PII.
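One possible way to address the path concern is to shorten paths before they reach logs or the Settings page. The sketch below is an illustration under assumptions: `redact_home` is a hypothetical helper, the home directory is passed in explicitly, and showing only the final component for paths outside the home directory is one choice among several.

```rust
use std::path::Path;

/// Replaces a leading `home` prefix with `~` so diagnostics do not expose
/// the user's account name. Paths outside `home` are reduced to their
/// final component (an assumption of this sketch).
pub fn redact_home(path: &Path, home: &Path) -> String {
    match path.strip_prefix(home) {
        // The path is exactly the home directory.
        Ok(rest) if rest.as_os_str().is_empty() => "~".to_string(),
        // The path lives under the home directory.
        Ok(rest) => format!("~/{}", rest.display()),
        // Anything else: keep only the last component.
        Err(_) => path
            .file_name()
            .map(|name| name.to_string_lossy().into_owned())
            .unwrap_or_else(|| "<unknown>".to_string()),
    }
}
```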
Verdict
Found: 0 critical, 6 important, 1 suggestion
Request changes
| ## Goal
|
| A contributor can add a new language to Warp's syntax highlighting
| **without modifying compiled Rust code and without releasing Warp**,
| match statements are replaced with a registry-driven lookup. The
| existing 32 languages get their associations migrated from the
| match statements to per-language `language.toml` files in a
| single mechanical PR (this spec calls out that PR as a follow-up,
| libraries (`.so`, `.dylib`, `.dll`) are explicitly rejected and
| never loaded. The WASM is loaded via tree-sitter's existing WASM
| runtime. WASM provides the sandboxing that makes user-local
| grammars safe.
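The rejection rule in the excerpt above could look roughly like this as a first gate before any bytes are read. It is a sketch only: `check_parser_file` and `ParserFileError` are hypothetical names, and an extension check alone does not establish that a file is valid WASM.

```rust
use std::path::Path;

/// Hypothetical error type for the parser-file gate.
#[derive(Debug, PartialEq, Eq)]
pub enum ParserFileError {
    /// `.so`, `.dylib`, or `.dll`: rejected outright, never loaded.
    NativeLibraryRejected(String),
    /// Any other extension that is not `.wasm`.
    NotWasm(String),
}

/// Accepts only `.wasm` parser files. Native dynamic libraries get a
/// dedicated error so the user sees why the grammar was refused.
pub fn check_parser_file(path: &Path) -> Result<(), ParserFileError> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();
    match ext.as_str() {
        "wasm" => Ok(()),
        "so" | "dylib" | "dll" => Err(ParserFileError::NativeLibraryRejected(ext)),
        _ => Err(ParserFileError::NotWasm(ext)),
    }
}
```

Keeping the native-library case as its own variant makes the "clear error message" requirement easy to test, and the function contains no code path that could reach a dynamic loader.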
| A grammar directory that fails to load (malformed `language.toml`,
| WASM that fails to instantiate, `highlights.scm` that fails to
| parse against the grammar) does NOT break Warp startup. Instead:
| - A `log::error!` fires with the directory path and the failure
| # User-local grammars: WASM file path relative to the grammar dir.
| # Mutually exclusive with [parser.rust_crate].
| wasm = "grammar.wasm"
💡 [SUGGESTION] Split this schema block into separate bundled and user-local examples; as written, the canonical example sets both rust_crate and wasm even though the comments say they are mutually exclusive.
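The mutual-exclusivity rule behind this suggestion could be enforced at validation time along these lines. This is a sketch under assumptions: `RawParser`, `ParserSpec`, and `resolve_parser` are hypothetical names, and the `rust_crate` entry is flattened to a plain string here even though the spec appears to model it as a table.

```rust
/// Raw `[parser]` section as it might look after deserialization,
/// with both keys optional. Hypothetical shape.
#[derive(Debug, Default, Clone)]
pub struct RawParser {
    pub rust_crate: Option<String>,
    pub wasm: Option<String>,
}

/// Validated parser source: exactly one of the two options.
#[derive(Debug, PartialEq, Eq)]
pub enum ParserSpec {
    RustCrate(String),
    Wasm(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParserSpecError {
    BothSet,
    NeitherSet,
}

/// Turns the raw section into a `ParserSpec`, rejecting manifests that
/// set both keys or neither.
pub fn resolve_parser(raw: RawParser) -> Result<ParserSpec, ParserSpecError> {
    match (raw.rust_crate, raw.wasm) {
        (Some(krate), None) => Ok(ParserSpec::RustCrate(krate)),
        (None, Some(wasm)) => Ok(ParserSpec::Wasm(wasm)),
        (Some(_), Some(_)) => Err(ParserSpecError::BothSet),
        (None, None) => Err(ParserSpecError::NeitherSet),
    }
}
```

Under this check, a manifest copied from the combined example in the spec would fail with `BothSet`, which is the situation the suggestion is pointing at.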
| pub struct LoadedLanguage {
| pub language: Arc<Language>,
| pub source: LanguageSource,
| pub failure: Option<LoadFailure>,
LoadedLanguage cannot represent a grammar that fails before a Language is constructed because language is mandatory, yet failed grammars must be returned and listed in Settings. Define a separate LoadResult/FailedGrammar variant before implementation.
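A separate result type along the lines the comment asks for might look like the sketch below. All names other than `LoadedLanguage`, `LanguageSource`, `LoadFailure`, and `Language` are hypothetical, the `Language` struct is a stand-in for the real grammar type, and the failure variants are assumptions drawn from the failure modes the spec lists.

```rust
use std::path::PathBuf;
use std::sync::Arc;

/// Stand-in for the real grammar type referenced in the quoted struct.
#[derive(Debug)]
pub struct Language {
    pub internal_name: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LanguageSource {
    Hardcoded,
    Bundled,
    UserLocal,
}

/// Hypothetical failure variants mirroring the failure modes in the spec.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LoadFailure {
    MalformedManifest(String),
    WasmInstantiation(String),
    AbiMismatch(String),
    InvalidHighlightQuery(String),
}

/// A grammar that loaded: no optional failure field is needed.
#[derive(Debug)]
pub struct LoadedLanguage {
    pub language: Arc<Language>,
    pub source: LanguageSource,
}

/// A grammar that failed before a `Language` could be constructed.
#[derive(Debug)]
pub struct FailedGrammar {
    pub dir: PathBuf,
    pub source: LanguageSource,
    pub failure: LoadFailure,
}

#[derive(Debug)]
pub enum LoadResult {
    Loaded(LoadedLanguage),
    Failed(FailedGrammar),
}

/// Splits loader output into usable languages and diagnostics entries.
pub fn partition(results: Vec<LoadResult>) -> (Vec<LoadedLanguage>, Vec<FailedGrammar>) {
    let mut loaded = Vec::new();
    let mut failed = Vec::new();
    for result in results {
        match result {
            LoadResult::Loaded(lang) => loaded.push(lang),
            LoadResult::Failed(grammar) => failed.push(grammar),
        }
    }
    (loaded, failed)
}
```

With this shape, the Settings page could list the `failed` vector directly, and a failed grammar never needs a placeholder `Language`.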
| Reject if `WasmStore` reports an ABI mismatch with the host's
| `tree_sitter::TREE_SITTER_LANGUAGE_VERSION`.
| 3. Compile `highlights.scm` against the resolved grammar via
| `Query::new`. On failure: record `LoadFailure`, return.
Per the product spec, a grammar with a missing highlights.scm still loads without coloring, but this loader step treats highlight-query failure as LoadFailure and returns. Specify missing-file handling separately from invalid-query handling.
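The distinction requested here could be expressed as follows. This is a sketch, not the spec's loader: `load_highlights`, `HighlightQuery`, and `HighlightError` are hypothetical names, and the `compile` closure stands in for the `Query::new` step in the excerpt so the example stays free of external crates.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Outcome when loading succeeds, with or without a query file.
#[derive(Debug, PartialEq, Eq)]
pub enum HighlightQuery {
    /// No `highlights.scm`: the grammar loads without coloring.
    Missing,
    /// The file exists and compiled; holds the query source.
    Compiled(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum HighlightError {
    /// The file exists but could not be read.
    Unreadable(String),
    /// The file was read but the query did not compile.
    Invalid(String),
}

/// Treats a missing file as a valid state and an invalid query as a failure.
/// `compile` is a stand-in for compiling the query against the grammar.
pub fn load_highlights<F>(grammar_dir: &Path, compile: F) -> Result<HighlightQuery, HighlightError>
where
    F: Fn(&str) -> Result<(), String>,
{
    match fs::read_to_string(grammar_dir.join("highlights.scm")) {
        Ok(source) => match compile(&source) {
            Ok(()) => Ok(HighlightQuery::Compiled(source)),
            Err(message) => Err(HighlightError::Invalid(message)),
        },
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(HighlightQuery::Missing),
        Err(err) => Err(HighlightError::Unreadable(err.to_string())),
    }
}
```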
Summary
Spec for issue #9955: adding a new language to Warp's syntax highlighting today requires changes in 5+ places in `crates/languages/src/lib.rs` (`SUPPORTED_LANGUAGES` array, `language_by_filename`, `to_arborium_name`, `get_arborium_highlight_query`, plus a `grammars/<lang>/` folder), all of which require modifying compiled Rust code and shipping a Warp release. This blocks the most-requested kind of community contribution: "I use $LANG and would happily contribute the highlighting definition."
Investigation
- `SUPPORTED_LANGUAGES: [&str; 32]` (lib.rs:23).
- `language_by_filename` (lib.rs:115), `to_arborium_name` (lib.rs:226), `get_arborium_highlight_query` (lib.rs:239).
- `arborium::tree_sitter::{Language, Query}` (lib.rs:7): tree-sitter is the right substrate; this spec preserves it.
- `crates/languages/grammars/<lang>/` already exist with `config.yaml`, `identifiers.scm`, `indents.scm`, embedded via `RustEmbed`.
- `arborium` support OR vendoring a tree-sitter grammar.
What's in the spec
- `product.md`: 8 testable behavior invariants (B1–B8), 7 acceptance criteria (A1–A7), explicit non-goals (no TextMate-style regex grammars, no native dynamic-library loading, no sub-language injection in V1, no hot-reload), and 6 risks with concrete TECH decisions.
- `tech.md`: picks the three-source discovery architecture, the `language.toml` schema, the loader, the file-association index, and the staged migration strategy. 5+4 unit tests plus 3 integration tests with a real tree-sitter fixture.
Architectural choices
- `language.toml`: one contract a contributor learns; everything else is standard tree-sitter files (`highlights.scm`, `indents.scm`, optional `identifiers.scm`).
- Native dynamic libraries (`.so`/`.dylib`/`.dll`) explicitly rejected with a clear error and no `dlopen` call exists on the loader path. Bundled grammars can be either WASM or a Cargo dependency on a Rust grammar crate.
- Failures surface via `log::error!` + Settings → Editor → Languages notification. Other grammars load normally.
- `bundled_parsers.rs` compile-time map is the only hand-edited file for adding bundled grammars (with an open question about whether to use `inventory`-style auto-registration to eliminate even that).
Test plan
- `LoadFailure` not panic, collision dedup, ABI mismatch reporting)
- `crates/languages/src/lib_tests.rs` and `crates/syntax_tree/src/queries/*_tests.rs` pass unchanged
- `~/.warp/grammars/zig/` directory with `language.toml`, `highlights.scm`, `grammar.wasm`; restart; `.zig` files render
- `grammar.so`; verify clear error, no `dlopen`
Open questions for maintainer review
- `WasmStore`. Verify against current `Cargo.lock`.
- `$XDG_CONFIG_HOME/warp/grammars/` (when set) vs `~/.warp/grammars/` (fallback). Confirm precedence.
- `bundled_parsers.rs` use an `inventory`-style auto-registration pattern to eliminate the only remaining hand-edit? Adds a build-time crate dep but removes the last manual step.

Closes (spec-only) #9955; implementation PR will follow once spec direction is confirmed. The 32 existing languages will migrate via independent follow-up PRs, one per language.
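For the XDG open question, the fallback as currently worded could be sketched like this. It is an illustration only: `user_grammar_dir` is a hypothetical helper, the environment values are passed in as arguments to keep it testable, treating an empty `XDG_CONFIG_HOME` as unset is an assumption, and the precedence itself is still awaiting maintainer confirmation.

```rust
use std::path::PathBuf;

/// Resolves the user-local grammar directory.
/// Uses `$XDG_CONFIG_HOME/warp/grammars` when the variable is set and
/// non-empty, otherwise falls back to `~/.warp/grammars`.
pub fn user_grammar_dir(xdg_config_home: Option<&str>, home: &str) -> PathBuf {
    match xdg_config_home {
        Some(xdg) if !xdg.is_empty() => PathBuf::from(xdg).join("warp").join("grammars"),
        _ => PathBuf::from(home).join(".warp").join("grammars"),
    }
}
```

A caller might pass `std::env::var("XDG_CONFIG_HOME").ok().as_deref()` as the first argument; keeping the lookup outside the function avoids mutating process-wide environment variables in tests.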