Gemma4 tool calling and RoPE fixes#594
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes several Gemma4 correctness issues affecting rotary embeddings and tool calling, ensuring behavior matches reference implementations and that multimodal Gemma4 instances correctly apply tool constraints.
Changes:
- Correct Gemma4 partial RoPE to rotate interleaved halves rather than a contiguous prefix.
- Make tool-constraint APIs virtual on
Modeland forward them throughGemma4MmModel. - Fix tool JSON parsing cursor advancement so multiple tools in one payload are handled.
- Improve special-tokens map parsing by honoring JSON escape sequences via
extract_json_string.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| cactus/models/gemma4/model_gemma4_mm.cpp | Forwards tool-constraint methods to the underlying language model so multimodal decoding is constrained. |
| cactus/models/gemma4/model_gemma4.h | Declares Gemma4MmModel overrides for tool-constraint APIs. |
| cactus/models/gemma4/model_gemma4.cpp | Updates partial RoPE slicing/rotation to use interleaved-halves semantics. |
| cactus/ffi/cactus_utils.h | Fixes tool JSON parsing to continue searching after the parsed parameters block. |
| cactus/engine/engine_tokenizer.cpp | Uses extract_json_string to correctly parse escaped special-token strings. |
| cactus/engine/engine.h | Adds extract_json_string, makes tool-constraint APIs virtual, and extends ToolCallConstrainer API surface. |
Comments suppressed due to low confidence (1)
cactus/ffi/cactus_utils.h:910
parse_tools_jsonstill has uncheckedfind(...) + 1patterns (e.g.,size_t name_start = json.find('"', name_pos + 6) + 1;). Iffindreturnsnpos, the+ 1wraps to 0 and the subsequentsubstrcalls can parse unrelated parts of the payload. Sincetools_jsonis passed in from the FFI, please addnposchecks for each delimiter search (and ideally bound searches to the current tool object) so malformed inputs fail gracefully instead of producing corrupted tool specs.
size_t next_search = pos + 1;
size_t name_pos = json.find("\"name\"", pos);
if (name_pos != std::string::npos) {
size_t name_start = json.find('"', name_pos + 6) + 1;
size_t name_end = json.find('"', name_start);
tool.name = json.substr(name_start, name_end - name_start);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (rot_dim < head_dim) { | ||
| size_t half_dim = head_dim / 2; | ||
| size_t half_rot = rot_dim / 2; | ||
| size_t pass_len = half_dim - half_rot; | ||
| float adjusted_theta = std::pow(rope_freq, static_cast<float>(rot_dim) / static_cast<float>(head_dim)); |
There was a problem hiding this comment.
apply_partial_rope can pass an odd or zero rot_dim into gb->rope(...) (since rot_dim is computed via float->size_t truncation and isn't clamped/aligned). The underlying RoPE kernel assumes an even head_dim and will leave the last element(s) uninitialized when head_dim is odd; for rot_dim==0 it will also build zero-length slices/concats unnecessarily. Consider clamping rot_dim to [0, head_dim], forcing it to be even (e.g., rot_dim &= ~1u), and early-returning when rot_dim==0 or rot_dim>=head_dim (similar to the partial-rope handling in model_qwen3p5.cpp).
| void compute_bias(); | ||
| void tokenize_grammar_elements(); | ||
| void add_tokens_for_string(const std::string& str, std::unordered_set<uint32_t>& token_set); | ||
| void add_tokens_containing(char needle, std::unordered_set<uint32_t>& token_set); |
There was a problem hiding this comment.
ToolCallConstrainer declares add_tokens_containing(...) here, but there is no corresponding definition in the codebase (searching the repository only finds this declaration). If this is intended to be used, it needs an implementation (or should be removed until it’s needed) to avoid leaving dead/incomplete API surface in engine.h.
| void add_tokens_containing(char needle, std::unordered_set<uint32_t>& token_set); |
Signed-off-by: Noah Cylich <noahcylich@gmail.com>
Signed-off-by: Noah Cylich <noahcylich@gmail.com>
This reverts commit 4ab77bb. Signed-off-by: Noah Cylich <noahcylich@gmail.com>
Signed-off-by: Noah Cylich <noahcylich@gmail.com>
Summary
iwithi + head_dim/2) instead of a contiguous prefix, matching the reference implementation.set_tool_constraints/clear_tool_constraints/update_tool_constraintsvirtual on the base model and forward them throughGemma4MmModelso multimodal instances actually apply constraints."function"search cursor past the parsedparametersblock so multiple tools in a single payload are no longer dropped or re-parsed.extract_json_stringhelper (previouslyrfind("\"")could pull in trailing quoted content).Test plan