Skip to content

release: merge develop into main (language, runtime, LSP, docs, and tooling updates)#42

Open
nedanwr wants to merge 260 commits into
mainfrom
develop
Open

release: merge develop into main (language, runtime, LSP, docs, and tooling updates)#42
nedanwr wants to merge 260 commits into
mainfrom
develop

Conversation

@nedanwr

@nedanwr nedanwr commented Mar 12, 2026

Copy link
Copy Markdown
Owner

Summary

Merge develop into main and ship the accumulated 0.5.x feature stream.

Changes

  • Language/runtime features added:
    • ranges, maps, file-based modules
    • modulo and unary negation
    • mutable collection operations
    • try/catch/finally + typed exceptions
    • file I/O + path utilities
    • regex module
    • type conversion helpers and type system enhancements
    • f-string format specifiers
    • lambdas, multiple return values
    • default parameters, variadic functions, named arguments
    • list utilities, list comprehensions, pattern matching
    • debug statement support
  • LSP and editor experience improvements:
    • modular LSP architecture refactor
    • stronger diagnostics (argument/type checks, divide/modulo by zero, map key deletion checks, module import/circular diagnostics, file I/O validation)
    • richer IDE features (completion, hover, definitions/references, rename, code actions, formatting, workspace symbols, code lens, signature help, inlay hints, semantic tokens, call hierarchy, folding)
  • Runtime/dev tooling improvements:
    • REPL expression statement output
    • line editing/history/completion via linenoise
    • CLI -e execution support
    • VM/compiler/core memory and stability hardening
    • script reorganization under scripts/
    • Docker dev and Valgrind helper scripts
  • Documentation/website:
    • official docs website and interactive WASM playground
    • expanded docs content and API/language references
    • legacy markdown docs consolidated/migrated

Testing

  • Includes substantial unit, integration, and LSP test additions/updates in develop

  • Merge introduces no extra changes beyond branch sync

  • All existing tests pass

  • New tests added (if applicable)

  • Examples updated (if applicable)

  • Memory leak checks pass (if applicable)

Documentation

  • README, ROADMAP, CHANGELOG, and website docs updated

  • Examples expanded for new language/runtime features

  • Examples updated (if applicable)

  • README/docs updated (if applicable)

Breaking Changes

  • Contributor-facing paths changed (e.g., run_tests.sh moved to scripts/run_tests.sh, docs migrated to website structure).
  • No intentional end-user language breakage expected; this release is primarily additive plus bug fixes.

Summary by CodeRabbit

  • New Features

    • Lambdas, pattern matching, list comprehensions, tuples, multi-value returns/unpacking, default/named/variadic params, debug statements, rich f-string format specs, many new built-ins (math, strings, I/O, regex, list utilities).
  • Editor Features

    • LSP: signature help, inlay hints, semantic tokens, call hierarchy, folding ranges, improved completions and hover.
  • Documentation

    • Expanded guides, examples, quick‑reference, README, roadmap, and language docs.
  • Tests

    • Extensive new unit, integration, and LSP tests covering above features.

## Summary

Add Python-like format specifiers to f-strings, enabling precision,
width, alignment, and type control for formatted output.

## Changes

- Add `format_specs` parallel array to f-string AST structure for
storing format specifiers
- Enhance parser to extract format specifiers after `:` in f-string
expressions (e.g., `{price:.2f}`)
- Add `OP_FORMAT_VALUE` opcode to compiler for emitting formatted value
instructions
- Implement format spec parsing and value formatting in VM runtime with
support for:
  - Precision: `.2f`, `.4f`, `.0f`
  - Width: `:10`, `:<10`, `:>10`, `:^10`
  - Fill characters: `:0>5`, `:->5`, `:*^9`
  - Type specifiers: `:d`, `:f`, `:s`
- Add format specifier completion snippets to LSP
- Add comprehensive documentation to website

## Testing

- [x] All existing tests pass
- [x] New tests added (if applicable)
- [x] Examples updated (if applicable)
- [x] Memory leak checks pass (if applicable)

New test files:
- `fstrings_format_precision.kr`
- `fstrings_format_width.kr`
- `fstrings_format_fill.kr`
- `fstrings_format_combined.kr`

All 133 integration tests pass. Valgrind reports no memory leaks.

## Documentation

- [x] Examples updated (if applicable)
- [x] README/docs updated (if applicable)

Updated:
- `website/content/docs/language/strings.mdx` - Full format specifier
documentation
- `website/content/docs/quick-reference.mdx` - Quick reference example
- `website/content/docs/roadmap.mdx` - Added feature to v0.5.0 section

## Breaking Changes

None


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added f-string format specifiers enabling precision, width, alignment,
fill characters, and type control (e.g., `f"{price:.2f}"`)

* **Documentation**
  * Added comprehensive format specifier documentation with examples
  * Updated roadmap with string interpolation enhancements

* **Tests**
  * Added integration tests covering f-string formatting scenarios

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…ction calls and undefined variables in lambda bodies
## Summary

Add support for anonymous function expressions (lambdas) that can be
assigned to variables and called dynamically.

## Changes

- Add lambda expression parsing in frontend (`parser.c/h`) with new AST
node types
- Implement lambda compilation with new opcodes `OP_CREATE_FUNCTION` and
`OP_INVOKE_FUNCTION` (`compiler.c/h`)
- Extend VM to handle lambda function creation and invocation (`vm.c/h`)
- Add `owned_bytecode` field for proper lambda memory management
- Extend `value_new_function` to support parameter names (`runtime.c/h`)
- Update LSP diagnostics to check lambda bodies for undefined variables
- Update LSP completion descriptions to include lambda syntax

## Testing

- [x] All existing tests pass
- [x] New tests added (if applicable)
- `tests/integration/pass/lambda_basic.kr` - basic lambda assignment and
calls
- `tests/integration/pass/lambda_multiline.kr` - multi-line lambdas with
block bodies
- [x] Examples updated (if applicable)
- [x] Memory leak checks pass (if applicable)

## Documentation

- [x] Examples updated (if applicable)
- [x] README/docs updated (if applicable)
- `website/content/docs/language/functions.mdx` - lambda syntax and use
cases
  - `website/content/docs/quick-reference.mdx` - quick reference entry
  - `website/content/docs/roadmap.mdx` - marked as completed

## Breaking Changes

None

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/core/gc.c (2)

609-638: ⚠️ Potential issue | 🔴 Critical

Missing VAL_TUPLE handling in mark phase causes incorrect cycle collection.

gc_mark_reachable() handles VAL_LIST and VAL_MAP by recursively marking contained items, but VAL_TUPLE falls through to the default case. This means tuple items won't be marked as reachable, causing them to be incorrectly identified as cycles and freed—leading to use-after-free when the tuple is accessed.

🐛 Proposed fix: Add VAL_TUPLE to mark phase
   case VAL_LIST:
     // Mark all items in the list
     if (val->as.list.items) {
       for (size_t i = 0; i < val->as.list.count; i++) {
         if (val->as.list.items[i]) {
           gc_mark_reachable(val->as.list.items[i], marked, capacity, entries);
         }
       }
     }
     break;
+  case VAL_TUPLE:
+    // Mark all items in the tuple
+    if (val->as.tuple.items) {
+      for (size_t i = 0; i < val->as.tuple.count; i++) {
+        if (val->as.tuple.items[i]) {
+          gc_mark_reachable(val->as.tuple.items[i], marked, capacity, entries);
+        }
+      }
+    }
+    break;
   case VAL_MAP: {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/gc.c` around lines 609 - 638, The mark phase in gc_mark_reachable
misses VAL_TUPLE handling causing tuple elements not to be marked; update the
switch in gc_mark_reachable to add a case for VAL_TUPLE that mirrors VAL_LIST
behavior by iterating over val->as.tuple.items (or the tuple field name used in
your KronosValue union) and calling gc_mark_reachable on each non-NULL item,
similarly to how VAL_LIST and VAL_MAP are handled; reference the
gc_mark_reachable function and the VAL_TUPLE enum tag so the tuple branch is
included alongside VAL_LIST/VAL_MAP to prevent tuples being incorrectly
collected.

733-756: ⚠️ Potential issue | 🟡 Minor

Memory accounting in sweep phase also missing VAL_TUPLE.

The finalization at lines 784-786 is correct, but the memory accounting update in the sweep phase (lines 733-756) doesn't handle VAL_TUPLE, causing allocated_bytes to remain inconsistent after cycle collection.

🔧 Proposed fix: Add VAL_TUPLE to sweep memory accounting
           case VAL_FUNCTION:
             if (obj->as.function.bytecode && obj->as.function.length > 0) {
               gc_state.allocated_bytes -= obj->as.function.length;
             }
             break;
+          case VAL_TUPLE:
+            if (obj->as.tuple.count > 0) {
+              gc_state.allocated_bytes -=
+                  obj->as.tuple.count * sizeof(KronosValue *);
+            }
+            break;
           default:
             break;

Also applies to: 784-786

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/gc.c` around lines 733 - 756, The sweep-phase memory accounting
misses handling for VAL_TUPLE, so update the switch on obj->type in the sweep
logic (the block referencing obj->type and gc_state.allocated_bytes) to subtract
the tuple's allocated size similar to the finalizer path; add a case for
VAL_TUPLE that decreases gc_state.allocated_bytes by the tuple's capacity/length
allocation (use the same fields accessed in the finalization code for tuples) so
allocated_bytes remains consistent after collection.
src/lsp/lsp_definition.c (1)

827-845: ⚠️ Potential issue | 🟠 Major

Rename still rewrites inside single-quoted strings.

This scanner only toggles in_string on ". The rest of the LSP code in this PR already treats ' as a string delimiter too, so a rename can still modify 'name' literals.

Proposed fix
   bool in_string = false;
+  char string_quote = '\0';
   bool in_comment = false;
@@
-    if (text[i] == '"' && (i == 0 || text[i - 1] != '\\')) {
-      in_string = !in_string;
+    if ((text[i] == '"' || text[i] == '\'') &&
+        (i == 0 || text[i - 1] != '\\')) {
+      if (!in_string) {
+        in_string = true;
+        string_quote = text[i];
+      } else if (text[i] == string_quote) {
+        in_string = false;
+        string_quote = '\0';
+      }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lsp/lsp_definition.c` around lines 827 - 845, The scanner currently only
toggles in_string when encountering double quotes, so single-quoted literals (')
are not recognized and renames can alter them; update the string-detection logic
in the loop that uses in_string/in_comment (in lsp_definition.c) to treat both
double-quote (") and single-quote (') as string delimiters, respecting escapes
(i.e., only toggle in_string when the quote is not escaped) and ensuring
toggling is ignored while in_comment is true; adjust the condition that flips
in_string (currently checking text[i] == '"' and previous char != '\\') to
handle both quote characters and preserve existing behavior for escaped quotes.
src/lsp/lsp_hover.c (1)

83-136: ⚠️ Potential issue | 🟠 Major

These hover buffers can still emit truncated JSON.

The new parameter-by-parameter rendering makes hover payload size unbounded, but both branches still build it in 1-2 KB stack buffers. Once json_escape_markdown() or the final snprintf() truncates, the client gets a clipped JSON response instead of a large hover.

Also applies to: 228-344

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lsp/lsp_hover.c` around lines 83 - 136, The hover construction in
lsp_hover.c (buffers hover_text, escaped_hover, and result, and functions
json_escape_markdown and snprintf) can produce truncated JSON because the code
writes unbounded content into fixed 1–2KB stack buffers; change this to build
the hover payload on the heap and detect/handle overflows: allocate dynamic
buffers sized from snprintf(..., NULL, 0) or grow with realloc as you append (or
use asprintf-like helpers), ensure json_escape_markdown writes into a sized heap
buffer and returns needed length, and if any formatting/escaping would exceed a
safe limit return an LSP error or empty/partial response rather than emitting
clipped JSON; apply the same fix to the other hover-generation block mentioned
(lines ~228–344) so both branches avoid silent truncation.
♻️ Duplicate comments (3)
src/compiler/compiler.c (1)

1117-1133: ⚠️ Potential issue | 🟠 Major

Guard spec_idx before encoding it as uint16_t.

add_constant() can still return an index above 65535 here. Without the same bound check used elsewhere, emit_uint16() truncates the format-spec constant and the f-string will load the wrong entry at runtime.

🛠️ Minimal fix
     size_t spec_idx = add_constant(c, spec_val);
     if (spec_idx == SIZE_MAX) {
       return;
     }
+    if (spec_idx > UINT16_MAX) {
+      compiler_set_error(c, "Too many constants (limit 65535)");
+      return;
+    }
     emit_byte(c, OP_FORMAT_VALUE);
     if (compiler_has_error(c)) {
       return;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compiler/compiler.c` around lines 1117 - 1133, In emit_expr_to_string,
guard the spec_idx returned by add_constant() before calling emit_uint16: after
checking for SIZE_MAX, verify spec_idx <= UINT16_MAX (or 65535) and if it
exceeds that bound call compiler_set_error(c, "Too many constants; format
specifier index out of range") (or reuse the existing constant-pool overflow
error message pattern) and return; this prevents truncation when calling
emit_uint16 and matches the same bound checks used elsewhere for constants.
src/lsp/lsp_definition.c (1)

461-471: ⚠️ Potential issue | 🟠 Major

Comprehension bindings are still resolved at the wrong source position.

reference_matches_symbol() uses get_node_position(node), but this branch passes the whole AST_LIST_COMPREHENSION node for the loop variable. A shadowed name like [item for item in xs] is still compared at the outer expression position, so references/rename can bleed into the outer item.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lsp/lsp_definition.c` around lines 461 - 471, The comprehension loop
variable is being matched using the whole AST_LIST_COMPREHENSION node so
shadowed names get compared at the outer expression position; in
search_node_for_references_recursive when handling AST_LIST_COMPREHENSION, use
the actual loop variable node (node->as.list_comprehension.var) when calling
reference_matches_symbol and when computing the source location (i.e. derive
position/column from that var node via get_node_position or the same helper used
elsewhere) before calling add_reference_location so the reference/rename targets
the variable's real source span instead of the outer comprehension node.
src/lsp/lsp_diagnostics.c (1)

93-143: ⚠️ Potential issue | 🟠 Major

Normalize resolved import paths before using them as stack keys.

The stack is keyed by raw resolved_path text, but lsp_resolve_module_path() still preserves ./ and ../ segments. The same file can therefore enter the stack under different strings, so circular imports are still missed when equivalent paths are spelled differently.

Also applies to: 255-270

🧹 Nitpick comments (1)
tests/lsp/test_lsp_features.c (1)

134-147: Test depends on external fixture file.

This test imports from tests/integration/fail/circular_a.kr, which must exist and contain a circular import pattern. Ensure this fixture file is committed and the test documentation notes this dependency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/lsp/test_lsp_features.c` around lines 134 - 147, The test
TEST(lsp_diagnostics_circular_import_reported) relies on an external fixture
that provides a circular import (module named "circular_a"), but that fixture
file is missing from the repo; add and commit the fixture file that
exports/imports to produce the circular import used by the test, and update test
documentation (or the test README) to note the dependency so CI knows the file
is required; ensure the fixture contents reproduce the circular import pattern
referenced by the test so lsp_did_open and lsp_read_diagnostics_with_message
produce the expected diagnostic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ROADMAP.md`:
- Around line 319-399: The roadmap adds plans for 0.6+/1.0 but the document
header still says "Current Version: 0.4.5" and shows "0.5.0 in progress"; update
the roadmap header to reflect the 0.5.x merge by changing the Current Version to
0.5.0 (or 0.5.x) and set the status appropriately (e.g., "Released" or
"Merged"), and mirror the same status/version change in the later section
referenced (around lines 440-465) so the top-of-file version and in-file
references are consistent with the PR merge.

In `@src/compiler/compiler.c`:
- Around line 3990-4022: bytecode_print() is still using the old
OP_MAKE_FUNCTION layout and skips the new fields, causing disassembly to go out
of sync for functions with optional/variadic params; update the OP_MAKE_FUNCTION
case in bytecode_print (the code that reads param_count and the parameter-name
loop) to also decode the new fields emitted by the compiler: after the
parameter-name table read required_param_count and has_variadic flags (the same
widths the compiler emits), then read the default-value constants sequence
(decode the same constant entries/count encoding the compiler uses) before
reading body_len (uint16) and advancing by body_len; ensure you advance offset
for each of these fields exactly as the compiler does so the printed names,
defaults, and body_len stay in sync.

In `@src/core/gc.c`:
- Around line 384-386: gc_track() and gc_untrack() currently omit VAL_TUPLE from
memory accounting, so allocated_bytes undercounts tuple memory; add logic
parallel to the VAL_LIST case to account for tuple storage by increasing
allocated_bytes in gc_track() and decreasing it in gc_untrack() using the
tuple's byte size (based on obj->as.tuple.count and sizeof or the same size
calculation used for lists) and reference the VAL_TUPLE tag and
obj->as.tuple.items when locating where to insert the changes.

In `@src/lsp/lsp_diagnostics.c`:
- Around line 1516-1537: The AST_MATCH branch in lsp_diagnostics.c never visits
the matched expression (node->as.match_stmt.value), so calls inside the match
subject are skipped; add a null-checked call to check_expression for
node->as.match_stmt.value (using the same parameters passed to case_patterns:
text, symbols, ast, diagnostics, pos, remaining, has_diagnostics, NULL, 0,
capacity) before iterating case_patterns/case_blocks so the call-arity walker
sees calls in the match subject; leave the existing check_function_calls usage
for case/default blocks unchanged.
- Around line 193-242: The diagnostics path is doing synchronous disk I/O and
full reparse via lsp_parse_ast_from_file called from check_diagnostics (invoked
by handle_did_open/handle_did_change), which blocks the LSP loop; fix by
removing direct fopen/tokenize/parse on the main thread: change
check_diagnostics to use an in-memory source or cached ASTs (populate/update a
workspace buffer/AST cache when handle_did_open/handle_did_change receive the
document text) and have lsp_parse_ast_from_file either accept a source buffer
parameter or be replaced by a background task that parses from the cached
buffer; alternatively offload parse work to a worker thread or task queue so
lsp_parse_ast_from_file (or its replacement) does not perform blocking file I/O
on the didChange diagnostics path.
- Around line 363-380: The AST_VAR branch in infer_type_internal currently calls
find_symbol(node->as.var_name), which can pick up an outer/shadowed symbol;
change the lookup to resolve the symbol by source position/scope instead of
name-only (e.g. call an existing resolver like
find_symbol_at_position(node->as.var_name, node->start_pos) or add a new helper
find_symbol_in_scope(symbols, node->as.var_name, node->start_pos)) and use that
result for applying annotations and inference; update the same pattern wherever
find_symbol(...) is used for AST_VAR (also at the other occurrences mentioned)
so type inference uses the symbol bound at the node's source position before
falling back to assignment inference via find_variable_assignment and recursive
infer_type_internal.
- Around line 2561-2625: The diagnostic uses the old binding index because
occurrence is taken from seen_vars[k].assignment_count before it is incremented;
update occurrence to reflect the new assignment (e.g. set occurrence =
seen_vars[k].assignment_count + 1 or increment assignment_count before capturing
occurrence) so the immutable-reassignment message points at the current unpack
target. Change this in the loop that touches seen_vars[k].assignment_count (the
block using seen_vars, occurrence, assignment_count and node->as.unpack_assign)
and ensure the rest of the logic (find_nth_occurrence, diagnostic range) uses
the updated occurrence.

In `@src/lsp/lsp_handlers.c`:
- Around line 170-171: The current fixed allocation using formatted_capacity =
text_len * 2 and writing up to formatted_capacity-1 can truncate output; update
the formatting logic around formatted_capacity, formatted and the write loop to
grow the buffer dynamically (realloc) when nearing capacity instead of stopping,
or compute a safe upper bound before allocation; ensure all write paths expand
formatted_capacity and adjust formatted pointer, maintain formatted_pos and
null-terminate the buffer, and apply the same dynamic-resize fix to the other
formatting block referenced in lines 195-246.
- Around line 2096-2114: The current folding pass uses fixed-size arrays
line_starts and line_lengths (4096) and stops collecting lines when line_count
reaches that limit, silently dropping the rest; change the collector to handle
arbitrarily many lines by replacing the fixed arrays with dynamically resizable
storage (e.g., allocate arrays and grow them with realloc or use a dynamic
container) and update all uses of line_starts/line_lengths/line_count
accordingly (ensure you preserve the existing behavior when the document is
small, properly handle allocation failures, and update any loop conditions that
check line_count). Locate the code referencing line_starts, line_lengths, and
line_count in the folding logic and implement resizing (or a two-pass
count-then-alloc approach) so folding ranges cover the entire document instead
of truncating after 4096 lines.
- Around line 1305-1329: The code currently strips module prefixes via
strip_module_prefix(callee_raw) and stores the stripped name in
edges[edge_count].callee (and uses normalized elsewhere), which collapses
qualified names like "math.sqrt" into "sqrt"; change the logic to preserve the
full qualified callee_raw when populating CallEdge (i.e., assign/strncpy
edges[].callee from callee_raw, set edges[].length = strlen(callee_raw)), and
remove or avoid calling strip_module_prefix in these call-edge construction
sites (including the occurrences around symbols shown: strip_module_prefix,
normalized, edges array population and similar blocks elsewhere) so that
qualified calls remain distinct from local symbols. Ensure buffer bounds checks
remain (use sizeof(edges[...].callee)-1) and update any subsequent
comparisons/resolutions that currently expect normalized names to use the
preserved qualified name instead.

In `@src/lsp/lsp_utils.c`:
- Around line 2133-2141: The early return when range_end <= pos incorrectly
stops searching; change the behavior in the block that checks "if (range_end <=
pos)" to skip this match and continue scanning (like the existing
"current_length == 0" branch) instead of returning false. Update the control
flow around range_end, pos, and current_length so that when a trimmed match
yields range_end <= pos the code does "continue" and lets the loop try
subsequent matches, keeping the rest of the logic in the function intact.

In `@src/vm/vm_builtins.c`:
- Around line 265-281: The error paths after vm_execute (in the callback
execution code) clean up the call frame but forget to restore the VM value stack
to saved_stack_top, leaving intermediate values on vm->stack for subsequent
operations; modify both failure branches (the blocks handling exec_status != 0
around vm_execute) to pop/unwind values down to saved_stack_top (e.g., set
vm->stack_top = saved_stack_top or repeatedly pop until vm->stack_top ==
saved_stack_top) before restoring vm->current_frame, vm->bytecode, vm->ip and
returning exec_status, while preserving the existing call-frame cleanup
(cleanup_call_frame_locals, free(callback_frame->owned_bytecode), decrement
vm->call_stack_size).
- Around line 2205-2266: The dirname/basename logic currently chops Windows
drive roots (e.g. "C:\\") down to "C:", so detect a leading drive-letter root
(isalpha(path[0]) && path[1] == ':') and when computing the directory result
(around variables last_sep/dir_len and the value_new_string(path, dir_len) call)
include the trailing path separator if present so the returned string becomes
"C:\\" (length dir_len+1) rather than "C:". Apply the same detection and
adjustment in the corresponding basename code paths (the other block noted in
the review) so both dirname and basename preserve Windows drive roots; keep
using is_path_separator and value_new_string/value_release as in the existing
code.

---

Outside diff comments:
In `@src/core/gc.c`:
- Around line 609-638: The mark phase in gc_mark_reachable misses VAL_TUPLE
handling causing tuple elements not to be marked; update the switch in
gc_mark_reachable to add a case for VAL_TUPLE that mirrors VAL_LIST behavior by
iterating over val->as.tuple.items (or the tuple field name used in your
KronosValue union) and calling gc_mark_reachable on each non-NULL item,
similarly to how VAL_LIST and VAL_MAP are handled; reference the
gc_mark_reachable function and the VAL_TUPLE enum tag so the tuple branch is
included alongside VAL_LIST/VAL_MAP to prevent tuples being incorrectly
collected.
- Around line 733-756: The sweep-phase memory accounting misses handling for
VAL_TUPLE, so update the switch on obj->type in the sweep logic (the block
referencing obj->type and gc_state.allocated_bytes) to subtract the tuple's
allocated size similar to the finalizer path; add a case for VAL_TUPLE that
decreases gc_state.allocated_bytes by the tuple's capacity/length allocation
(use the same fields accessed in the finalization code for tuples) so
allocated_bytes remains consistent after collection.

In `@src/lsp/lsp_definition.c`:
- Around line 827-845: The scanner currently only toggles in_string when
encountering double quotes, so single-quoted literals (') are not recognized and
renames can alter them; update the string-detection logic in the loop that uses
in_string/in_comment (in lsp_definition.c) to treat both double-quote (") and
single-quote (') as string delimiters, respecting escapes (i.e., only toggle
in_string when the quote is not escaped) and ensuring toggling is ignored while
in_comment is true; adjust the condition that flips in_string (currently
checking text[i] == '"' and previous char != '\\') to handle both quote
characters and preserve existing behavior for escaped quotes.

In `@src/lsp/lsp_hover.c`:
- Around line 83-136: The hover construction in lsp_hover.c (buffers hover_text,
escaped_hover, and result, and functions json_escape_markdown and snprintf) can
produce truncated JSON because the code writes unbounded content into fixed
1–2KB stack buffers; change this to build the hover payload on the heap and
detect/handle overflows: allocate dynamic buffers sized from snprintf(..., NULL,
0) or grow with realloc as you append (or use asprintf-like helpers), ensure
json_escape_markdown writes into a sized heap buffer and returns needed length,
and if any formatting/escaping would exceed a safe limit return an LSP error or
empty/partial response rather than emitting clipped JSON; apply the same fix to
the other hover-generation block mentioned (lines ~228–344) so both branches
avoid silent truncation.

---

Duplicate comments:
In `@src/compiler/compiler.c`:
- Around line 1117-1133: In emit_expr_to_string, guard the spec_idx returned by
add_constant() before calling emit_uint16: after checking for SIZE_MAX, verify
spec_idx <= UINT16_MAX (or 65535) and if it exceeds that bound call
compiler_set_error(c, "Too many constants; format specifier index out of range")
(or reuse the existing constant-pool overflow error message pattern) and return;
this prevents truncation when calling emit_uint16 and matches the same bound
checks used elsewhere for constants.

In `@src/lsp/lsp_definition.c`:
- Around line 461-471: The comprehension loop variable is being matched using
the whole AST_LIST_COMPREHENSION node so shadowed names get compared at the
outer expression position; in search_node_for_references_recursive when handling
AST_LIST_COMPREHENSION, use the actual loop variable node
(node->as.list_comprehension.var) when calling reference_matches_symbol and when
computing the source location (i.e. derive position/column from that var node
via get_node_position or the same helper used elsewhere) before calling
add_reference_location so the reference/rename targets the variable's real
source span instead of the outer comprehension node.

---

Nitpick comments:
In `@tests/lsp/test_lsp_features.c`:
- Around line 134-147: The test TEST(lsp_diagnostics_circular_import_reported)
relies on an external fixture that provides a circular import (module named
"circular_a"), but that fixture file is missing from the repo; add and commit
the fixture file that exports/imports to produce the circular import used by the
test, and update test documentation (or the test README) to note the dependency
so CI knows the file is required; ensure the fixture contents reproduce the
circular import pattern referenced by the test so lsp_did_open and
lsp_read_diagnostics_with_message produce the expected diagnostic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0a0702e4-0607-4729-88f0-663e5cab53ed

📥 Commits

Reviewing files that changed from the base of the PR and between 0a0be2f and e1dc3b6.

📒 Files selected for processing (13)
  • README.md
  • ROADMAP.md
  • src/compiler/compiler.c
  • src/core/gc.c
  • src/lsp/lsp.h
  • src/lsp/lsp_definition.c
  • src/lsp/lsp_diagnostics.c
  • src/lsp/lsp_handlers.c
  • src/lsp/lsp_hover.c
  • src/lsp/lsp_utils.c
  • src/vm/vm_builtins.c
  • tests/lsp/test_lsp_features.c
  • website/content/docs/roadmap.mdx
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lsp/lsp.h

Comment thread ROADMAP.md
Comment on lines +319 to +399
### Version 0.6.0: "Core Runtime Beta"

**Target:** N/A
**Target:** Q2 2026
**Status:** 📋 Planned

#### Planned Features

- **Concurrency** - Goroutines and channels (Go-inspired)

```kronos
spawn task with:
for i in range 1 to 10:
print "Task:", i

set ch to channel
spawn sender with ch:
send ch, "hello"
set msg to receive ch
```

- **Complete Standard Library** - 50+ functions across multiple modules
#### Scope

- **Core Standard Library Completion (Beta)** - Stabilize the modules required for 1.0 language adoption
- **Math:** `sin()`, `cos()`, `tan()`, `asin()`, `acos()`, `atan()`, `log()`, `log10()`, `exp()`, `cbrt()`
- **String:** `find()`, `rfind()`, `count()`, `capitalize()`, `title()`
- **Date/Time:** `now()`, `format_date()`, `parse_date()`, `sleep()`
- **Collections:** `zip()`, `enumerate()`, `any()`, `all()`, `sum()`
- **System:** `exit()`, `args()`, `env()`
- **JSON:** `parse_json()`, `to_json()`
- **CSV:** `read_csv()`, `write_csv()`, `parse_csv()`, `to_csv()`
- **Test Framework:** `assert`, `test`, `run_tests`
- **Crypto:** `md5()`, `sha1()`, `sha256()`, `sha512()`, `random_bytes()`, `secure_random_int()`
- **HTTP:** `http_get()`, `http_post()`, HTTP server (requires concurrency)
- **CLI:** `parse_args()`, argument parsing utilities
- **Modules in scope:** `math`, `string`, `os`, `json`, `time`, `collections`, `regex`

- **Method Chaining** - Fluent API support
- **Method Chaining (Beta)** - Fluent API support with parser/runtime hardening

```kronos
set result to text.uppercase().trim().split(" ")
set processed to list 1, 2, 3.filter(function with x: return x is greater than 2).map(function with x: return x times 2)
```

- **Performance Optimizations**

- **Baseline Optimizations (Beta)** - Safe optimizations targeted for 1.0
- Bytecode optimization passes
- Constant folding
- Dead code elimination
- Inline caching for method calls
- Profile-guided optimization
- **F-string Expression Parsing Optimization** - Parse embedded expressions inline
- Currently f-strings re-tokenize embedded expressions, which is inefficient
- Optimize by tracking source positions in tokens and parsing inline
- Requires architectural changes: position tracking, substring parsing capability
- Will eliminate redundant tokenization for f-string expressions

- **Standard Library Modules**
---

- `math` - Complete mathematical functions
- `string` - String utilities
- `os` - Operating system interface
- `json` - JSON parsing and generation
- `csv` - CSV parsing and generation
- `test` - Testing framework
- `crypto` - Cryptographic functions
- `http` - HTTP client and server
- `cli` - Command-line argument parsing
- `time` - Time and date operations
- `collections` - Collection utilities
- `regex` - Regular expressions
### Version 0.7.0: "Tooling & LSP Beta"

- **Package Manager** - Install and manage packages
**Target:** Q3 2026
**Status:** 📋 Planned

```bash
kronos install <package>
kronos list
kronos remove <package>
```
#### Scope

- **Code Formatter** - Opinionated formatter (like `gofmt`/`rustfmt`)
- **Code Formatter (Beta)** - Opinionated formatter (like `gofmt`/`rustfmt`)

```bash
kronos format <file>
kronos format --check <file>
```

- **Linter** - Code quality checks
- **Linter (Beta)** - Code quality checks

```bash
kronos lint <file>
kronos lint --fix <file>
```

- **Test Runner** - Built-in testing framework
- **Test Runner & Test Module (Beta)** - Built-in testing workflow and module APIs

```bash
kronos test
kronos test --verbose
```

- **Documentation Generator** - Generate docs from code
- **Documentation Generator (Beta)** - Generate docs from code

```bash
kronos doc
kronos doc --output docs/
```

- **LSP Complete** - Full language server protocol support
- **LSP Complete (Beta)** - Full language server protocol support
- All previous LSP features
- Workspace symbols
- Call hierarchy
- Multi-root workspace support
- Production-ready performance
- Production-ready performance targets

- **CLI Utilities (Beta)** - `parse_args()` and argument parsing utilities used by tooling

---

### Version 0.8.0: "Concurrency & Ecosystem Beta"

**Target:** Q4 2026
**Status:** 📋 Planned

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Update the roadmap’s release state in the same change.

This PR is explicitly the 0.5.x release merge, but this file only adds 0.6+/1.0 planning while the document still reads as Current Version: 0.4.5 with 0.5.0 in progress. Please bump the current-version/status text too, otherwise the repo docs contradict what main is supposed to ship.

Also applies to: 440-465

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ROADMAP.md` around lines 319 - 399, The roadmap adds plans for 0.6+/1.0 but
the document header still says "Current Version: 0.4.5" and shows "0.5.0 in
progress"; update the roadmap header to reflect the 0.5.x merge by changing the
Current Version to 0.5.0 (or 0.5.x) and set the status appropriately (e.g.,
"Released" or "Merged"), and mirror the same status/version change in the later
section referenced (around lines 440-465) so the top-of-file version and in-file
references are consistent with the PR merge.

Comment thread src/compiler/compiler.c
Comment thread src/compiler/compiler.c
Comment thread src/core/gc.c
Comment thread src/lsp/lsp_diagnostics.c Outdated
Comment thread src/lsp/lsp_handlers.c Outdated
Comment thread src/lsp/lsp_handlers.c Outdated
Comment thread src/lsp/lsp_utils.c
Comment thread src/vm/vm_builtins.c
Comment thread src/vm/vm_builtins.c Outdated
nedanwr added 13 commits March 14, 2026 02:22
…mline variable symbol resolution in diagnostics
…mically allocating line storage and ensuring proper cleanup to prevent memory leaks
…y handle drive root paths in string value creation
…dynamic buffer resizing to prevent overflow and ensure proper allocation
…ving unnecessary module prefix check and using raw callee directly
…etadata to ensure compliance with limits and prevent invalid configurations

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/compiler/compiler.c (1)

3118-3173: ⚠️ Potential issue | 🟡 Minor

OP_DEFINE_FUNC disassembly is out of sync with emitted metadata.

You now emit required_param_count, has_variadic, and default-constant indices, but bytecode_print() at Line 3813-Line 3832 still uses the old skip formula. This desynchronizes instruction decoding after function definitions using defaults/variadics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compiler/compiler.c` around lines 3118 - 3173, The OP_DEFINE_FUNC emitter
now writes required_param_count, has_variadic, and per-parameter name and
default-value constant indices, but bytecode_print's OP_DEFINE_FUNC disassembly
still uses the old skip logic; update bytecode_print (the OP_DEFINE_FUNC case)
to read and display/skip the same sequence the emitter writes: first read
required_param_count (uint8), then has_variadic (uint8), then consume
param_count constant indices for parameter names (using the function's
param_count/regular_param_count information accessible in the disassembler),
compute num_defaults = regular_param_count - required_param_count (subtracting
the variadic slot if has_variadic), and then consume that many constant indices
for default values; adjust printed output to include required_param_count and
has_variadic and ensure the total bytecode cursor advance matches this new
sequence so disassembly stays in sync with emit_define_func.
src/lsp/lsp_utils.c (1)

1369-1389: ⚠️ Potential issue | 🔴 Critical

Guard declaration-name dereferences for partial AST nodes.

At Line 1374, Line 1388, Line 1424, Line 1437, and Line 1534/1545, strcmp/strdup assume declaration names are non-null. In LSP/incremental parse flows, incomplete nodes can have missing names and crash symbol-table construction.

💡 Suggested hardening
 case AST_ASSIGN: {
+  if (!node->as.assign.name) {
+    break;
+  }
   Symbol *existing = head ? *head : NULL;
   while (existing) {
     if (existing->name &&
         strcmp(existing->name, node->as.assign.name) == 0 &&
         existing->type == SYMBOL_VARIABLE) {
@@
 case AST_UNPACK_ASSIGN: {
   for (size_t j = 0; j < node->as.unpack_assign.name_count; j++) {
+    const char *unpack_name =
+        (node->as.unpack_assign.names ? node->as.unpack_assign.names[j] : NULL);
+    if (!unpack_name) {
+      continue;
+    }
     Symbol *existing = head ? *head : NULL;
     while (existing) {
       if (existing->name &&
-          strcmp(existing->name, node->as.unpack_assign.names[j]) == 0 &&
+          strcmp(existing->name, unpack_name) == 0 &&
           existing->type == SYMBOL_VARIABLE) {
@@
-      sym->name = strdup(node->as.unpack_assign.names[j]);
+      sym->name = strdup(unpack_name);
@@
 case AST_TYPE_ALIAS: {
+  if (!node->as.type_alias.name) {
+    break;
+  }
   Symbol *existing = head ? *head : NULL;

Also applies to: 1417-1438, 1530-1546

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lsp/lsp_utils.c` around lines 1369 - 1389, The AST handling for
assignments and declarations assumes node->as.assign.name (and other declaration
name fields) are non-NULL and calls strcmp/strdup directly, which crashes for
partial/incremental AST nodes; update the AST_ASSIGN branch (and the other spots
noted around lines where strcmp/strdup are used) to first check that
node->as.assign.name (and the equivalent declaration-name fields) is non-NULL
before calling strcmp or strdup, and if NULL skip creating/updating the symbol
(e.g., continue the switch or treat as incomplete), ensuring
allocate_symbol()/sym->name assignment only happen when the name is present;
apply the same null-guard pattern around the other referenced uses to harden
symbol-table construction.
♻️ Duplicate comments (3)
src/compiler/compiler.c (1)

1117-1134: ⚠️ Potential issue | 🟠 Major

Guard spec_idx before narrowing to uint16_t.

At Line 1133, spec_idx is cast to uint16_t without checking spec_idx > UINT16_MAX, which can silently corrupt bytecode when the constant pool grows past 65535.

💡 Suggested fix
   size_t spec_idx = add_constant(c, spec_val);
   if (spec_idx == SIZE_MAX) {
     return;
   }
+  if (spec_idx > UINT16_MAX) {
+    compiler_set_error(c, "Too many constants (limit 65535)");
+    return;
+  }
   emit_byte(c, OP_FORMAT_VALUE);
   if (compiler_has_error(c)) {
     return;
   }
   emit_uint16(c, (uint16_t)spec_idx);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compiler/compiler.c` around lines 1117 - 1134, The code in
emit_expr_to_string casts spec_idx to uint16_t before ensuring the constant pool
index fits, risking silent truncation when add_constant returns an index >
UINT16_MAX; check the returned spec_idx after add_constant(c, spec_val) and
before emit_uint16, and if spec_idx == SIZE_MAX (already handled) or spec_idx >
UINT16_MAX then call compiler_set_error(c, "Constant pool index exceeds 16-bit
limit") (or similar) and return; update the branch that emits OP_FORMAT_VALUE to
only call emit_uint16(c, (uint16_t)spec_idx) when spec_idx <= UINT16_MAX and
ensure you keep existing compiler_has_error(c) checks around
emit_byte/emit_uint16.
src/lsp/lsp_handlers.c (1)

1551-1553: ⚠️ Potential issue | 🟠 Major

Qualified call names are still collapsed during symbol resolution.

Using strip_module_prefix(...) in these lookup paths can bind math.sqrt/regex.match-style calls to unrelated local symbols with the same tail name, producing incorrect signature help/inlay/call-hierarchy results.

💡 Proposed fix
-  Symbol *function_sym = find_function_symbol(strip_module_prefix(function_name));
+  bool is_qualified_call = strchr(function_name, '.') != NULL;
+  Symbol *function_sym =
+      is_qualified_call ? NULL : find_function_symbol(function_name);
-  Symbol *sym = find_function_symbol(strip_module_prefix(function_name));
+  bool is_qualified_call = strchr(function_name, '.') != NULL;
+  Symbol *sym = is_qualified_call ? NULL : find_function_symbol(function_name);

Also applies to: 1828-1830, 1932-1934, 1972-1974, 2061-2063

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lsp/lsp_handlers.c` around lines 1551 - 1553, The lookup is incorrectly
collapsing qualified names by calling
find_function_symbol(strip_module_prefix(function_name)) which binds calls like
"math.sqrt" to unrelated locals; change the resolution to use the full qualified
name (call find_function_symbol(function_name)) and keep the builtin lookup
as-is (find_builtin_signature(function_name)); apply the same change to the
other occurrences referenced (the blocks around the symbols at the other
locations) so all symbol resolution paths stop stripping module prefixes.
src/lsp/lsp_utils.c (1)

1951-1995: ⚠️ Potential issue | 🟠 Major

find_call_position still matches call ... with inside quoted strings.

Unlike the newer helpers, this path only skips comment lines. A string like "call foo with 1" can still be treated as a real call site.

💡 Suggested fix (align with new helpers)
   const char *pos = text;
   while ((pos = strstr(pos, pattern)) != NULL) {
-    const char *before = pos;
-    while (before > text && *(before - 1) != '\n') {
-      before--;
+    const char *line_start = pos;
+    while (line_start > text && *(line_start - 1) != '\n') {
+      line_start--;
     }
+    const char *before = line_start;
@@
     if (before < pos && *before == '#') {
       pos += strlen(pattern);
       continue;
     }
+    if (is_match_inside_quoted_string(line_start, pos)) {
+      pos += strlen(pattern);
+      continue;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lsp/lsp_utils.c` around lines 1951 - 1995, find_call_position currently
only skips comment lines and can falsely match "call <func> with" inside quoted
strings; update the match validation to detect and skip matches that occur
inside single- or double-quoted strings. Specifically, in the loop where you
have pos, before treating it as real, walk from text up to pos maintaining a
simple state machine: track whether you're inside a single-quote or double-quote
(toggle on unescaped ' or "), respect backslash escapes, and ignore matches if
in_string is true; keep existing comment-skip logic (before) and the rest of the
line/col computation unchanged. Use the same symbols (pos, before, pattern,
text) so the change is localized inside find_call_position.
🧹 Nitpick comments (2)
tests/unit/test_gc.c (1)

124-126: Make collision sample size depend on stats.array_capacity

Using a fixed VALUE_COUNT = 49 makes this test sensitive to future table-capacity changes. Deriving the sample size from stats.array_capacity would keep the collision assertion deterministic over time.

♻️ Suggested adjustment
-  enum { VALUE_COUNT = 49 };
-  KronosValue *values[VALUE_COUNT];
-  memset(values, 0, sizeof(values));
+  size_t value_count = stats.array_capacity + 1; // Pigeonhole guarantee
+  KronosValue **values = calloc(value_count, sizeof(*values));
+  ASSERT_PTR_NOT_NULL(values);

-  for (size_t i = 0; i < VALUE_COUNT; i++) {
+  for (size_t i = 0; i < value_count; i++) {
@@
-  for (size_t i = 0; i < VALUE_COUNT && second == SIZE_MAX; i++) {
+  for (size_t i = 0; i < value_count && second == SIZE_MAX; i++) {
@@
-  for (size_t i = 0; i < VALUE_COUNT; i++) {
+  for (size_t i = 0; i < value_count; i++) {
@@
+  free(values);

Also applies to: 135-146, 170-171

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_gc.c` around lines 124 - 126, Replace the hard-coded
VALUE_COUNT = 49 sample size with a value derived from the allocator/table
capacity by reading stats.array_capacity (e.g. set VALUE_COUNT =
stats.array_capacity or stats.array_capacity + N as needed) and use that
variable to size the KronosValue *values[] array, the memset, and all
loops/assertions that use VALUE_COUNT (references: VALUE_COUNT, values,
KronosValue). Update the test sections around the other occurrences (the blocks
covering the later loops/assertions) to compute the collision sample size from
stats.array_capacity so the collision assertion remains deterministic when table
capacity changes.
src/vm/vm_builtins.c (1)

1382-1393: Consider thread-safety and randomness quality for rand().

rand() uses global state and is not thread-safe. Additionally, its output quality varies by platform. For a VM that might be used in concurrent scenarios, consider:

  • Using rand_r() (POSIX) or platform-specific thread-safe alternatives
  • Or using a better PRNG like xoshiro256 for higher quality randomness

This is acceptable for simple use cases but worth documenting the limitations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/vm/vm_builtins.c` around lines 1382 - 1393, builtin_rand currently calls
the non-thread-safe global rand() and should be replaced with a thread-safe,
higher-quality PRNG; modify builtin_rand to use a thread-safe generator (e.g.,
accept or reference a per-VM RNG state, use rand_r(&seed) on POSIX, or plug in a
project-wide xoshiro256/xoroshiro implementation seeded per-VM or per-thread)
and ensure you seed that generator once (e.g., when creating the KronosVM) and
expose a small wrapper function (e.g., vm_rand_double or kronos_prng_next) that
returns a double in [0,1); update builtin_rand to call that wrapper instead of
rand(), and add portability guards and fallback to rand_r/rand where xoshiro is
unavailable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compiler/compiler.c`:
- Around line 1276-1305: The compiler now emits a named_count byte followed by
named_count pairs of (arg_position byte, name_constant via emit_constant_index)
in compiler.c (see emit_byte, emit_constant_index, compiler_set_error), but the
disassembler in bytecode_print still assumes legacy layout and advances a fixed
4 bytes; update bytecode_print (and the other disassembly spot at the other
location) to first read named_count, then loop named_count times reading one
byte for the argument position (READ_BYTE/ equivalent) and then read the
constant index using the same logic/macros used for constants (the existing
READ_CONSTANT/READ_CONSTANT_INDEX helper) so the disassembly advances by the
exact number of bytes emitted for each name entry instead of a fixed 4 bytes.
- Around line 3027-3057: The function compile_default_value_to_constant
currently returns the raw results of value_new_*() without checking for
allocation failure; update compile_default_value_to_constant to check each
value_new_*() return and, if NULL, call compiler_set_error(c, "Out of memory
allocating default parameter value") (or similar OOM message) before returning
NULL so callers (those invoking compile_default_value_to_constant for default
parameter handling) won't proceed without an error set; ensure the unary-neg
branch also checks the value_new_number(-operand->as.number) result and sets the
same compiler error on NULL.

In `@src/core/gc.c`:
- Around line 755-760: gc_collect_cycles currently treats VAL_TUPLE during sweep
but gc_mark_reachable does not traverse tuple children, so tuple items remain
unmarked and can be freed while still referenced by the tuple; update
gc_mark_reachable to handle VAL_TUPLE by iterating obj->as.tuple.items for
obj->as.tuple.count entries and calling gc_mark_reachable (or the existing mark
function) on each non-null item so tuple children are marked as reachable, and
ensure any related accounting (e.g., the VAL_TUPLE case in gc_collect_cycles
that adjusts gc_state.allocated_bytes) remains consistent with this traversal.

In `@src/lsp/lsp_diagnostics.c`:
- Around line 266-316: The resolver returns non-canonical joined paths (keeping
"./", "../", redundant slashes) which breaks string-based cache and cycle
checks; update lsp_resolve_module_path to run the resolved path through a
canonicalization step before returning (e.g., implement a helper like
lsp_normalize_path that collapses "." and ".." segments, removes redundant
slashes, and preserves leading "/" for absolute paths), and call that helper for
every return path in lsp_resolve_module_path (and the analogous return site
around the 318-330 block) so callers and cache/cycle checks always see
normalized identity strings.
- Around line 438-444: The current early return for
LSP_MODULE_CACHE_STATE_MISSING causes missing-module state to be reported even
during non-blocking passes; change the branch so that when entry && entry->state
== LSP_MODULE_CACHE_STATE_MISSING you first check allow_disk_io and return
LSP_MODULE_LOAD_PENDING if allow_disk_io is false, otherwise return
LSP_MODULE_LOAD_MISSING—i.e., either move the allow_disk_io check above the
missing-state return or nest it so that missing -> pending on non-blocking
passes (references: entry, LSP_MODULE_CACHE_STATE_MISSING,
LSP_MODULE_LOAD_MISSING, allow_disk_io, LSP_MODULE_LOAD_PENDING).

In `@src/lsp/lsp_utils.c`:
- Around line 525-739: The function node_declares_loop_variable is currently
scope-insensitive and may report a name declared in a nested/other scope as the
same loop variable; update node_declares_loop_variable (and any callers like
is_loop_variable) to be scope-aware by adding a scope/context parameter (e.g., a
pointer or scope id) or a depth flag and refuse to descend into nodes that
introduce new symbol scopes (at minimum AST_FUNCTION, AST_LAMBDA,
AST_LIST_COMPREHENSION and other scope-introducing nodes) when searching for
declarations, so only declarations in the same scope as the target name are
considered; adjust recursive calls to pass the scope/context and ensure
loop/comprehension variable checks still occur for the immediate
loop/comprehension node but do not leak into unrelated function/lambda bodies or
nested scopes.

In `@src/vm/vm_builtins.c`:
- Around line 1194-1196: The replace implementation in function replace uses
strstr (seen where search_start and search_end are used and
old_str->as.string.data is searched), which fails for binary strings with
embedded NULs; change the search to a binary-safe search (use memmem if
available or implement a loop using memchr to find the first byte of old_str and
then memcmp to verify the full old_len) and update the bounds check to stop at
search_end - old_len + 1 (or otherwise ensure matches that would overflow are
not considered); keep using variables search_start, search_end,
old_str->as.string.data and old_len from vm_builtins.c so the new search
integrates with the existing replace loop.
- Around line 1034-1035: contains() currently calls strstr() which fails on
embedded NULs; replace it with a binary-safe search that uses the length fields
(str->as.string.length and substring->as.string.length) and memcmp() to compare
bytes: return true if substring length is zero, return false if substring length
> string length, otherwise slide over str->as.string.data up to (str.len -
sub.len) and use memcmp to test each candidate. Apply the same fix in replace()
(use the same length-aware search and memcmp comparisons when locating matches)
and ensure you reference the same data/length fields (str->as.string.data,
substring->as.string.data, etc.) rather than using strstr().
- Around line 2063-2079: The POSIX-only directory traversal in list_files (uses
opendir/readdir) must be made cross-platform like portable_fopen; wrap the
current POSIX block with `#ifndef` _WIN32 and add an `#ifdef` _WIN32 branch that
uses Windows APIs (MultiByteToWideChar to convert path, append "\\*" pattern,
call FindFirstFileW/FindNextFileW with WIN32_FIND_DATAW, create KronosValue
strings for each file using the same
value_new_string/value_new_list/value_release patterns, and call FindClose on
finish), and ensure errors use vm_errorf/vm_error similarly and resources are
released (value_release, FindClose) just as the POSIX path does.
- Around line 2413-2426: The regex builtins in vm_builtins.c rely on POSIX APIs
(regcomp, regexec, regfree and <regex.h>) which won't compile on native Windows;
update the implementation to be portable by adding conditional compilation: wrap
the current POSIX usage of regcomp/regexec/regfree in `#ifdef` HAVE_POSIX_REGEX
(or check for _WIN32) and provide an alternative Windows branch (either use
Windows regex APIs, a bundled cross-platform regex library, or return a clear
runtime error if regexes are unsupported), and ensure callers like the regex
compile/execute code paths release values (pattern_arg, string_arg) and
propagate errors consistently in both branches; reference the
functions/regcomp/regexec/regfree and the regex handling block in vm_builtins.c
when making these changes.

---

Outside diff comments:
In `@src/compiler/compiler.c`:
- Around line 3118-3173: The OP_DEFINE_FUNC emitter now writes
required_param_count, has_variadic, and per-parameter name and default-value
constant indices, but bytecode_print's OP_DEFINE_FUNC disassembly still uses the
old skip logic; update bytecode_print (the OP_DEFINE_FUNC case) to read and
display/skip the same sequence the emitter writes: first read
required_param_count (uint8), then has_variadic (uint8), then consume
param_count constant indices for parameter names (using the function's
param_count/regular_param_count information accessible in the disassembler),
compute num_defaults = regular_param_count - required_param_count (subtracting
the variadic slot if has_variadic), and then consume that many constant indices
for default values; adjust printed output to include required_param_count and
has_variadic and ensure the total bytecode cursor advance matches this new
sequence so disassembly stays in sync with emit_define_func.

In `@src/lsp/lsp_utils.c`:
- Around line 1369-1389: The AST handling for assignments and declarations
assumes node->as.assign.name (and other declaration name fields) are non-NULL
and calls strcmp/strdup directly, which crashes for partial/incremental AST
nodes; update the AST_ASSIGN branch (and the other spots noted around lines
where strcmp/strdup are used) to first check that node->as.assign.name (and the
equivalent declaration-name fields) is non-NULL before calling strcmp or strdup,
and if NULL skip creating/updating the symbol (e.g., continue the switch or
treat as incomplete), ensuring allocate_symbol()/sym->name assignment only
happen when the name is present; apply the same null-guard pattern around the
other referenced uses to harden symbol-table construction.

---

Duplicate comments:
In `@src/compiler/compiler.c`:
- Around line 1117-1134: The code in emit_expr_to_string casts spec_idx to
uint16_t before ensuring the constant pool index fits, risking silent truncation
when add_constant returns an index > UINT16_MAX; check the returned spec_idx
after add_constant(c, spec_val) and before emit_uint16, and if spec_idx ==
SIZE_MAX (already handled) or spec_idx > UINT16_MAX then call
compiler_set_error(c, "Constant pool index exceeds 16-bit limit") (or similar)
and return; update the branch that emits OP_FORMAT_VALUE to only call
emit_uint16(c, (uint16_t)spec_idx) when spec_idx <= UINT16_MAX and ensure you
keep existing compiler_has_error(c) checks around emit_byte/emit_uint16.

In `@src/lsp/lsp_handlers.c`:
- Around line 1551-1553: The lookup is incorrectly collapsing qualified names by
calling find_function_symbol(strip_module_prefix(function_name)) which binds
calls like "math.sqrt" to unrelated locals; change the resolution to use the
full qualified name (call find_function_symbol(function_name)) and keep the
builtin lookup as-is (find_builtin_signature(function_name)); apply the same
change to the other occurrences referenced (the blocks around the symbols at the
other locations) so all symbol resolution paths stop stripping module prefixes.

In `@src/lsp/lsp_utils.c`:
- Around line 1951-1995: find_call_position currently only skips comment lines
and can falsely match "call <func> with" inside quoted strings; update the match
validation to detect and skip matches that occur inside single- or double-quoted
strings. Specifically, in the loop where you have pos, before treating it as
real, walk from text up to pos maintaining a simple state machine: track whether
you're inside a single-quote or double-quote (toggle on unescaped ' or "),
respect backslash escapes, and ignore matches if in_string is true; keep
existing comment-skip logic (before) and the rest of the line/col computation
unchanged. Use the same symbols (pos, before, pattern, text) so the change is
localized inside find_call_position.

---

Nitpick comments:
In `@src/vm/vm_builtins.c`:
- Around line 1382-1393: builtin_rand currently calls the non-thread-safe global
rand() and should be replaced with a thread-safe, higher-quality PRNG; modify
builtin_rand to use a thread-safe generator (e.g., accept or reference a per-VM
RNG state, use rand_r(&seed) on POSIX, or plug in a project-wide
xoshiro256/xoroshiro implementation seeded per-VM or per-thread) and ensure you
seed that generator once (e.g., when creating the KronosVM) and expose a small
wrapper function (e.g., vm_rand_double or kronos_prng_next) that returns a
double in [0,1); update builtin_rand to call that wrapper instead of rand(), and
add portability guards and fallback to rand_r/rand where xoshiro is unavailable.

In `@tests/unit/test_gc.c`:
- Around line 124-126: Replace the hard-coded VALUE_COUNT = 49 sample size with
a value derived from the allocator/table capacity by reading
stats.array_capacity (e.g. set VALUE_COUNT = stats.array_capacity or
stats.array_capacity + N as needed) and use that variable to size the
KronosValue *values[] array, the memset, and all loops/assertions that use
VALUE_COUNT (references: VALUE_COUNT, values, KronosValue). Update the test
sections around the other occurrences (the blocks covering the later
loops/assertions) to compute the collision sample size from stats.array_capacity
so the collision assertion remains deterministic when table capacity changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ff3ebf9-a061-4378-a0a7-6e7902d6ff98

📥 Commits

Reviewing files that changed from the base of the PR and between 3f635ac and 8852e7e.

📒 Files selected for processing (9)
  • src/compiler/compiler.c
  • src/core/gc.c
  • src/lsp/lsp.h
  • src/lsp/lsp_diagnostics.c
  • src/lsp/lsp_handlers.c
  • src/lsp/lsp_main.c
  • src/lsp/lsp_utils.c
  • src/vm/vm_builtins.c
  • tests/unit/test_gc.c

Comment thread src/compiler/compiler.c
Comment thread src/compiler/compiler.c
Comment thread src/core/gc.c
Comment thread src/lsp/lsp_diagnostics.c
Comment thread src/lsp/lsp_diagnostics.c
Comment thread src/lsp/lsp_utils.c
Comment thread src/vm/vm_builtins.c
Comment thread src/vm/vm_builtins.c
Comment thread src/vm/vm_builtins.c
Comment thread src/vm/vm_builtins.c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant