fix: preserve exact text for unquoted string values#62

Merged
shreyasbhat0 merged 3 commits into main from
fix/unquoted-string-parsing
Mar 30, 2026


@shreyasbhat0 shreyasbhat0 commented Mar 30, 2026

Summary

  • Fix scanner to track original token text (last_token_text) and exact whitespace counts (last_whitespace_count) during tokenization
  • Rewrite parse_tabular_field_value() to read complete cell text before type inference, per spec §B.3/§B.4
  • Fix parse_field_value() and parse_value_with_depth() to use original token text and exact spacing

This aligns the parser with the spec's approach: get complete value text first, then type-infer — rather than eagerly tokenizing then reassembling.
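As a rough illustration of "complete text first, then type-infer", here is a minimal sketch. The `Value` enum and `infer_value` function are hypothetical names invented for this example and are not taken from the crate; the literal rules are simplified assumptions, not the spec's exact grammar.

```rust
/// Hypothetical value type for illustration; the crate's real type may differ.
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Number(f64),
    Text(String),
}

/// Infer a type from the COMPLETE value text (assumed, simplified rules).
/// Text that is not wholly a literal stays a string, character for character.
fn infer_value(text: &str) -> Value {
    // Guard so that words such as "inf" or "nan", which f64 parsing accepts,
    // are not treated as numbers in this sketch.
    let starts_numeric = text.starts_with(|c: char| c.is_ascii_digit() || c == '-');
    match text {
        "null" => Value::Null,
        "true" => Value::Bool(true),
        "false" => Value::Bool(false),
        _ => match text.parse::<f64>() {
            Ok(n) if starts_numeric && n.is_finite() => Value::Number(n),
            // Mixed content such as "1.0 b" or "1 null" fails to parse as a
            // whole, so the original text is preserved unchanged.
            _ => Value::Text(text.to_string()),
        },
    }
}
```

Because inference only ever sees the whole value, inputs like `a   b`, `1 null`, and `1.0 b` fall through to the string case with their spacing and number formatting intact.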

Bugs Fixed

Test plan

  • All 154 lib tests pass (including 1 corrected assertion + 4 new test functions)
  • All 78 integration/doc tests pass
  • Spec fixture tests pass
  • Manual smoke test of each reported reproduction case

The scanner was eagerly tokenizing values (breaking on spaces, parsing
numbers) instead of treating them as complete text before type inference,
as the spec requires (§B.3, §B.4, §B.5).

This caused three bugs:
- #59: multiple spaces collapsed (`a   b` → `a b`)
- #60: mixed-type tokens errored (`1 null`, `a 1` in tabular rows)
- #61: number formatting lost (`1.0 b` → `1 b`, `1e1 b` → `10 b`)
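To show how this class of bug can arise, the sketch below tokenizes eagerly and then reassembles the value. It is not the crate's previous code; `eager_reassemble` is a hypothetical function written only to reproduce the symptoms of #59 and #61 in a few lines.

```rust
/// Hypothetical, deliberately lossy approach: split on whitespace, parse each
/// token as a number where possible, then join the pieces with single spaces.
fn eager_reassemble(text: &str) -> String {
    text.split_whitespace()
        .map(|tok| match tok.parse::<f64>() {
            // Re-formatting a parsed number discards its original spelling.
            Ok(n) => n.to_string(),
            Err(_) => tok.to_string(),
        })
        .collect::<Vec<_>>()
        // Joining with one space discards the original spacing.
        .join(" ")
}
```

With this approach `a   b` comes back as `a b`, `1.0 b` as `1 b`, and `1e1 b` as `10 b`, which matches the behavior reported in the issues above.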

Scanner changes:
- Track `last_whitespace_count` and `last_token_text` through scanning
- `read_rest_of_line_with_space_info()` returns exact space count
- Add `read_until_delimiter_with_space_info()` for tabular cells
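The real signatures of these scanner functions are not shown in this PR, so the following is only a guess at the general shape of a "read until delimiter, with space info" helper. The name `read_cell_with_space_info`, its tuple return type, and the lack of quoted-cell handling are all assumptions made for this sketch.

```rust
/// Hypothetical sketch. Returns (leading space count, cell text, remaining input).
/// Quoted cells and escape sequences are intentionally ignored here.
fn read_cell_with_space_info(input: &str, delimiter: char) -> (usize, &str, &str) {
    // Count the spaces before the cell instead of silently discarding them.
    let leading = input.chars().take_while(|&c| c == ' ').count();
    // A space is one byte in UTF-8, so the count is also a valid byte offset.
    let body = &input[leading..];
    // The cell runs to the next delimiter or end of line, whichever comes first.
    let end = body
        .find(|c: char| c == delimiter || c == '\n')
        .unwrap_or(body.len());
    // Interior spaces are kept; only trailing padding before the delimiter is trimmed.
    let cell = body[..end].trim_end_matches(' ');
    // Step over the delimiter (but not a newline) so the caller can continue.
    let rest = &body[end..];
    let rest = rest.strip_prefix(delimiter).unwrap_or(rest);
    (leading, cell, rest)
}
```

The point of returning the cell as one untouched slice is that type inference can then run once over the whole cell rather than over individual tokens.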

Parser changes:
- `parse_field_value()`: use original token text and exact space count
- `parse_tabular_field_value()`: read complete cell text then type-infer
- `parse_value_with_depth()`: handle all value token types in root-level
  concatenation with exact spacing
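Concatenation with exact spacing can be pictured as follows. The `Token` struct and `concat_exact` function are hypothetical stand-ins for whatever the parser does with `last_token_text` and `last_whitespace_count`; the sketch assumes each token carries its original text plus the number of spaces that preceded it.

```rust
/// Hypothetical token record: the original source text of the token and the
/// exact number of spaces that appeared before it.
struct Token<'a> {
    text: &'a str,
    spaces_before: usize,
}

/// Rebuild the value from original token text and recorded space counts.
/// Spaces before the first token are treated as indentation and dropped
/// (an assumption of this sketch).
fn concat_exact(tokens: &[Token<'_>]) -> String {
    let mut out = String::new();
    for (i, tok) in tokens.iter().enumerate() {
        if i > 0 {
            out.push_str(&" ".repeat(tok.spaces_before));
        }
        // Use the source text, never a re-formatted number.
        out.push_str(tok.text);
    }
    out
}
```

Since both the text and the spacing come straight from the input, `1.0 b` and `a   b` round-trip unchanged.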

Fixes #59, #60, #61

Replace loop+match with while-let patterns and remove unnecessary
let binding in scan_token.
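For readers unfamiliar with the refactor named in that commit message, here is a generic before/after of the `loop` + `match` to `while let` rewrite. Both functions are hypothetical examples, not code from this repository.

```rust
/// Before: an explicit loop whose match exists only to break on the fallthrough arm.
fn count_leading_spaces_loop(input: &str) -> usize {
    let mut chars = input.chars().peekable();
    let mut count = 0;
    loop {
        match chars.peek() {
            Some(&' ') => {
                chars.next();
                count += 1;
            }
            _ => break,
        }
    }
    count
}

/// After: the same control flow expressed with `while let`.
fn count_leading_spaces_while_let(input: &str) -> usize {
    let mut chars = input.chars().peekable();
    let mut count = 0;
    while let Some(&' ') = chars.peek() {
        chars.next();
        count += 1;
    }
    count
}
```

The two versions behave identically; the second simply states the loop condition up front.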
@shreyasbhat0 shreyasbhat0 merged commit 1217ed6 into main Mar 30, 2026
3 checks passed
@shreyasbhat0 shreyasbhat0 deleted the fix/unquoted-string-parsing branch March 30, 2026 08:18
@github-actions github-actions bot mentioned this pull request Mar 30, 2026