feat: add nested table syntax for uniform nested objects #293

Open
Turtle-dev3 wants to merge 7 commits into toon-format:main from Turtle-dev3:feat/type-hints-and-nested-tables

@Turtle-dev3

Summary

Adds opt-in nested table encoding that flattens uniform nested objects into TOON's tabular format, reducing token count for data with consistent nested structure (53.8% fewer tokens than baseline TOON on the uniform-nested benchmark dataset).

Before (falls back to list items):

orders[2]:
  - id: 1
    customer:
      name: Alice
      country: DK
    total: 99
  - id: 2
    customer:
      name: Bob
      country: UK
    total: 149

After (nestedTables: true):

orders[2]{id,customer{name,country},total}:
  1,Alice,DK,99
  2,Bob,UK,149

How it works

  • Encoder: Detects when all rows in a tabular array share identical nested object keys with primitive values. Flattens them into inline field{sub1,sub2} header syntax with values inlined in rows.
  • Decoder: Parses field{sub1,sub2} in headers and reconstructs nested objects from flattened row values. Also strips type hint suffixes (e.g., field:int) for forward compatibility.
  • Fallback: Non-uniform nested objects (different keys per row) or non-primitive nested values automatically fall back to the existing list syntax.
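
The encoder and fallback rules above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `detectShape` and `encodeNestedTable` are hypothetical names, quoting and escaping of values are omitted, and it assumes keys must appear in the same order in every row.

```typescript
type Primitive = string | number | boolean | null;
type Row = Record<string, unknown>;
// One column: a primitive field, or a nested object with primitive sub-fields.
type FieldShape = { key: string; sub?: string[] };

function isPrimitive(v: unknown): v is Primitive {
  const t = typeof v;
  return v === null || t === "string" || t === "number" || t === "boolean";
}

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Shape of a single row, or null if it holds anything that cannot be flattened.
function shapeOf(row: Row): FieldShape[] | null {
  const shape: FieldShape[] = [];
  for (const key of Object.keys(row)) {
    const value = row[key];
    if (isPrimitive(value)) {
      shape.push({ key });
    } else if (isPlainObject(value)) {
      const sub = Object.keys(value);
      const allPrimitive = sub.every((k) => isPrimitive(value[k]));
      if (sub.length === 0 || !allPrimitive) return null;
      shape.push({ key, sub });
    } else {
      return null; // arrays and other values are not flattened
    }
  }
  return shape;
}

// Shared shape of all rows, or null if any row differs (non-uniform).
function detectShape(rows: Row[]): FieldShape[] | null {
  if (rows.length === 0) return null;
  const first = shapeOf(rows[0]);
  if (first === null) return null;
  const signature = JSON.stringify(first);
  for (const row of rows.slice(1)) {
    const shape = shapeOf(row);
    if (shape === null || JSON.stringify(shape) !== signature) return null;
  }
  return first;
}

// Returns the nested-table text, or null so the caller can fall back to list syntax.
function encodeNestedTable(name: string, rows: Row[]): string | null {
  const shape = detectShape(rows);
  if (shape === null) return null;
  const header = shape
    .map((f) => (f.sub ? `${f.key}{${f.sub.join(",")}}` : f.key))
    .join(",");
  const lines = rows.map((row) => {
    const cells: Primitive[] = [];
    for (const f of shape) {
      if (f.sub) {
        const nested = row[f.key] as Record<string, Primitive>;
        for (const s of f.sub) cells.push(nested[s]);
      } else {
        cells.push(row[f.key] as Primitive);
      }
    }
    return "  " + cells.map(String).join(",");
  });
  return [`${name}[${rows.length}]{${header}}:`, ...lines].join("\n");
}

const orders: Row[] = [
  { id: 1, customer: { name: "Alice", country: "DK" }, total: 99 },
  { id: 2, customer: { name: "Bob", country: "UK" }, total: 149 },
];
console.log(encodeNestedTable("orders", orders));
```

Returning `null` rather than throwing keeps the fallback decision with the caller, which mirrors the "falls back to the existing list syntax" behavior described above.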

Benchmark results

Tested across 7 datasets including a new uniform-nested dataset (500 shipment records with sender/receiver/dimensions objects):

| Dataset | TOON | TOON + Nested | JSON Compact | Savings vs TOON | Savings vs JSON |
| --- | --- | --- | --- | --- | --- |
| uniform-nested | 58,701 | 27,111 | 46,697 | 53.8% | 41.9% |
| nested-config | 620 | 591 | 558 | 4.7% | |
| tabular (no nesting) | 49,919 | 49,919 | 79,059 | 0% (identical) | 36.9% |

For data without nested objects, output is identical to standard TOON — zero overhead.
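
For readers checking the numbers, the savings columns are consistent with a plain relative-reduction formula. The helper below is a hypothetical illustration (not part of the benchmark script) and assumes savings are computed as `(baseline - candidate) / baseline`, rounded to one decimal place.

```typescript
// Percentage of tokens saved by `candidate` relative to `baseline`,
// rounded to one decimal place.
function savingsPercent(baseline: number, candidate: number): number {
  return Math.round(((baseline - candidate) / baseline) * 1000) / 10;
}

// uniform-nested dataset, token counts taken from the benchmark results
console.log(savingsPercent(58701, 27111)); // vs TOON: 53.8
console.log(savingsPercent(46697, 27111)); // vs JSON Compact: 41.9
```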

Breaking changes

None. The feature is fully opt-in via encode(data, { nestedTables: true }). Default encoder output is unchanged. The decoder handles nested table syntax automatically with no option needed.
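
Because the decoder needs no option, the decoder-side logic is where the new header syntax has to be recognized. A rough sketch of that logic, under stated assumptions: `parseFields`, `decodeRow`, and `stripTypeHint` are hypothetical names rather than the PR's actual functions, quoted keys and quoted cell values are not handled, and a type hint is assumed to be everything after the first `:` in a field name.

```typescript
type Primitive = string | number | boolean | null;
type HeaderField = { key: string; sub?: string[] };

// Drops a trailing type hint such as ":int" (assumed hint syntax).
function stripTypeHint(name: string): string {
  const i = name.indexOf(":");
  return i === -1 ? name : name.slice(0, i);
}

// Turns one header entry, e.g. "id" or "customer{name,country}", into a field.
function parseField(raw: string): HeaderField {
  const open = raw.indexOf("{");
  if (open === -1) return { key: stripTypeHint(raw.trim()) };
  const close = raw.lastIndexOf("}");
  const sub = raw
    .slice(open + 1, close)
    .split(",")
    .map((s) => stripTypeHint(s.trim()));
  return { key: stripTypeHint(raw.slice(0, open).trim()), sub };
}

// Splits "id,customer{name,country},total" on top-level commas only.
function parseFields(header: string): HeaderField[] {
  const fields: HeaderField[] = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i <= header.length; i++) {
    const ch = header[i];
    if (ch === "{") depth++;
    else if (ch === "}") depth--;
    if (i === header.length || (ch === "," && depth === 0)) {
      fields.push(parseField(header.slice(start, i)));
      start = i + 1;
    }
  }
  return fields;
}

// Simplified primitive parsing: null, booleans, numbers, otherwise string.
function parseCell(cell: string): Primitive {
  const t = cell.trim();
  if (t === "null") return null;
  if (t === "true") return true;
  if (t === "false") return false;
  const n = Number(t);
  return t !== "" && !isNaN(n) ? n : t;
}

// Rebuilds one object from a flattened row, consuming cells left to right.
function decodeRow(fields: HeaderField[], line: string): Record<string, unknown> {
  const cells = line.split(",").map(parseCell);
  const row: Record<string, unknown> = {};
  let i = 0;
  for (const f of fields) {
    if (f.sub) {
      const nested: Record<string, Primitive> = {};
      for (const s of f.sub) nested[s] = cells[i++];
      row[f.key] = nested;
    } else {
      row[f.key] = cells[i++];
    }
  }
  return row;
}

const fields = parseFields("id,customer{name,country},total");
console.log(JSON.stringify(decodeRow(fields, "1,Alice,DK,99")));
```

Tracking brace depth while splitting is what keeps the commas inside `customer{name,country}` from being treated as column separators.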

Test plan

  • All 466 existing tests pass unchanged (zero regressions)
  • 9 new tests covering:
    • Encoder: flattening, fallback for non-uniform, disabled by default, multiple nested fields
    • Decoder: reconstruction, multiple nested fields, forward-compat type hint stripping
    • Round-trip: lossless encode → decode, non-uniform fallback
  • Full test suite passes (475 toon + 88 cli = 563 total)
  • Benchmark script runs against all 7 datasets

Add opt-in nestedTables option that flattens uniform nested objects
into tabular format: {id,customer{name,country},total}

Rows are inlined instead of falling back to list items, saving tokens
when data has consistent nested structure. Non-uniform objects fall
back to the existing list syntax automatically.

Parse nested field syntax in headers: {id,customer{name,country},total}
and reconstruct nested objects from flattened row values.

Also strips type hint suffixes (e.g., field:int) for forward
compatibility with future encoders that may add them.

9 tests covering encode, decode, round-trip, non-uniform fallback,
multiple nested fields, and forward-compatible type hint stripping.

Compare TOON vs TOON+nested vs JSON across 7 datasets including a new
uniform-nested dataset (500 shipments with sender/receiver/dimensions).

Key result: 53.8% token savings vs baseline TOON and 41.9% fewer
tokens than JSON on uniform nested data.

@asriva404

Hi @Turtle-dev3, they are looking into another approach for handling this. Please refer to #290
toon-format/spec#32 (comment)

@Turtle-dev3
Author

> Hi @Turtle-dev3, they are looking into another approach for handling this. Please refer to #290 toon-format/spec#32 (comment)

Hi @asriva404, thanks for the pointer!

I have looked into #290 and I believe the two proposals solve different problems. My PR extends the existing array table syntax ([N]{fields}:) to support nested sub-objects within rows, so uniform nested fields get flattened inline instead of falling back to list items.

#290 introduces an entirely new container type for keyed object collections ({fields}*:), where the parent is an object with string keys rather than an array.

So this PR targets nested objects inside arrays, while #290 targets table-collapse for dictionaries/maps. They could actually complement each other quite nicely if both were to land.

Happy to hear if the maintainers see it differently, though! (Or I have misunderstood something ^^)
