Skip to content

chore(parser)!: refactor JSON extract#7302

Open
geooo109 wants to merge 9 commits intomainfrom
geooo109/json_path_refactor
Open

chore(parser)!: refactor JSON extract#7302
geooo109 wants to merge 9 commits intomainfrom
geooo109/json_path_refactor

Conversation

@geooo109
Copy link
Collaborator

@geooo109 geooo109 commented Mar 16, 2026

Refactored the _parse_colon_as_variant_extract method.

I added extensive tests for sf and sf -> duckdb:

- v:a[0] — static index                                                                                                                                                                                   - v:a[0]:b — static index + path                                                                                                                                                                          
- v:a[s.x] — dynamic index
- v:c[s.x]:r — dynamic index + path                                                                                                                                                                       
- v:c[s.x]:r:d::varchar — dynamic index + path + cast      
- v:a:b[s.x] — nested path + dynamic index                                                                                                                                                                
- v:a.b[s.x].r.d — dot path + dynamic index + dot path                                                                                                                                                    
- v:a.b[s.x].r.d[s.y] — two dynamic indices with path between
- v:a[s.x][s.y] — chained dynamic indices
- v:a[s.x][0] — dynamic + static index
- v:c::variant[1] — cast + static index
- v:a::variant[0]::varchar — cast + static index + cast
- v:a.b[s.x].c::variant[0] — dynamic index + path + cast + static index
- v:a::variant[s.x] — cast + dynamic index
- col:"customer's department" — quoted key with special chars

/integration-tests

@geooo109 geooo109 force-pushed the geooo109/json_path_refactor branch from 6bc278e to 35e1763 Compare March 16, 2026 19:01
@geooo109 geooo109 marked this pull request as draft March 16, 2026 19:11
@github-actions
Copy link
Contributor

github-actions bot commented Mar 16, 2026

SQLGlot Integration Test Results

Comparing:

  • this branch (sqlglot:geooo109/json_path_refactor, sqlglot version: geooo109/json_path_refactor)
  • baseline (main, sqlglot version: 0.0.1.dev1)

By Dialect

dialect main sqlglot:geooo109/json_path_refactor transitions links
bigquery -> bigquery 23872/23877 passed (100.0%) 23872/23877 passed (100.0%) No change full result / delta
bigquery -> duckdb 2106/2624 passed (80.3%) 2106/2624 passed (80.3%) No change full result / delta
duckdb -> duckdb 4003/4004 passed (100.0%) 4003/4004 passed (100.0%) No change full result / delta
snowflake -> duckdb 1508/2674 passed (56.4%) 1508/2674 passed (56.4%) No change full result / delta
snowflake -> snowflake 65910/65910 passed (100.0%) 65878/65910 passed (100.0%) 32 pass -> fail full result / delta
databricks -> databricks 1364/1364 passed (100.0%) 1364/1364 passed (100.0%) No change full result / delta
postgres -> postgres 6040/6040 passed (100.0%) 6040/6040 passed (100.0%) No change full result / delta
redshift -> redshift 7059/7059 passed (100.0%) 7059/7059 passed (100.0%) No change full result / delta

Overall

main: 113552 total, 111862 passed (pass rate: 98.5%), sqlglot version: 0.0.1.dev1

sqlglot:geooo109/json_path_refactor: 113552 total, 111830 passed (pass rate: 98.5%), sqlglot version: geooo109/json_path_refactor

Transitions:
32 pass -> fail

@geooo109 geooo109 marked this pull request as ready for review March 17, 2026 10:45
@geooo109 geooo109 force-pushed the geooo109/json_path_refactor branch from 7c54312 to 383d216 Compare March 17, 2026 11:35
@geooo109
Copy link
Collaborator Author

Will solve the tests and re-push

break
self.raise_error("Expected key name after '.'")
elif self._match(TokenType.L_BRACKET):
bracket_expr = self._parse_bracket_key_value()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bracket_expr = self._parse_disjunction()

^ this should be true for every case, less function calls

Comment on lines +6298 to +6299
if not bracket_expr:
continue
Copy link
Collaborator Author

@geooo109 geooo109 Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we raise here or even ignore the None ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What case does this correspond to? Is it a valid input? If not, ignore.

Copy link
Collaborator

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geooo109 you checked all test cases against engines, right?

path_parts.append(exp.JSONPathKey(this=next_key.name))
else:
break
self.raise_error("Expected key name after '.'")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about these exceptions.. If we fail to convert the JSON path into a canonical / transpilable AST, is there any way to fallback to a safe representation so that Snowflake roundtrip is not messed up?

Comment on lines +6298 to +6299
if not bracket_expr:
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What case does this correspond to? Is it a valid input? If not, ignore.

Comment on lines +6313 to +6314
elif self._match(TokenType.DCOLON):
this, path_parts = self._build_json_extract(this, path_parts, escape)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are brackets, dots and dcolons the only tokens we can find in these JSON path expressions, syntactically? In other words, is this branching logic complete?

@geooo109 geooo109 force-pushed the geooo109/json_path_refactor branch from 49d750d to afe1ba0 Compare March 18, 2026 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants