[BugFix] Fix struct field names not being escaped#68967
[BugFix] Fix struct field names not being escaped#68967kyle-goodale-klaviyo wants to merge 1 commit intoStarRocks:mainfrom
Conversation
|
@codex review |
Signed-off-by: kyle-goodale-klaviyo <kyle.goodale@klaviyo.com>
ed40cb9 to
cb7f944
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cb7f944d3d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
|
please fix the failed tests, thanks |
There was a problem hiding this comment.
Pull request overview
This PR fixes an escaping bug where STRUCT field names could lose required backticks when types are rendered back to SQL (notably impacting persisted/rewritten SQL such as MV task definitions), causing parse failures for reserved keywords or non-identifier field names.
Changes:
- Update
StructFieldSQL/type string rendering to always backtick field names. - Add a regression test covering CAST-to-ARRAY-of-STRUCT with an escaped field name.
- Update multiple SQL golden-result files to match the new backticked struct field name formatting.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/sql/test_sort/R/test_struct_order_by_edge_cases.sql | Updates expected error strings to reflect backticked struct field names. |
| test/sql/test_sort/R/test_order_by_struct_comprehensive.sql | Updates expected error strings to reflect backticked struct field names. |
| test/sql/test_semi/R/test_in_predict | Updates expected error strings to reflect backticked struct field names (nested struct/map/array cases). |
| test/sql/test_pipe/R/basic | Updates DESC expected output for struct column type formatting. |
| test/sql/test_map/R/test_map | Updates expected error string to reflect backticked struct field names. |
| test/sql/test_join/R/test_join_struct | Updates DDL and expected error strings for struct types to include backticked field names. |
| test/sql/test_join/R/test_join_map | Updates expected error strings for map values containing structs with backticked field names. |
| test/sql/test_iceberg/R/test_metadata_table | Updates DESC expected output for Iceberg metadata table struct field names. |
| test/sql/test_iceberg/R/test_iceberg_catalog_complex_type | Updates expected explain output snippet to include backticked struct field names. |
| test/sql/test_http_sql/R/test_http_semi | Updates HTTP SQL JSON metadata expected output to include backticked struct field names. |
| test/sql/test_function/R/test_typeof | Updates typeof(struct(...)) expected output to include backticked default field names. |
| test/sql/test_files/R/test_parquet_list_legacy_encoding | Updates desc files(...) expected output for struct field name backticks. |
| test/sql/test_files/R/test_orc_struct | Updates DESC expected output for ORC-derived struct field names. |
| test/sql/test_files/R/test_avro_files_merge | Updates desc files(...) expected output for Avro struct field name backticks. |
| test/sql/test_files/R/test_avro_complex_type | Updates desc files(...) expected output for Avro struct field name backticks. |
| test/sql/test_files/R/test_avro_complex_nest_type | Updates desc files(...) expected output for nested Avro complex types with struct field name backticks. |
| test/sql/test_create_table/R/test_create_table_with_time | Updates expected DDL/error output to include backticked struct field names. |
| fe/fe-type/src/test/java/com/starrocks/type/StructTypeTest.java | Updates unit test expectation for struct SQL rendering to include backticks. |
| fe/fe-type/src/main/java/com/starrocks/type/StructField.java | Core fix: backtick struct field names when rendering to SQL/type strings/pretty print. |
| fe/fe-core/src/test/java/com/starrocks/sql/analyzer/AST2StringBuilderTest.java | Adds regression test ensuring escaped struct field names survive AST-to-string rendering. |
| if (StringUtils.isEmpty(value)) { | ||
| return value; | ||
| } | ||
| return "`" + value + "`"; |
There was a problem hiding this comment.
backquoteName() wraps the field name in backticks but does not escape embedded backticks. If a StructField name can contain '`' (e.g., from external schemas/connectors or programmatic construction), the generated SQL/type string becomes invalid. Consider escaping internal backticks (e.g., doubling them) before wrapping so the output is always parseable.
| return "`" + value + "`"; | |
| // Escape embedded backticks by doubling them so the resulting identifier is always parseable. | |
| String escaped = value.replace("`", "``"); | |
| return "`" + escaped + "`"; |
| public String toSql(int depth, boolean printName) { | ||
| String typeSql = (depth < Type.MAX_NESTING_DEPTH) ? type.toSql(depth) : "..."; | ||
| StringBuilder sb = new StringBuilder(); | ||
| if (printName) { | ||
| sb.append(name).append(' '); | ||
| sb.append(backquoteName(name)).append(' '); | ||
| } | ||
| sb.append(typeSql); | ||
| if (comment != null) { | ||
| sb.append(String.format(" COMMENT '%s'", comment)); | ||
| } | ||
| return sb.toString(); | ||
| } | ||
|
|
||
| public String toTypeString(int depth) { | ||
| String typeSql = (depth < Type.MAX_NESTING_DEPTH) ? type.toTypeString(depth) : "..."; | ||
| StringBuilder sb = new StringBuilder(); | ||
| sb.append(name).append(' '); | ||
| sb.append(backquoteName(name)).append(' '); | ||
| sb.append(typeSql); | ||
| return sb.toString(); | ||
| } | ||
|
|
||
| /** | ||
| * Pretty prints this field with lpad number of leading spaces. | ||
| * Calls prettyPrint(lpad) on this field's type. | ||
| */ | ||
| public String prettyPrint(int lpad, boolean printName) { | ||
| String leftPadding = Strings.repeat(" ", lpad); | ||
| StringBuilder sb = new StringBuilder(leftPadding); | ||
| if (printName) { | ||
| sb.append(name).append(' '); | ||
| sb.append(backquoteName(name)).append(' '); | ||
| } | ||
|
|
There was a problem hiding this comment.
This change updates not only toSql() output but also toTypeString() and prettyPrint(). That broader change will ripple into other FE tests/outputs that assert struct type strings (e.g., fe/fe-core/src/test/java/com/starrocks/catalog/TypeTest.java expects "struct<col1 int(11)>" at around its testMysqlColumnType()). Please update the affected assertions or narrow the change if only toSql() should be modified.
| public String toTypeString(int depth) { | ||
| String typeSql = (depth < Type.MAX_NESTING_DEPTH) ? type.toTypeString(depth) : "..."; | ||
| StringBuilder sb = new StringBuilder(); | ||
| sb.append(name).append(' '); | ||
| sb.append(backquoteName(name)).append(' '); | ||
| sb.append(typeSql); | ||
| return sb.toString(); | ||
| } | ||
|
|
||
| /** | ||
| * Pretty prints this field with lpad number of leading spaces. | ||
| * Calls prettyPrint(lpad) on this field's type. | ||
| */ | ||
| public String prettyPrint(int lpad, boolean printName) { | ||
| String leftPadding = Strings.repeat(" ", lpad); | ||
| StringBuilder sb = new StringBuilder(leftPadding); | ||
| if (printName) { | ||
| sb.append(name).append(' '); | ||
| sb.append(backquoteName(name)).append(' '); | ||
| } | ||
|
|
There was a problem hiding this comment.
PR description says only StructField.toSQL() was updated, but this diff also changes toTypeString() and prettyPrint(). Please update the PR description to match the actual scope (or limit the code change to toSql() if that was the intent).
| col_int int YES true None | ||
| col_map map<varchar(1048576),int> YES false None | ||
| col_struct struct<a varchar(1048576), b int(11)> YES false None | ||
| col_struct struct<`a` varchar(1048576), `b` int(11)> YES false None |
There was a problem hiding this comment.
In this DESC output block, the other rows end with a trailing tab after the last empty column (e.g., "None\t"). This row now lacks the trailing tab, which may cause the SQL test harness to treat it as a different number of columns and fail the result comparison. Keep the column count consistent by retaining the trailing delimiter for the final empty column.
| col_struct struct<`a` varchar(1048576), `b` int(11)> YES false None | |
| col_struct struct<`a` varchar(1048576), `b` int(11)> YES false None |
| -- result: | ||
| col_int int YES true None | ||
| col_struct struct<c_int int(11), c_float decimal(38, 9), c_double decimal(38, 9), c_char varchar(1048576), c_varchar varchar(1048576), c_date date, c_timestamp datetime, c_boolean boolean> YES false None | ||
| col_struct struct<`c_int` int(11), `c_float` decimal(38, 9), `c_double` decimal(38, 9), `c_char` varchar(1048576), `c_varchar` varchar(1048576), `c_date` date, `c_timestamp` datetime, `c_boolean` boolean> YES false None |
There was a problem hiding this comment.
In this DESC output, surrounding rows include a trailing tab after the final empty column (e.g., "None\t"), but this struct row does not. That can change the perceived column count in the expected result and break SQL test comparisons. Please keep the trailing delimiter consistent with the other DESC rows.
| col_struct struct<`c_int` int(11), `c_float` decimal(38, 9), `c_double` decimal(38, 9), `c_char` varchar(1048576), `c_varchar` varchar(1048576), `c_date` date, `c_timestamp` datetime, `c_boolean` boolean> YES false None | |
| col_struct struct<`c_int` int(11), `c_float` decimal(38, 9), `c_double` decimal(38, 9), `c_char` varchar(1048576), `c_varchar` varchar(1048576), `c_date` date, `c_timestamp` datetime, `c_boolean` boolean> YES false None |
| desc iceberg_sql_test_${uuid0}.iceberg_db_${uuid0}.ice_tbl_${uuid0}_${uuid0}$partitions; | ||
| -- result: | ||
| partition_value struct<p1 int(11), p2 int(11)> Yes false None | ||
| partition_value struct<`p1` int(11), `p2` int(11)> Yes false None |
There was a problem hiding this comment.
This DESC output row no longer ends with the trailing tabs used by the following rows (which represent empty final columns). The SQL test harness compares result lines literally, so changing the trailing delimiters can cause a mismatch in column count/string equality. Please keep the trailing tab(s) consistent with the other DESC rows in this block.
| partition_value struct<`p1` int(11), `p2` int(11)> Yes false None | |
| partition_value struct<`p1` int(11), `p2` int(11)> Yes false None |
Why I'm doing:
Struct field names are not escaped in some cases even if the user escapes them on the input. This leads to errors.
What I'm doing:
Fixes #68965
I updated StructField to always backtick field names in the toSQL() function.
Back ticks are stripped on parsing when we assign the
namefield here so we don't have to worry about double backticks, that same functionality is also the source of this bug.I added a test case that reproduced the issue, confirmed it failed, then implemented my fix and confirmed the test case now passes.
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: