Skip to content

[BugFix] Fix struct field names not being escaped#68967

Open
kyle-goodale-klaviyo wants to merge 1 commit intoStarRocks:mainfrom
kyle-goodale-klaviyo:kyle/fix-struct-field-escape
Open

[BugFix] Fix struct field names not being escaped#68967
kyle-goodale-klaviyo wants to merge 1 commit intoStarRocks:mainfrom
kyle-goodale-klaviyo:kyle/fix-struct-field-escape

Conversation

@kyle-goodale-klaviyo
Copy link
Contributor

Why I'm doing:

Struct field names are not escaped in some cases even if the user escapes them on the input. This leads to errors.

What I'm doing:

Fixes #68965

I updated StructField to always backtick field names in the toSQL() function.
Back ticks are stripped on parsing when we assign the name field here so we don't have to worry about double backticks, that same functionality is also the source of this bug.

I added a test case that reproduced the issue, confirmed it failed, then implemented my fix and confirmed the test case now passes.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5
    • 3.4

@CelerData-Reviewer
Copy link

@codex review

Signed-off-by: kyle-goodale-klaviyo <kyle.goodale@klaviyo.com>
@kyle-goodale-klaviyo kyle-goodale-klaviyo force-pushed the kyle/fix-struct-field-escape branch from ed40cb9 to cb7f944 Compare February 6, 2026 20:00
@CelerData-Reviewer
Copy link

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cb7f944d3d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@github-actions
Copy link
Contributor

github-actions bot commented Feb 6, 2026

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions
Copy link
Contributor

github-actions bot commented Feb 6, 2026

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@kangkaisen
Copy link
Collaborator

please fix the failed tests, thanks

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an escaping bug where STRUCT field names could lose required backticks when types are rendered back to SQL (notably impacting persisted/rewritten SQL such as MV task definitions), causing parse failures for reserved keywords or non-identifier field names.

Changes:

  • Update StructField SQL/type string rendering to always backtick field names.
  • Add a regression test covering CAST-to-ARRAY-of-STRUCT with an escaped field name.
  • Update multiple SQL golden-result files to match the new backticked struct field name formatting.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test/sql/test_sort/R/test_struct_order_by_edge_cases.sql Updates expected error strings to reflect backticked struct field names.
test/sql/test_sort/R/test_order_by_struct_comprehensive.sql Updates expected error strings to reflect backticked struct field names.
test/sql/test_semi/R/test_in_predict Updates expected error strings to reflect backticked struct field names (nested struct/map/array cases).
test/sql/test_pipe/R/basic Updates DESC expected output for struct column type formatting.
test/sql/test_map/R/test_map Updates expected error string to reflect backticked struct field names.
test/sql/test_join/R/test_join_struct Updates DDL and expected error strings for struct types to include backticked field names.
test/sql/test_join/R/test_join_map Updates expected error strings for map values containing structs with backticked field names.
test/sql/test_iceberg/R/test_metadata_table Updates DESC expected output for Iceberg metadata table struct field names.
test/sql/test_iceberg/R/test_iceberg_catalog_complex_type Updates expected explain output snippet to include backticked struct field names.
test/sql/test_http_sql/R/test_http_semi Updates HTTP SQL JSON metadata expected output to include backticked struct field names.
test/sql/test_function/R/test_typeof Updates typeof(struct(...)) expected output to include backticked default field names.
test/sql/test_files/R/test_parquet_list_legacy_encoding Updates desc files(...) expected output for struct field name backticks.
test/sql/test_files/R/test_orc_struct Updates DESC expected output for ORC-derived struct field names.
test/sql/test_files/R/test_avro_files_merge Updates desc files(...) expected output for Avro struct field name backticks.
test/sql/test_files/R/test_avro_complex_type Updates desc files(...) expected output for Avro struct field name backticks.
test/sql/test_files/R/test_avro_complex_nest_type Updates desc files(...) expected output for nested Avro complex types with struct field name backticks.
test/sql/test_create_table/R/test_create_table_with_time Updates expected DDL/error output to include backticked struct field names.
fe/fe-type/src/test/java/com/starrocks/type/StructTypeTest.java Updates unit test expectation for struct SQL rendering to include backticks.
fe/fe-type/src/main/java/com/starrocks/type/StructField.java Core fix: backtick struct field names when rendering to SQL/type strings/pretty print.
fe/fe-core/src/test/java/com/starrocks/sql/analyzer/AST2StringBuilderTest.java Adds regression test ensuring escaped struct field names survive AST-to-string rendering.

if (StringUtils.isEmpty(value)) {
return value;
}
return "`" + value + "`";
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backquoteName() wraps the field name in backticks but does not escape embedded backticks. If a StructField name can contain '`' (e.g., from external schemas/connectors or programmatic construction), the generated SQL/type string becomes invalid. Consider escaping internal backticks (e.g., doubling them) before wrapping so the output is always parseable.

Suggested change
return "`" + value + "`";
// Escape embedded backticks by doubling them so the resulting identifier is always parseable.
String escaped = value.replace("`", "``");
return "`" + escaped + "`";

Copilot uses AI. Check for mistakes.
Comment on lines 98 to 129
public String toSql(int depth, boolean printName) {
String typeSql = (depth < Type.MAX_NESTING_DEPTH) ? type.toSql(depth) : "...";
StringBuilder sb = new StringBuilder();
if (printName) {
sb.append(name).append(' ');
sb.append(backquoteName(name)).append(' ');
}
sb.append(typeSql);
if (comment != null) {
sb.append(String.format(" COMMENT '%s'", comment));
}
return sb.toString();
}

public String toTypeString(int depth) {
String typeSql = (depth < Type.MAX_NESTING_DEPTH) ? type.toTypeString(depth) : "...";
StringBuilder sb = new StringBuilder();
sb.append(name).append(' ');
sb.append(backquoteName(name)).append(' ');
sb.append(typeSql);
return sb.toString();
}

/**
* Pretty prints this field with lpad number of leading spaces.
* Calls prettyPrint(lpad) on this field's type.
*/
public String prettyPrint(int lpad, boolean printName) {
String leftPadding = Strings.repeat(" ", lpad);
StringBuilder sb = new StringBuilder(leftPadding);
if (printName) {
sb.append(name).append(' ');
sb.append(backquoteName(name)).append(' ');
}

Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change updates not only toSql() output but also toTypeString() and prettyPrint(). That broader change will ripple into other FE tests/outputs that assert struct type strings (e.g., fe/fe-core/src/test/java/com/starrocks/catalog/TypeTest.java expects "struct<col1 int(11)>" at around its testMysqlColumnType()). Please update the affected assertions or narrow the change if only toSql() should be modified.

Copilot uses AI. Check for mistakes.
Comment on lines 111 to 129
public String toTypeString(int depth) {
String typeSql = (depth < Type.MAX_NESTING_DEPTH) ? type.toTypeString(depth) : "...";
StringBuilder sb = new StringBuilder();
sb.append(name).append(' ');
sb.append(backquoteName(name)).append(' ');
sb.append(typeSql);
return sb.toString();
}

/**
* Pretty prints this field with lpad number of leading spaces.
* Calls prettyPrint(lpad) on this field's type.
*/
public String prettyPrint(int lpad, boolean printName) {
String leftPadding = Strings.repeat(" ", lpad);
StringBuilder sb = new StringBuilder(leftPadding);
if (printName) {
sb.append(name).append(' ');
sb.append(backquoteName(name)).append(' ');
}

Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says only StructField.toSQL() was updated, but this diff also changes toTypeString() and prettyPrint(). Please update the PR description to match the actual scope (or limit the code change to toSql() if that was the intent).

Copilot uses AI. Check for mistakes.
col_int int YES true None
col_map map<varchar(1048576),int> YES false None
col_struct struct<a varchar(1048576), b int(11)> YES false None
col_struct struct<`a` varchar(1048576), `b` int(11)> YES false None
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this DESC output block, the other rows end with a trailing tab after the last empty column (e.g., "None\t"). This row now lacks the trailing tab, which may cause the SQL test harness to treat it as a different number of columns and fail the result comparison. Keep the column count consistent by retaining the trailing delimiter for the final empty column.

Suggested change
col_struct struct<`a` varchar(1048576), `b` int(11)> YES false None
col_struct struct<`a` varchar(1048576), `b` int(11)> YES false None

Copilot uses AI. Check for mistakes.
-- result:
col_int int YES true None
col_struct struct<c_int int(11), c_float decimal(38, 9), c_double decimal(38, 9), c_char varchar(1048576), c_varchar varchar(1048576), c_date date, c_timestamp datetime, c_boolean boolean> YES false None
col_struct struct<`c_int` int(11), `c_float` decimal(38, 9), `c_double` decimal(38, 9), `c_char` varchar(1048576), `c_varchar` varchar(1048576), `c_date` date, `c_timestamp` datetime, `c_boolean` boolean> YES false None
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this DESC output, surrounding rows include a trailing tab after the final empty column (e.g., "None\t"), but this struct row does not. That can change the perceived column count in the expected result and break SQL test comparisons. Please keep the trailing delimiter consistent with the other DESC rows.

Suggested change
col_struct struct<`c_int` int(11), `c_float` decimal(38, 9), `c_double` decimal(38, 9), `c_char` varchar(1048576), `c_varchar` varchar(1048576), `c_date` date, `c_timestamp` datetime, `c_boolean` boolean> YES false None
col_struct struct<`c_int` int(11), `c_float` decimal(38, 9), `c_double` decimal(38, 9), `c_char` varchar(1048576), `c_varchar` varchar(1048576), `c_date` date, `c_timestamp` datetime, `c_boolean` boolean> YES false None

Copilot uses AI. Check for mistakes.
desc iceberg_sql_test_${uuid0}.iceberg_db_${uuid0}.ice_tbl_${uuid0}_${uuid0}$partitions;
-- result:
partition_value struct<p1 int(11), p2 int(11)> Yes false None
partition_value struct<`p1` int(11), `p2` int(11)> Yes false None
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This DESC output row no longer ends with the trailing tabs used by the following rows (which represent empty final columns). The SQL test harness compares result lines literally, so changing the trailing delimiters can cause a mismatch in column count/string equality. Please keep the trailing tab(s) consistent with the other DESC rows in this block.

Suggested change
partition_value struct<`p1` int(11), `p2` int(11)> Yes false None
partition_value struct<`p1` int(11), `p2` int(11)> Yes false None

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Struct field names inside casts are not escaped

3 participants