
Conversation

@andygrove (Member) commented Aug 20, 2025

Which issue does this PR close?

Follows on from #2204

Rationale for this change

See if there are any other failures when shuffle is enabled

What changes are included in this PR?

How are these changes tested?

@andygrove andygrove changed the title feat: Enable shuffle in Iceberg diff feat: [iceberg] Enable shuffle in Iceberg diff Aug 20, 2025
@codecov-commenter commented Aug 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.56%. Comparing base (f09f8af) to head (de0cf65).
⚠️ Report is 405 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2205      +/-   ##
============================================
+ Coverage     56.12%   58.56%   +2.43%     
- Complexity      976     1284     +308     
============================================
  Files           119      143      +24     
  Lines         11743    13226    +1483     
  Branches       2251     2363     +112     
============================================
+ Hits           6591     7746    +1155     
- Misses         4012     4250     +238     
- Partials       1140     1230      +90     


@andygrove (Member, Author) commented:

TPC-DS failure due to #2206

@hsiang-c (Contributor) commented:

One of the test failures is this

TestStoragePartitionedJoins > testJoinsWithBucketingOnLongColumn() > catalogName = testhadoop, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hadoop, cache-enabled=false}, planningMode = DISTRIBUTED FAILED
    org.opentest4j.AssertionFailedError: [SPJ should not change query output: row 1 col 1 contents should match] 
    expected: -593534002
     but was: -2147483648
        at app//org.apache.iceberg.spark.SparkTestHelperBase.assertEquals(SparkTestHelperBase.java:86)
        at app//org.apache.iceberg.spark.SparkTestHelperBase.assertEquals(SparkTestHelperBase.java:68)
        at app//org.apache.iceberg.spark.sql.TestStoragePartitionedJoins.assertPartitioningAwarePlan(TestStoragePartitionedJoins.java:661)
        at app//org.apache.iceberg.spark.sql.TestStoragePartitionedJoins.checkJoin(TestStoragePartitionedJoins.java:612)
        at app//org.apache.iceberg.spark.sql.TestStoragePartitionedJoins.testJoinsWithBucketingOnLongColumn(TestStoragePartitionedJoins.java:148)

The corresponding test code (below) passes 1 and 3 as the expected shuffle counts, but the assertion that actually failed compares query output: we got -2147483648 where -593534002 was expected. The value -2147483648 (-2^31, i.e. Integer.MIN_VALUE) looks like an integer overflow.

    assertPartitioningAwarePlan(
        1, /* expected num of shuffles with SPJ */
        3, /* expected num of shuffles without SPJ */
        "SELECT t1.id, t1.salary, t1.%s "
            + "FROM %s t1 "
            + "INNER JOIN %s t2 "
            + "ON t1.id = t2.id AND t1.%s = t2.%s "
            + "ORDER BY t1.id, t1.%s",
        sourceColumnName,
        tableName,
        tableName(OTHER_TABLE_NAME),
        sourceColumnName,
        sourceColumnName,
        sourceColumnName);
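The sentinel -2147483648 is Integer.MIN_VALUE, which is consistent with a 32-bit overflow somewhere in a hash or bucket computation. A minimal Java sketch of the classic pitfall (this is illustrative only, not code from Iceberg or Comet; the class name and bucket count are made up):

```java
// Hypothetical sketch of how Integer.MIN_VALUE can leak out of a
// hash/bucket computation. Not taken from Iceberg or Comet.
public class OverflowSketch {
    public static void main(String[] args) {
        // Math.abs cannot represent +2^31 in an int, so MIN_VALUE
        // stays negative after the "absolute value" step.
        int hash = Integer.MIN_VALUE;                 // -2147483648
        System.out.println(Math.abs(hash));           // still -2147483648

        // A common guard: clear the sign bit before taking the modulus,
        // so the bucket index is always non-negative.
        int bucket = (hash & Integer.MAX_VALUE) % 16; // 16 buckets assumed
        System.out.println(bucket);                   // 0, a valid bucket
    }
}
```

If the native shuffle path computes or round-trips a hash differently from the JVM side, this kind of edge value is exactly where the two can diverge.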

@andygrove (Member, Author) commented:

@parthchandra @hsiang-c This PR confirms that #2086 is fixed, but the following tests fail when we enable shuffle.

2025-08-20T20:06:43.7522129Z TestSparkDataWrite > testPartitionedFanoutCreateWithTargetFileSizeViaOption() > format = PARQUET, branch = null FAILED
2025-08-20T20:06:44.2516704Z TestSparkDataWrite > testPartitionedFanoutCreateWithTargetFileSizeViaOption() > format = PARQUET, branch = main FAILED
2025-08-20T20:06:44.7567492Z TestSparkDataWrite > testPartitionedFanoutCreateWithTargetFileSizeViaOption() > format = PARQUET, branch = testBranch FAILED
2025-08-20T20:06:44.9515019Z TestSparkDataWrite > testPartitionedFanoutCreateWithTargetFileSizeViaOption() > format = AVRO, branch = null FAILED
2025-08-20T20:06:45.3517948Z TestSparkDataWrite > testPartitionedFanoutCreateWithTargetFileSizeViaOption() > format = ORC, branch = testBranch FAILED
2025-08-20T20:06:49.0526007Z TestSparkDataWrite > testPartitionedFanoutCreateWithTargetFileSizeViaOption2() > format = PARQUET, branch = null FAILED
2025-08-20T20:06:49.3521908Z TestSparkDataWrite > testPartitionedFanoutCreateWithTargetFileSizeViaOption2() > format = PARQUET, branch = main FAILED
2025-08-20T20:06:49.8519083Z TestSparkDataWrite > testPartitionedFanoutCreateWithTargetFileSizeViaOption2() > format = PARQUET, branch = testBranch FAILED
2025-08-20T20:06:50.0520942Z TestSparkDataWrite > testPartitionedFanoutCreateWithTargetFileSizeViaOption2() > format = AVRO, branch = null FAILED
2025-08-20T20:06:50.3515999Z TestSparkDataWrite > testPartitionedFanoutCreateWithTargetFileSizeViaOption2() > format = ORC, branch = testBranch FAILED
2025-08-20T20:06:54.1525873Z TestSparkDataWrite > testPartitionedCreateWithTargetFileSizeViaOption() > format = PARQUET, branch = null FAILED
2025-08-20T20:06:54.5515512Z TestSparkDataWrite > testPartitionedCreateWithTargetFileSizeViaOption() > format = PARQUET, branch = main FAILED
2025-08-20T20:06:55.0543297Z TestSparkDataWrite > testPartitionedCreateWithTargetFileSizeViaOption() > format = PARQUET, branch = testBranch FAILED
2025-08-20T20:06:55.3532509Z TestSparkDataWrite > testPartitionedCreateWithTargetFileSizeViaOption() > format = AVRO, branch = null FAILED
2025-08-20T20:06:55.7548046Z TestSparkDataWrite > testPartitionedCreateWithTargetFileSizeViaOption() > format = ORC, branch = testBranch FAILED
2025-08-20T20:25:16.8516314Z TestStoragePartitionedJoins > testJoinsWithBucketingOnLongColumn() > catalogName = testhadoop, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hadoop, cache-enabled=false}, planningMode = LOCAL FAILED
2025-08-20T20:25:20.3514538Z TestStoragePartitionedJoins > testJoinsWithBucketingOnLongColumn() > catalogName = testhadoop, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hadoop, cache-enabled=false}, planningMode = DISTRIBUTED FAILED

@hsiang-c (Contributor) left a comment:

The fix LGTM; the test failures are known issues and we can address them in follow-up PRs.

@andygrove andygrove force-pushed the iceberg-enable-shuffle branch from 2a21b4c to a2dd38e Compare August 21, 2025 15:14
@andygrove andygrove changed the title feat: [iceberg] Enable shuffle in Iceberg diff feat: [iceberg] Enable Comet shuffle in Iceberg diff Aug 21, 2025
@parthchandra (Contributor) commented:

LGTM. We can mark this ready for review.

@andygrove andygrove marked this pull request as ready for review August 21, 2025 18:28
@andygrove andygrove force-pushed the iceberg-enable-shuffle branch from 0e12f63 to de0cf65 Compare August 21, 2025 18:55
@andygrove (Member, Author) commented:

I pulled in the changes from #2210

@hsiang-c (Contributor) left a comment:
LGTM

@andygrove andygrove merged commit 8112e1a into apache:main Aug 22, 2025
97 checks passed
@andygrove andygrove deleted the iceberg-enable-shuffle branch August 22, 2025 01:49
