Skip to content

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Oct 27, 2025

Which issue does this PR close?

Closes #1313

Rationale for this change

What changes are included in this PR?

  • All checkSparkAnswer methods now:
    • have scaladoc documentation explaining what they check or do not check
    • delegate to one internal method for consistency and reduced code duplication
    • return a tuple of Spark plan and Comet plan
  • checkAnswerWithTol now supports Float (in addition to Double)
  • checkSparkAnswerAndCompareExplainPlan is removed and usages replaced with checkSparkAnswerAndFallbackReason, which does not suffer from the bug described in Extended explain info missing fallback reason #1313
  • A new experimental config is added for strict testing mode, which will fail tests that are not comprehensive, such as:
    • Tests that call checkSparkAnswer but could be calling checkSparkAnswerAndOperator because the plan is fully native
    • Tests that call checkSparkAnswer but could be calling checkSparkAnswerAndFallbackReason to check for a specific fallback reason
  • Some tests are improved to demonstrate the benefit of the strict testing mode

How are these changes tested?

@codecov-commenter
Copy link

codecov-commenter commented Oct 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.21%. Comparing base (f09f8af) to head (7bed2ad).
⚠️ Report is 644 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2656      +/-   ##
============================================
+ Coverage     56.12%   59.21%   +3.09%     
- Complexity      976     1447     +471     
============================================
  Files           119      147      +28     
  Lines         11743    13759    +2016     
  Branches       2251     2365     +114     
============================================
+ Hits           6591     8148    +1557     
- Misses         4012     4387     +375     
- Partials       1140     1224      +84     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andygrove andygrove changed the title chore: Refactor checkSparkAnswer* methods in CometTestBase [WIP] chore: Various improvements to checkSparkAnswer* methods in CometTestBase [WIP] Oct 27, 2025
@andygrove andygrove marked this pull request as ready for review October 28, 2025 19:25
@andygrove andygrove changed the title chore: Various improvements to checkSparkAnswer* methods in CometTestBase [WIP] chore: Various improvements to checkSparkAnswer* methods in CometTestBase Oct 28, 2025
* A helper function for comparing Comet DataFrame with Spark result using absolute tolerance.
*/
protected def checkAnswerWithTol(
private def checkAnswerWithTol(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps its good time to rename it for

checkAnswerWithTolerance

?

" from int96timetbl"

if (conversionEnabled) {
// TODO this test should check for fallback reasons, but no fallback reason is recorded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it mean it has to be an assertion on fallback reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should really assert that we are falling back to Spark for the expected reason, and not for some unrelated reason. No fallback reason is recorded in this case though, which is a bug IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the plugin is disabled if PARQUET_INT96_TIMESTAMP_CONVERSION is true. I updated the comment.

Copy link
Contributor

@mbutrovich mbutrovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big improvement! Thanks for listening to me gripe about this and then doing something about it @andygrove!

@andygrove
Copy link
Member Author

Thanks for the review @mbutrovich and keep griping away! It leads to quality improvements.

@andygrove andygrove merged commit 2c299d5 into apache:main Oct 29, 2025
102 checks passed
@andygrove andygrove deleted the testing branch October 29, 2025 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extended explain info missing fallback reason

4 participants