Skip to content

GH-16719: Target encoding: Add feature interaction with known domain test is sometimes failing#16729

Open
tomasfryda wants to merge 2 commits intorel-3.46.0from
tomf_GH-16719-fix_target_encoding_test
Open

GH-16719: Target encoding: Add feature interaction with known domain test is sometimes failing#16729
tomasfryda wants to merge 2 commits intorel-3.46.0from
tomf_GH-16719-fix_target_encoding_test

Conversation

@tomasfryda
Copy link
Contributor

@tomasfryda tomasfryda self-assigned this Jan 8, 2026
Copilot AI review requested due to automatic review settings January 8, 2026 09:26
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a failing test in the target encoding feature interaction functionality. The fix modifies the sparse categorical handling in NewChunk to preserve categorical type information for zero values when transitioning from sparse to dense representation, and adds a regression test to prevent the issue from recurring.

Key changes:

  • Modified addCategorical() method to handle the special case where sparse zeros need to be marked as categorical before cancel_sparse() is called
  • Added logic to preserve categorical information for zero values that are surrounded by other categorical values during sparse-to-dense conversion
  • Added a regression test with 96 categorical values to reproduce and prevent the original multinode test failure

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.

File Description
h2o-core/src/main/java/water/fvec/NewChunk.java Enhanced addCategorical() method to preserve categorical type information for sparse zero values during sparse-to-dense conversion
h2o-core/src/test/java/water/fvec/NewChunkTest.java Added regression test to verify sparse categorical handling with specific data pattern that triggered the original failure

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Collaborator

@valenad1 valenad1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, nice, I am a bit nervous about changes in the core :) Could we rerun tests on latest rel-*?

@tomasfryda
Copy link
Contributor Author

Sure @valenad1 ! Rerunning again. I hope it won't get aborted again.

@tomasfryda tomasfryda force-pushed the tomf_GH-16719-fix_target_encoding_test branch from c753272 to ca28334 Compare January 13, 2026 17:14
@tomasfryda tomasfryda added this to the 3.46.0.10 milestone Mar 3, 2026
@valenad1 valenad1 modified the milestones: 3.46.0.10, 3.46.0.11 Mar 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants