GH-16719: Target encoding: Add feature interaction with known domain test is sometimes failing #16729
tomasfryda wants to merge 2 commits into rel-3.46.0 from
Pull request overview
This PR addresses a failing test in the target encoding feature interaction functionality. The fix modifies the sparse categorical handling in NewChunk to preserve categorical type information for zero values when transitioning from sparse to dense representation, and adds a regression test to prevent the issue from recurring.
Key changes:
- Modified addCategorical() method to handle the special case where sparse zeros need to be marked as categorical before cancel_sparse() is called
- Added logic to preserve categorical information for zero values that are surrounded by other categorical values during sparse-to-dense conversion
- Added a regression test with 96 categorical values to reproduce and prevent the original multinode test failure
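The idea behind the key changes can be illustrated with a toy model. The sketch below is not H2O's NewChunk code: the class `SparseCategoricalSketch`, the `Kind` enum, and both `densify*` methods are hypothetical names made up for illustration, and reading "surrounded by" as "lying between two stored categorical entries" is an assumption. It only shows how implicit zeros in a sparse layout can lose their categorical type on densification, and one way that type could be carried over.

```java
import java.util.Arrays;

/** Toy model only; hypothetical names, not the H2O NewChunk implementation. */
final class SparseCategoricalSketch {
    enum Kind { NUMERIC, CATEGORICAL }

    /**
     * Naive densification: only explicitly stored rows keep their kind.
     * Every implicit (sparse) zero comes back as a plain numeric 0.
     * storedRows is assumed to be sorted in ascending order.
     */
    static Kind[] densifyNaive(int length, int[] storedRows, Kind[] storedKinds) {
        Kind[] dense = new Kind[length];
        Arrays.fill(dense, Kind.NUMERIC);
        for (int i = 0; i < storedRows.length; i++) {
            dense[storedRows[i]] = storedKinds[i];
        }
        return dense;
    }

    /**
     * Type-preserving densification (assumed rule): an implicit zero that lies
     * between two stored categorical entries is treated as categorical level 0.
     */
    static Kind[] densifyPreserving(int length, int[] storedRows, Kind[] storedKinds) {
        Kind[] dense = densifyNaive(length, storedRows, storedKinds);
        for (int i = 0; i + 1 < storedRows.length; i++) {
            boolean bothCategorical = storedKinds[i] == Kind.CATEGORICAL
                    && storedKinds[i + 1] == Kind.CATEGORICAL;
            if (!bothCategorical) continue;
            // Mark the gap of implicit zeros between the two stored rows.
            for (int r = storedRows[i] + 1; r < storedRows[i + 1]; r++) {
                dense[r] = Kind.CATEGORICAL;
            }
        }
        return dense;
    }

    public static void main(String[] args) {
        // Rows 0 and 3 hold non-zero categorical levels; rows 1 and 2 are implicit zeros.
        int[] rows = {0, 3};
        Kind[] kinds = {Kind.CATEGORICAL, Kind.CATEGORICAL};
        System.out.println(Arrays.toString(densifyNaive(4, rows, kinds)));
        System.out.println(Arrays.toString(densifyPreserving(4, rows, kinds)));
    }
}
```

In this toy model the naive conversion reports rows 1 and 2 as numeric, so the column ends up with mixed types, while the type-preserving variant reports all four rows as categorical.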
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| h2o-core/src/main/java/water/fvec/NewChunk.java | Enhanced addCategorical() method to preserve categorical type information for sparse zero values during sparse-to-dense conversion |
| h2o-core/src/test/java/water/fvec/NewChunkTest.java | Added regression test to verify sparse categorical handling with specific data pattern that triggered the original failure |
valenad1 left a comment
Thanks, nice, I am a bit nervous about changes in the core :) Could we rerun tests on latest rel-*?
Sure @valenad1! Rerunning again. I hope it won't get aborted again.
c753272 to ca28334
#16719