Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new experimental GLM option remove_offset_effects to keep offsets during training but remove their effect during scoring/prediction and model metrics, aligning with the “restricted vs unrestricted” model pattern already used for control_variables.
Changes:
- Introduces
remove_offset_effectsparameter in GLM (backend + REST schema) and exposes it in R/Python clients. - Updates GLM scoring/metrics/scoring-history flow to compute both restricted (offset removed) and unrestricted metrics, and enables
make_unrestricted_glm_modelfor this use case. - Adds docs + new tests/examples across Java/R/Python to exercise the feature.
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 20 comments.
Show a summary per file
| File | Description |
|---|---|
| h2o-algos/src/main/java/hex/glm/GLM.java | Implements restricted/unrestricted scoring-history + metrics computation when remove_offset_effects is enabled. |
| h2o-algos/src/main/java/hex/glm/GLMModel.java | Adds new parameter + basic validation for remove_offset_effects. |
| h2o-algos/src/main/java/hex/glm/GLMScore.java | Skips adding offset into the linear predictor when restricted scoring is enabled. |
| h2o-algos/src/main/java/hex/glm/GLMUtils.java | Renames/extends scoring history combiner for “restricted” use. |
| h2o-algos/src/main/java/hex/schemas/GLMV3.java | Exposes remove_offset_effects via REST schema. |
| h2o-algos/src/main/java/hex/api/MakeGLMModelHandler.java | Allows creating unrestricted model when remove_offset_effects was used; resets the flag on the derived model. |
| h2o-algos/src/test/java/hex/glm/GLMControlVariablesTest.java | Adds backend tests for remove_offset_effects behavior and its interaction with control_variables. |
| h2o-r/h2o-package/R/glm.R | Adds R API parameter + expands make_unrestricted_glm_model guard. |
| h2o-bindings/bin/custom/R/gen_glm.py | Updates generated R binding template for make_unrestricted_glm_model guard. |
| h2o-r/tests/testdir_algos/glm/runit_GLM_remove_offset_effects_explain.R | Adds an R explain/learning-curve smoke test with remove_offset_effects. |
| h2o-py/h2o/estimators/glm.py | Adds Python API parameter + getter/setter. |
| h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects.py | Adds Python test comparing behavior with/without remove_offset_effects. |
| h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_glm.py | Adds Python test scaffold around offset scoring behavior. |
| h2o-docs/src/product/data-science/algo-params/remove_offset_effects.rst | Documents the new parameter and provides examples. |
| h2o-docs/src/product/data-science/algo-params/control_variables.rst | Links control_variables docs to remove_offset_effects docs. |
| h2o-core/src/main/java/hex/ModelMetricsBinomial.java | Minor signature cleanup (parameter rename). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
h2o-r/tests/testdir_algos/glm/runit_GLM_remove_offset_effects_explain.R
Outdated
Show resolved
Hide resolved
h2o-r/tests/testdir_algos/glm/runit_GLM_remove_offset_effects_explain.R
Outdated
Show resolved
Hide resolved
h2o-docs/src/product/data-science/algo-params/remove_offset_effects.rst
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
h2o-docs/src/product/data-science/algo-params/remove_offset_effects.rst
Outdated
Show resolved
Hide resolved
tomasfryda
left a comment
There was a problem hiding this comment.
@maurever Correct me if I am wrong but it looks to me that there is no way of having both control variables(CV) and remove offset and getting:
- model that uses CV and offset
- model that doesn't use CV but does use offset
- model that uses CV and doesn't use offset
- model that doesn't use CV and doesn't use offset
(No way other than training multiple models)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…h2oai/h2o-3 into maurever_GH-16676_remove_offset_effects
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects_compare.py
Outdated
Show resolved
Hide resolved
h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects_compare.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 20 changed files in this pull request and generated 13 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
h2o-py/tests/testdir_algos/glm/pyunit_glm_make_unrestricted_model.py
Outdated
Show resolved
Hide resolved
h2o-r/tests/testdir_algos/glm/runit_GLM_make_unrestricted_model.R
Outdated
Show resolved
Hide resolved
h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects_compare.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 20 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
h2o-algos/src/test/java/hex/glm/GLMControlVariablesTest.java:433
trainis created with a key and then referenced viaparams._train = train._key, but it is no longer put into the DKV. This will likely cause the builder to fail with a missing-frame error (or a different exception than expected), making the test unreliable. AddDKV.put(train);here (as done in the surrounding tests) before training the model.
Vec cat1 = Vec.makeVec(new long[]{1,1,1,0,0},new String[]{"black","red"},Vec.newKey());
Vec cat2 = Vec.makeVec(new long[]{1,1,1,0,0},new String[]{"a","b"},Vec.newKey());
Vec res = Vec.makeVec(new double[]{1,1,2,0,0},cat1.group().addVec());
train = new Frame(Key.<Frame>make("train"),new String[]{"x1", "x2", "y"},new Vec[]{cat1, cat2,res});
GLMModel.GLMParameters params = new GLMModel.GLMParameters();
params._train = train._key;
params._alpha = new double[]{0};
params._response_column = "y";
params._intercept = false;
params._control_variables = new String[]{"x1"};
params._distribution = DistributionFamily.multinomial;
glm = new GLM(params).trainModel().get();
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
h2o-docs/src/product/data-science/algo-params/remove_offset_effects.rst
Outdated
Show resolved
Hide resolved
h2o-docs/src/product/data-science/algo-params/control_variables.rst
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 20 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
h2o-algos/src/test/java/hex/glm/GLMControlVariablesTest.java:427
- This test constructs
trainwith a Key and then usesparams._train = train._key, buttrainis never put into the DKV (DKV.put(train)was removed). As a result, GLM will not be able to fetch the training frame by key and the test will fail for the wrong reason.
train = new Frame(Key.<Frame>make("train"),new String[]{"x1", "x2", "y"},new Vec[]{cat1, cat2,res});
GLMModel.GLMParameters params = new GLMModel.GLMParameters();
params._train = train._key;
params._alpha = new double[]{0};
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
h2o-r/tests/testdir_algos/glm/runit_GLM_make_unrestricted_model.R
Outdated
Show resolved
Hide resolved
| keepFrameKeys(keep, _model._output._betadiff_var); | ||
| Scope.untrack(keep.toArray(new Key[keep.size()])); | ||
| } | ||
| }_model.update(_job._key); |
|
|
||
| private void scorePostProcessingControlVal(Frame train, long t1) { | ||
| private void scorePostProcessingRestrictedModel(Frame train, long t1) { | ||
| ModelMetrics mtrain = ModelMetrics.getFromDKV(_model, train); // updated by model.scoreAndUpdateModel |
There was a problem hiding this comment.
What is the implication of using ModelMetrics.getFromDKV(_model, train)?
Shouldn't we use a key that would have the name of the restricted model? Can this e.g. create issues by overwriting the results of unrestricted model.
| _state.alpha()); | ||
| } | ||
| } | ||
| _job.update(_workPerIteration, _state.toString()); |
There was a problem hiding this comment.
I didn't find any new _job.update, do we still update the job so that the progress bar doesn't get stuck?
I know I asked you to remove some _job.update but we still call it but just once (I think).
There was a problem hiding this comment.
scorePostProcessingRestrictedModel, scorePostProcessingRestrictedModelCVEnabled, and scorePostProcessingRestrictedModelROEnabled are nearly identical, could you deduplicate the code?
Issue: #16676