Skip to content

GH-16676 GLM: Remove offset effects#16749

Open
maurever wants to merge 19 commits intorel-3.46.0from
maurever_GH-16676_remove_offset_effects
Open

GH-16676 GLM: Remove offset effects#16749
maurever wants to merge 19 commits intorel-3.46.0from
maurever_GH-16676_remove_offset_effects

Conversation

@maurever
Copy link
Contributor

Issue: #16676

@maurever maurever self-assigned this Jan 22, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new experimental GLM option remove_offset_effects to keep offsets during training but remove their effect during scoring/prediction and model metrics, aligning with the “restricted vs unrestricted” model pattern already used for control_variables.

Changes:

  • Introduces remove_offset_effects parameter in GLM (backend + REST schema) and exposes it in R/Python clients.
  • Updates GLM scoring/metrics/scoring-history flow to compute both restricted (offset removed) and unrestricted metrics, and enables make_unrestricted_glm_model for this use case.
  • Adds docs + new tests/examples across Java/R/Python to exercise the feature.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 20 comments.

Show a summary per file
File Description
h2o-algos/src/main/java/hex/glm/GLM.java Implements restricted/unrestricted scoring-history + metrics computation when remove_offset_effects is enabled.
h2o-algos/src/main/java/hex/glm/GLMModel.java Adds new parameter + basic validation for remove_offset_effects.
h2o-algos/src/main/java/hex/glm/GLMScore.java Skips adding offset into the linear predictor when restricted scoring is enabled.
h2o-algos/src/main/java/hex/glm/GLMUtils.java Renames/extends scoring history combiner for “restricted” use.
h2o-algos/src/main/java/hex/schemas/GLMV3.java Exposes remove_offset_effects via REST schema.
h2o-algos/src/main/java/hex/api/MakeGLMModelHandler.java Allows creating unrestricted model when remove_offset_effects was used; resets the flag on the derived model.
h2o-algos/src/test/java/hex/glm/GLMControlVariablesTest.java Adds backend tests for remove_offset_effects behavior and its interaction with control_variables.
h2o-r/h2o-package/R/glm.R Adds R API parameter + expands make_unrestricted_glm_model guard.
h2o-bindings/bin/custom/R/gen_glm.py Updates generated R binding template for make_unrestricted_glm_model guard.
h2o-r/tests/testdir_algos/glm/runit_GLM_remove_offset_effects_explain.R Adds an R explain/learning-curve smoke test with remove_offset_effects.
h2o-py/h2o/estimators/glm.py Adds Python API parameter + getter/setter.
h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects.py Adds Python test comparing behavior with/without remove_offset_effects.
h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_glm.py Adds Python test scaffold around offset scoring behavior.
h2o-docs/src/product/data-science/algo-params/remove_offset_effects.rst Documents the new parameter and provides examples.
h2o-docs/src/product/data-science/algo-params/control_variables.rst Links control_variables docs to remove_offset_effects docs.
h2o-core/src/main/java/hex/ModelMetricsBinomial.java Minor signature cleanup (parameter rename).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@tomasfryda tomasfryda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maurever Correct me if I am wrong but it looks to me that there is no way of having both control variables(CV) and remove offset and getting:

  • model that uses CV and offset
  • model that doesn't use CV but does use offset
  • model that uses CV and doesn't use offset
  • model that doesn't use CV and doesn't use offset

(No way other than training multiple models)

maurever and others added 4 commits February 19, 2026 14:34
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…h2oai/h2o-3 into maurever_GH-16676_remove_offset_effects
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@valenad1 valenad1 modified the milestones: 3.46.0.10, 3.46.0.11 Mar 11, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 20 changed files in this pull request and generated 13 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 20 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

h2o-algos/src/test/java/hex/glm/GLMControlVariablesTest.java:433

  • train is created with a key and then referenced via params._train = train._key, but it is no longer put into the DKV. This will likely cause the builder to fail with a missing-frame error (or a different exception than expected), making the test unreliable. Add DKV.put(train); here (as done in the surrounding tests) before training the model.
            Vec cat1 = Vec.makeVec(new long[]{1,1,1,0,0},new String[]{"black","red"},Vec.newKey());
            Vec cat2 = Vec.makeVec(new long[]{1,1,1,0,0},new String[]{"a","b"},Vec.newKey());
            Vec res = Vec.makeVec(new double[]{1,1,2,0,0},cat1.group().addVec());
            train = new Frame(Key.<Frame>make("train"),new String[]{"x1", "x2", "y"},new Vec[]{cat1, cat2,res});

            GLMModel.GLMParameters params = new GLMModel.GLMParameters();
            params._train = train._key;
            params._alpha = new double[]{0};
            params._response_column = "y";
            params._intercept = false;
            params._control_variables = new String[]{"x1"};
            params._distribution = DistributionFamily.multinomial;
            glm = new GLM(params).trainModel().get();


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 20 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

h2o-algos/src/test/java/hex/glm/GLMControlVariablesTest.java:427

  • This test constructs train with a Key and then uses params._train = train._key, but train is never put into the DKV (DKV.put(train) was removed). As a result, GLM will not be able to fetch the training frame by key and the test will fail for the wrong reason.
            train = new Frame(Key.<Frame>make("train"),new String[]{"x1", "x2", "y"},new Vec[]{cat1, cat2,res});

            GLMModel.GLMParameters params = new GLMModel.GLMParameters();
            params._train = train._key;
            params._alpha = new double[]{0};

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

keepFrameKeys(keep, _model._output._betadiff_var);
Scope.untrack(keep.toArray(new Key[keep.size()]));
}
}_model.update(_job._key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting


private void scorePostProcessingControlVal(Frame train, long t1) {
private void scorePostProcessingRestrictedModel(Frame train, long t1) {
ModelMetrics mtrain = ModelMetrics.getFromDKV(_model, train); // updated by model.scoreAndUpdateModel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the implication of using ModelMetrics.getFromDKV(_model, train)?

Shouldn't we use a key that would have the name of the restricted model? Can this e.g. create issues by overwriting the results of unrestricted model.

_state.alpha());
}
}
_job.update(_workPerIteration, _state.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any new _job.update, do we still update the job so that the progress bar doesn't get stuck?

I know I asked you to remove some _job.update but we still call it but just once (I think).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scorePostProcessingRestrictedModel, scorePostProcessingRestrictedModelCVEnabled, and scorePostProcessingRestrictedModelROEnabled are nearly identical, could you deduplicate the code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants