Add input validation for Friedman-Popescu H statistic calculation #16760

Draft
Copilot wants to merge 3 commits into master from copilot/improve-input-validation

Copilot AI commented Feb 4, 2026

getFriedmanPopescusH() throws a NullPointerException (NPE) when it is called with non-numeric or nonexistent column names. The error occurs in FriedmanPopescusH.filterFrame(), where frame.vecs(cols) yields a null entry for each invalid column and that entry is then dereferenced.

# This produces a cryptic NPE
h = titanic_xgb.h(train, ['fair', 'age'])  # 'fair' is not a column in the titanic frame (the numeric column is 'fare')
# Error: Cannot read field "_key" because "vecs[i]" is null
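
The failure mode can be modeled in a few lines of Python. This is only an illustrative sketch, not H2O's code: `Vec`, `lookup_vecs`, and `filter_frame` are hypothetical stand-ins, and it assumes (as the error text suggests) that the lookup yields a null entry for any column it cannot resolve.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Vec:
    """Hypothetical stand-in for a column vector that carries a key."""
    key: str


def lookup_vecs(frame: Dict[str, Vec], cols: List[str]) -> List[Optional[Vec]]:
    # Assumption: unresolved columns come back as None instead of raising.
    return [frame.get(c) for c in cols]


def filter_frame(frame: Dict[str, Vec], cols: List[str]) -> List[str]:
    # Dereferences every entry without a None check, mirroring the reported bug.
    return [v.key for v in lookup_vecs(frame, cols)]


frame = {"age": Vec("age_key"), "fare": Vec("fare_key")}

try:
    filter_frame(frame, ["fair", "age"])
except AttributeError as err:
    # The message talks about NoneType; it never names the offending column.
    failure = str(err)
```

The caller only learns that something was null, which is why validating the column list up front gives a more useful message.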

Changes

  • Added a default validation method to the FriedmanPopescusHCollector interface
    • Validates that vars is non-null and non-empty
    • Validates that all columns exist and are numeric
    • Validates that the columns contain non-constant values
  • Updated the implementations to call the validation before processing
    • GBMModel.getFriedmanPopescusH()
    • XGBoostModel.getFriedmanPopescusH()
  • Added test coverage in GBMTest.testFriedmanPopescusHInputValidation()
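
The order of the checks listed above can be sketched in Python. This is a minimal model under stated assumptions, not the Java code in this PR: `validate_h_inputs` and `HStatisticInputError` are hypothetical names, a frame is modeled as a plain dict of column name to values, and only the "is not numeric" message is taken from this description (the other messages are invented for illustration).

```python
from typing import Dict, List, Optional, Sequence


class HStatisticInputError(ValueError):
    """Hypothetical error type for rejected H statistic inputs."""


PREFIX = "Calculating H statistics error: "


def _is_numeric(values: Sequence) -> bool:
    # bool is excluded because it is a subclass of int in Python.
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def validate_h_inputs(columns: Dict[str, Sequence], variables: Optional[List[str]]) -> None:
    """Raise HStatisticInputError on the first invalid input; return None otherwise."""
    # 1. vars must be non-null and non-empty (message wording is an assumption).
    if not variables:
        raise HStatisticInputError(PREFIX + "vars must be a non-empty list of column names.")
    for name in variables:
        # 2a. Every column must exist (message wording is an assumption).
        if name not in columns:
            raise HStatisticInputError(PREFIX + f"column '{name}' does not exist in the frame.")
        values = columns[name]
        # 2b. Every column must be numeric (message taken from the PR description).
        if not _is_numeric(values):
            raise HStatisticInputError(
                PREFIX + f"column '{name}' is not numeric. "
                "H statistics can only be calculated for numeric variables."
            )
        # 3. Columns must not be constant (message wording is an assumption).
        if len(set(values)) <= 1:
            raise HStatisticInputError(PREFIX + f"column '{name}' is constant.")
```

Checking existence before type means a misspelled name is reported as missing rather than as non-numeric; whether the Java implementation distinguishes the two cases the same way is not stated here.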

Error messages now clearly indicate the issue:

Calculating H statistics error: column 'fair' is not numeric. 
H statistics can only be calculated for numeric variables.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • nexus-private.hortonworks.com
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.lang.invoke=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.base/java.nio.charset=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED -Xmx1024M (dns block)
  • repository.cloudera.com
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.lang.invoke=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.base/java.nio.charset=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED -Xmx1024M (dns block)
  • repository.jboss.org
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.lang.invoke=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.base/java.nio.charset=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED -Xmx1024M (dns block)
  • repository.mapr.com
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.lang.invoke=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.base/java.nio.charset=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED -Xmx1024M (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details the original issue you should resolve.

<issue_title>H statistic: Improve input validation</issue_title>
<issue_description>This code throws an NPE:

import h2o
from h2o.estimators import H2OXGBoostEstimator

# Start (or connect to) a local H2O cluster:
h2o.init()

# Import the titanic dataset into H2O:
titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv")

# Set the predictors and response; set the response as a factor:
titanic["survived"] = titanic["survived"].asfactor()
predictors = titanic.columns
response = "survived"

# Split the dataset into a train and valid set:
train, valid = titanic.split_frame(ratios=[.8], seed=1234)

# Build and train the model:
titanic_xgb = H2OXGBoostEstimator(booster='dart',
                                  normalize_type="tree",
                                  seed=1234)
titanic_xgb.train(x=predictors,
                  y=response,
                  training_frame=train,
                  validation_frame=valid)

# Eval performance:
perf = titanic_xgb.model_performance()

# Generate predictions on a test set (if necessary):
pred = titanic_xgb.predict(valid)

# Extract feature interactions:
feature_interactions = titanic_xgb.feature_interaction()

# Get Friedman and Popescu's H statistics
h = titanic_xgb.h(train, ['fair','age'])
print(h)

Server error java.lang.NullPointerException:
  Error: Cannot read field "_key" because "vecs[i]" is null
  Request: None

The input parameter validation should be done properly. </issue_description>

<agent_instructions>Validate the input parameters of the Friedman-Popescu H statistic. In the Java backend, find the getFriedmanPopescusH() method, which is implemented for the XGBoost and GBM algorithms. </agent_instructions>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 2 commits February 4, 2026 15:54
Co-authored-by: maurever <11465784+maurever@users.noreply.github.com>
Co-authored-by: maurever <11465784+maurever@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve input validation for H statistic" to "Add input validation for Friedman-Popescu H statistic calculation" Feb 4, 2026
Copilot AI requested a review from maurever February 4, 2026 16:00

Development

Successfully merging this pull request may close these issues.

H statistic: Improve input validation

2 participants