Conversation

@AhmedThahir

Confirmed with @FBruzzesi on the direction of the PR. This discussion took place in #620

Description

Supporting sample_weight for HierarchicalPredictor, HierarchicalRegressor, HierarchicalClassifier.

Fixes #620
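As a rough illustration of what the feature entails (a hypothetical sketch, not the actual sklego implementation): a hierarchical fit has to slice sample_weight with the same mask used for X and y, then forward the slice to each group's estimator. Plain weighted least squares stands in for a scikit-learn estimator here:

```python
import numpy as np

def fit_per_group(X, y, groups, sample_weight=None):
    """Hypothetical sketch: fit one weighted least-squares model per group,
    slicing sample_weight with the same mask used for X and y."""
    if sample_weight is None:
        # all-ones weights behave the same as passing no weights at all
        sample_weight = np.ones(len(y))
    models = {}
    for g in np.unique(groups):
        mask = groups == g
        # scaling rows by sqrt(weight) is equivalent to sample_weight
        # in a least-squares fit
        sw = np.sqrt(sample_weight[mask])[:, None]
        Xg = np.hstack([np.ones((mask.sum(), 1)), X[mask]])  # add intercept
        coef, *_ = np.linalg.lstsq(Xg * sw, y[mask] * sw.ravel(), rcond=None)
        models[g] = coef  # [intercept, slope(s)] for this group
    return models

# Two groups with different linear relationships
X = np.array([[0.0], [1.0], [2.0], [0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0, 4.0, 3.0, 2.0])  # y = 2x + 1, then y = -x + 4
groups = np.array([0, 0, 0, 1, 1, 1])
models = fit_per_group(X, y, groups)
```

Each group recovers its own coefficients, and passing all-ones weights gives the same result as passing no weights.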

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the style guidelines (ruff)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (also to the readme.md)
  • I have added tests that prove my fix is effective or that my feature works
  • I have added tests to check whether the new feature adheres to the sklearn convention
  • New and existing unit tests pass locally with my changes

@AhmedThahir
Author

@FBruzzesi, this is the best I could do; I have not been able to figure out how to proceed. If you could take some time to assist, it would be great.

=========================================================================== short test summary info ===========================================================================
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weights_not_an_array] - ValueError: DataFrame constructor not properly called!
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weight_equivalence_on_dense_data] - ZeroDivisionError: Weights sum to zero, can't be normalized
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weight_equivalence_on_sparse_data] - ValueError: Estimator does not work on sparse matrices
================================================================= 3 failed, 144 passed, 10 skipped in 11.49s ==================================================================

@FBruzzesi
Collaborator

Thanks for the contribution @AhmedThahir 🚀

@FBruzzesi, this is the best I could do; I have not been able to figure out how to proceed. If you could take some time to assist, it would be great.

I will take a look later today or tomorrow 😇

Collaborator

@FBruzzesi FBruzzesi left a comment


Hey @AhmedThahir, thanks again for the contribution, this is off to a great start! I think we are already quite close!

I left a few suggestions in the code. In addition to those fixes, could you add a few test cases?

@AhmedThahir
Author

AhmedThahir commented Mar 26, 2025 via email

@AhmedThahir
Author

Busy few days at my internship; I will get back to you on this as soon as possible, hopefully by the end of the weekend.

AhmedThahir and others added 3 commits March 29, 2025 11:07

  • need to cast X to native since scikit-learn does not (yet) work with narwhals objects
    Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
  • and self.has_sw_ can probably be skipped - passing all ones should behave the same as not passing any sample weight (since the estimator supports them)
  • For a scikit-learn compatible estimator you cannot do any operation in the __init__ method. You can assign such an attribute only during fit, and it must be (semi) private, with a trailing _
    koaning#737 (comment)
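The scikit-learn convention quoted in that last commit message can be illustrated with a hypothetical toy estimator (plain Python, not sklego code): __init__ only stores hyperparameters unchanged, and fitted state is assigned only inside fit, with a trailing underscore.

```python
import numpy as np

class WeightedMeanRegressor:
    """Hypothetical toy estimator following the scikit-learn convention."""

    def __init__(self, demean=False):
        # __init__ only stores hyperparameters as-is: no validation,
        # no computation, no fitted attributes here.
        self.demean = demean

    def fit(self, X, y, sample_weight=None):
        if sample_weight is None:
            sample_weight = np.ones(len(y))
        # fitted state gets a trailing underscore and exists only after fit
        self.mean_ = np.average(y, weights=sample_weight)
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)
```

Before fit is called, the instance has no mean_ attribute at all, which is exactly what sklearn's check_is_fitted machinery relies on.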
@AhmedThahir
Author

Still failing these tests when I run pytest tests/test_meta/test_hierarchical_predictor.py

sklego/meta/_grouped_utils.py:64: ValueError
=========================================================================== short test summary info ===========================================================================
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weights_not_an_array] - ValueError: DataFrame constructor not properly called!
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weight_equivalence_on_dense_data] - ZeroDivisionError: Weights sum to zero, can't be normalized
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weight_equivalence_on_sparse_data] - ValueError: Estimator does not work on sparse matrices

@AhmedThahir
Author

Also, @FBruzzesi, don't the default sklearn sample_weight tests (the ones failing above) already cover this? What other tests would be required in our case?

@FBruzzesi
Collaborator

Hey @AhmedThahir, thanks for pushing the changes.

Still failing these tests when I run pytest tests/test_meta/test_hierarchical_predictor.py

I am trying to debug the tests locally. From what I can see:

  • check_sample_weights_not_an_array (ValueError: DataFrame constructor not properly called!): raises at line 277 because X is not an array type. I think it's ok to raise, since some "structure" in the input is required anyway for Hierarchical predictors; however, I would raise a more informative error message.

  • check_sample_weight_equivalence_on_dense_data (ZeroDivisionError: Weights sum to zero, can't be normalized) - the issue is with the group column being a random continuous value from the automatic sklearn check, hence not really a group-like

One similar issue I can think of is, what should happen if one sample_weight is non-zero, but one group has all zeros in the array subset? I don't have an answer for that, but I think that would be ok to raise.
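That edge case can be made concrete with a small hypothetical validation helper (not sklego code): the overall weight vector is non-zero, but one group's slice sums to zero, so the fit for that group is undefined and raising an informative error seems reasonable.

```python
import numpy as np

def check_group_weights(groups, sample_weight):
    """Hypothetical check: raise an informative error when any group's
    slice of sample_weight sums to zero."""
    for g in np.unique(groups):
        if np.sum(sample_weight[groups == g]) == 0:
            raise ValueError(
                f"All sample weights are zero for group {g!r}; "
                "cannot fit an estimator for this group."
            )

groups = np.array([0, 0, 1, 1])
check_group_weights(groups, np.array([1.0, 0.0, 0.0, 1.0]))  # ok: both groups non-zero
```

With weights [1, 1, 0, 0] the check would raise for group 1 even though the weights are not all zero globally.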

@koaning considering we are already skipping 9 sklearn compatible estimator checks for Hierarchical, I would consider removing that entirely and coming up with some ad-hoc tests. What do you think?

@AhmedThahir
Author

Any action required from my side?

@FBruzzesi
Collaborator

Any action required from my side?

Hey @AhmedThahir, not really! If you are interested, you could think about how we might eventually test most of the hierarchical functionalities and edge cases.

@koaning
Owner

koaning commented Apr 14, 2025

@koaning considering we are already skipping 9 sklearn compatible estimator check for Hierarchical, I would consider removing that entirely and come up with some ad-hoc tests. What do you think?

Yeah, agree. Optimise for ease of maintenance here :)

@AhmedThahir
Author

Any action required from my side?

Hey @AhmedThahir, not really! If you are interested, you could think about how we might eventually test most of the hierarchical functionalities and edge cases.

I'm not quite sure how to proceed here. The built-in sample weight checks seem to take care of all cases that I can think of.

@FBruzzesi
Collaborator

I'm not quite sure how to proceed here. The built-in sample weight checks seem to take care of all cases that I can think of.

Hey @AhmedThahir, what I meant is actually broader in scope. For now you might:

  • Add the failing tests to the list of those to skip (check_sample_weights_not_an_array, check_sample_weight_equivalence_on_dense_data, check_sample_weight_equivalence_on_sparse_data)
  • Introduce new dedicated tests to check that sample weights behave as expected in the dataframe/array cases
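One shape such dedicated tests might take (a sketch under the assumption that the standard equivalence properties are what we want to assert; plain weighted least squares stands in for HierarchicalRegressor here): all-ones weights should match the unweighted fit, and integer weights should match repeating rows.

```python
import numpy as np

def wls_coef(X, y, w):
    # weighted least squares: scale rows by sqrt(weight)
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(X * sw, y * sw.ravel(), rcond=None)
    return coef

rng = np.random.default_rng(0)
X = np.hstack([np.ones((6, 1)), rng.normal(size=(6, 2))])
y = rng.normal(size=6)

# Property 1: all-ones weights behave like no weights
unweighted = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(wls_coef(X, y, np.ones(6)), unweighted)

# Property 2: integer weights behave like row repetition
w = np.array([1, 2, 1, 3, 1, 1])
rep = np.repeat(np.arange(6), w)
repeated = np.linalg.lstsq(X[rep], y[rep], rcond=None)[0]
assert np.allclose(wls_coef(X, y, w), repeated)
```

The same two assertions, written against HierarchicalRegressor's predictions on both dataframe and array inputs, would cover most of what the skipped sklearn checks were testing.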

Successfully merging this pull request may close these issues.

HierarchicalPredictor and HierarchicalTransformer
