Conversation

@AhmedThahir

Confirmed with @FBruzzesi on the direction of the PR. This discussion took place in #620

Description

Supporting sample_weight for HierarchicalPredictor, HierarchicalRegressor, HierarchicalClassifier.

Fixes #620
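As a rough illustration of what the feature entails (a hypothetical sketch, not the actual sklego implementation): a hierarchical fit has to slice sample_weight with the same mask used for X and y, then forward the slice to each group's estimator. Plain weighted least squares stands in for a scikit-learn estimator here:

```python
import numpy as np

def fit_per_group(X, y, groups, sample_weight=None):
    """Hypothetical sketch: fit one weighted least-squares model per group,
    slicing sample_weight with the same mask used for X and y."""
    if sample_weight is None:
        # all-ones weights behave the same as passing no weights at all
        sample_weight = np.ones(len(y))
    models = {}
    for g in np.unique(groups):
        mask = groups == g
        # scaling rows by sqrt(weight) is equivalent to sample_weight
        # in a least-squares fit
        sw = np.sqrt(sample_weight[mask])[:, None]
        Xg = np.hstack([np.ones((mask.sum(), 1)), X[mask]])  # add intercept
        coef, *_ = np.linalg.lstsq(Xg * sw, y[mask] * sw.ravel(), rcond=None)
        models[g] = coef  # [intercept, slope(s)] for this group
    return models

# Two groups with different linear relationships
X = np.array([[0.0], [1.0], [2.0], [0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0, 4.0, 3.0, 2.0])  # y = 2x + 1, then y = -x + 4
groups = np.array([0, 0, 0, 1, 1, 1])
models = fit_per_group(X, y, groups)
```

Each group recovers its own coefficients, and passing all-ones weights gives the same result as passing no weights.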

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the style guidelines (ruff)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (also to the readme.md)
  • I have added tests that prove my fix is effective or that my feature works
  • I have added tests to check whether the new feature adheres to the sklearn convention
  • New and existing unit tests pass locally with my changes

@AhmedThahir
Author

@FBruzzesi, this is the best I could do; I have not been able to figure out how to proceed. If you could take some time to assist, it would be great.

=========================================================================== short test summary info ===========================================================================
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weights_not_an_array] - ValueError: DataFrame constructor not properly called!
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weight_equivalence_on_dense_data] - ZeroDivisionError: Weights sum to zero, can't be normalized
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weight_equivalence_on_sparse_data] - ValueError: Estimator does not work on sparse matrices
================================================================= 3 failed, 144 passed, 10 skipped in 11.49s ==================================================================

@FBruzzesi
Collaborator

Thanks for the contribution @AhmedThahir 🚀

@FBruzzesi, this is the best I could do; I have not been able to figure out how to proceed. If you could take some time to assist, it would be great.

I will take a look later today or tomorrow 😇

Collaborator

@FBruzzesi FBruzzesi left a comment


Hey @AhmedThahir, thanks again for the contribution, this is off to a great start! I think we are already quite close!

I left a few suggestions in the code. In addition to those fixes, could you add a few test cases?

@AhmedThahir
Author

AhmedThahir commented Mar 26, 2025 via email

@AhmedThahir
Author

Busy few days at my internship; I will get back to you on this as soon as possible, hopefully by the end of the weekend.

AhmedThahir and others added 3 commits March 29, 2025 11:07

  • need to cast X to native since scikit-learn does not (yet) work with narwhals objects
    Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
  • and self.has_sw_ can probably be skipped - passing all ones should behave the same as not passing any sample weight (since the estimator supports them)
  • For a scikit-learn compatible estimator you cannot do any operation in the __init__ method. You can assign such an attribute only during fit, and it must be (semi) private, with a trailing _
    koaning#737 (comment)
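The scikit-learn convention quoted in that last commit message can be illustrated with a hypothetical toy estimator (plain Python, not sklego code): __init__ only stores hyperparameters unchanged, and fitted state is assigned only inside fit, with a trailing underscore.

```python
import numpy as np

class WeightedMeanRegressor:
    """Hypothetical toy estimator following the scikit-learn convention."""

    def __init__(self, demean=False):
        # __init__ only stores hyperparameters as-is: no validation,
        # no computation, no fitted attributes here.
        self.demean = demean

    def fit(self, X, y, sample_weight=None):
        if sample_weight is None:
            sample_weight = np.ones(len(y))
        # fitted state gets a trailing underscore and exists only after fit
        self.mean_ = np.average(y, weights=sample_weight)
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)
```

Before fit is called, the instance has no mean_ attribute at all, which is exactly what sklearn's check_is_fitted machinery relies on.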
@AhmedThahir
Author

Still failing these tests when I run pytest tests/test_meta/test_hierarchical_predictor.py

sklego/meta/_grouped_utils.py:64: ValueError
=========================================================================== short test summary info ===========================================================================
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weights_not_an_array] - ValueError: DataFrame constructor not properly called!
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weight_equivalence_on_dense_data] - ZeroDivisionError: Weights sum to zero, can't be normalized
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weight_equivalence_on_sparse_data] - ValueError: Estimator does not work on sparse matrices

@AhmedThahir
Author

Also, @FBruzzesi, don't the default sklearn sample_weight tests (the ones failing above) already cover this? What other tests would be required in our case?

@FBruzzesi
Collaborator

Hey @AhmedThahir, thanks for pushing the changes.

Still failing these tests when I run pytest tests/test_meta/test_hierarchical_predictor.py

I am trying to debug the tests locally. From what I can see:

  • check_sample_weights_not_an_array (ValueError: DataFrame constructor not properly called!): raises at line 277 because X is not an array type. I think it's ok to raise, since some "structure" in the input is required anyway for Hierarchical predictors; however, I would raise a more informative error message.

  • check_sample_weight_equivalence_on_dense_data (ZeroDivisionError: Weights sum to zero, can't be normalized) - the issue is with the group column being a random continuous value from the automatic sklearn check, hence not really a group-like

One similar issue I can think of is, what should happen if one sample_weight is non-zero, but one group has all zeros in the array subset? I don't have an answer for that, but I think that would be ok to raise.
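That edge case can be made concrete with a small hypothetical validation helper (not sklego code): the overall weight vector is non-zero, but one group's slice sums to zero, so the fit for that group is undefined and raising an informative error seems reasonable.

```python
import numpy as np

def check_group_weights(groups, sample_weight):
    """Hypothetical check: raise an informative error when any group's
    slice of sample_weight sums to zero."""
    for g in np.unique(groups):
        if np.sum(sample_weight[groups == g]) == 0:
            raise ValueError(
                f"All sample weights are zero for group {g!r}; "
                "cannot fit an estimator for this group."
            )

groups = np.array([0, 0, 1, 1])
check_group_weights(groups, np.array([1.0, 0.0, 0.0, 1.0]))  # ok: both groups non-zero
```

With weights [1, 1, 0, 0] the check would raise for group 1 even though the weights are not all zero globally.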

@koaning considering we are already skipping 9 sklearn compatible estimator checks for Hierarchical, I would consider removing that entirely and coming up with some ad-hoc tests. What do you think?

@AhmedThahir
Author

Any action required from my side?

@FBruzzesi
Collaborator

Any action required from my side?

Hey @AhmedThahir, not really! If you are interested, you could think about how we might eventually test most of the hierarchical functionalities and edge cases.

@koaning
Owner

koaning commented Apr 14, 2025

@koaning considering we are already skipping 9 sklearn compatible estimator check for Hierarchical, I would consider removing that entirely and come up with some ad-hoc tests. What do you think?

Yeah, agree. Optimise for ease of maintenance here :)

@AhmedThahir
Author

Any action required from my side?

Hey @AhmedThahir, not really! If you are interested, you could think about how we might eventually test most of the hierarchical functionalities and edge cases.

I'm not quite sure how to proceed here. The built-in sample weight checks seem to take care of all cases that I can think of.

@FBruzzesi
Collaborator

I'm not quite sure how to proceed here. The built-in sample weight checks seem to take care of all cases that I can think of.

Hey @AhmedThahir, what I meant is actually broader in scope. For now you might:

  • Add the failing tests to the list of those to skip (check_sample_weights_not_an_array, check_sample_weight_equivalence_on_dense_data, check_sample_weight_equivalence_on_sparse_data)
  • Introduce new dedicated tests to check that sample weights behave as expected in the dataframe/array cases
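One shape such dedicated tests might take (a sketch under the assumption that the standard equivalence properties are what we want to assert; plain weighted least squares stands in for HierarchicalRegressor here): all-ones weights should match the unweighted fit, and integer weights should match repeating rows.

```python
import numpy as np

def wls_coef(X, y, w):
    # weighted least squares: scale rows by sqrt(weight)
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(X * sw, y * sw.ravel(), rcond=None)
    return coef

rng = np.random.default_rng(0)
X = np.hstack([np.ones((6, 1)), rng.normal(size=(6, 2))])
y = rng.normal(size=6)

# Property 1: all-ones weights behave like no weights
unweighted = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(wls_coef(X, y, np.ones(6)), unweighted)

# Property 2: integer weights behave like row repetition
w = np.array([1, 2, 1, 3, 1, 1])
rep = np.repeat(np.arange(6), w)
repeated = np.linalg.lstsq(X[rep], y[rep], rcond=None)[0]
assert np.allclose(wls_coef(X, y, w), repeated)
```

The same two assertions, written against HierarchicalRegressor's predictions on both dataframe and array inputs, would cover most of what the skipped sklearn checks were testing.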

Successfully merging this pull request may close these issues.

HierarchicalPredictor and HierarchicalTransformer
