feat(hierarchical): support sample_weight #737
base: main
Conversation
|
@FBruzzesi, this is the best I could do; I have not been able to figure out how to proceed. If you could take some time to assist, it would be great. |
|
Thanks for the contribution @AhmedThahir 🚀
I will take a look later today or tomorrow 😇 |
FBruzzesi left a comment
Hey @AhmedThahir, thanks again for the contribution, this is off to a great start! I think we are already quite close!
I left a few suggestions in the code. In addition to those fixes, could you add a few test cases?
|
Alright!
Will work on the:
1. suggested code changes
2. additional test cases
Will get back to you once done tonight/tomorrow.
Best Regards
Ahmed Thahir
Quoted from the review by @FBruzzesi (Wed, 26 Mar 2025):
------------------------------
In sklego/meta/hierarchical_predictor.py
<#737 (comment)>:
> +        try:
> +            self.estimator_supports_sample_weight = "sample_weight" in inspect.signature(self.estimator.fit).parameters
> +        except Exception:
> +            self.estimator_supports_sample_weight = False
For a scikit-learn compatible estimator you cannot do any operation in the
__init__ method. You can assign such an attribute only during fit, and it
must be (semi) private, with a trailing _
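As a rough illustration of that pattern (a sketch only, not the actual PR code; the class name and structure are made up for this example), the support check would live in fit and be stored as a fitted, trailing-underscore attribute:

```python
import inspect

from sklearn.base import BaseEstimator, MetaEstimatorMixin


class WeightAwareMeta(BaseEstimator, MetaEstimatorMixin):
    """Hypothetical meta-estimator, used only to illustrate the convention."""

    def __init__(self, estimator):
        # __init__ only stores hyperparameters; no inspection or validation here.
        self.estimator = estimator

    def fit(self, X, y, sample_weight=None):
        # The check happens at fit time and the result is stored as a fitted
        # attribute (trailing underscore), as scikit-learn conventions require.
        self.estimator_supports_sample_weight_ = (
            "sample_weight" in inspect.signature(self.estimator.fit).parameters
        )
        # ... the rest of the fitting logic would go here ...
        return self
```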
------------------------------
In sklego/meta/hierarchical_predictor.py
<#737 (comment)>:
> @@ -281,12 +289,24 @@ def fit(self, X, y=None):
>          if self.n_features_in_ < 1:
>              msg = "Found 0 features, while a minimum of 1 is required."
>              raise ValueError(msg)
> +
> +        self.has_sw_ = sample_weight is not None
> +
> +        if self.has_sw_ and not self.estimator_supports_sample_weight:
> +            msg = "Estimator does not support sample_weight."
> +            raise ValueError(msg)
> +        sample_weight = _check_sample_weight(sample_weight, X, None, ensure_non_negative=True)
Here you probably need to cast X to native, since scikit-learn does not
(yet) work with narwhals objects; also, the dtype argument already has None
as its default:
⬇️ Suggested change
- sample_weight = _check_sample_weight(sample_weight, X, None, ensure_non_negative=True)
+ sample_weight = _check_sample_weight(sample_weight, X.to_native(), ensure_non_negative=True)
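For reference, a minimal standalone sketch of this validation step, assuming X is a narwhals frame and a scikit-learn version whose _check_sample_weight accepts ensure_non_negative (as in the diff above); this is illustrative, not the exact PR code:

```python
import narwhals as nw
from sklearn.utils.validation import _check_sample_weight


def _validate_weights(X, sample_weight):
    # scikit-learn validators expect a native object (numpy/pandas/polars),
    # so convert the narwhals frame back before checking the weights.
    X_native = nw.to_native(X) if isinstance(X, nw.DataFrame) else X
    # dtype already defaults to None, so it does not need to be passed explicitly.
    return _check_sample_weight(sample_weight, X_native, ensure_non_negative=True)
```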
------------------------------
In sklego/meta/hierarchical_predictor.py
<#737 (comment)>:
>          _y = nw.to_native(grp_frame[self._TARGET_NAME])
> -
> -        return clone(self.estimator).fit(_X, _y)
> +
> +        args = [_X, _y]
> +        if self.estimator_supports_sample_weight and self.has_sw_:
and self.has_sw_ can probably be skipped - passing all ones should behave
the same as not passing any sample weight (since the estimator supports
them)
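A quick way to convince oneself of this point, using plain scikit-learn rather than sklego code: fitting with an all-ones sample_weight gives the same model as fitting without weights.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)

plain = clone(LinearRegression()).fit(X, y)
all_ones = clone(LinearRegression()).fit(X, y, sample_weight=np.ones(len(y)))

# Identical coefficients (up to floating point noise), so the has_sw_ guard
# is not strictly needed once the estimator is known to accept sample_weight.
assert np.allclose(plain.coef_, all_ones.coef_)
```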
|
|
Busy few days at my internship; will get back to you on this as soon as possible, hopefully by the end of the weekend. |
need to cast X to native since scikit-learn does not (yet) work with narwhals objects
Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
and self.has_sw_ can probably be skipped - passing all ones should behave the same as not passing any sample weight (since the estimator supports them)
For a scikit-learn compatible estimator you cannot do any operation in __init__ method. You can assign such attribute only during fit, and it must be (semi) private, with a trailing _ koaning#737 (comment)
|
Still failing these tests when I run them locally. |
|
Also, @FBruzzesi, don't the default sklearn sample_weight tests (which are failing above) already take care of this? What other tests would be required in our case? |
|
Hey @AhmedThahir thanks for pushing the changes
I am trying to debug the tests locally. From what I can see:
One similar issue I can think of is: what should happen if one
@koaning considering we are already skipping 9 sklearn-compatible estimator checks for Hierarchical, I would consider removing them entirely and coming up with some ad-hoc tests. What do you think? |
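As a concrete starting point for such ad-hoc tests, here is a hypothetical sketch (not part of the PR; the column names, data, and the exact HierarchicalRegressor arguments are assumptions, and it relies on the sample_weight support this PR adds) checking that integer sample weights behave like row repetition:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklego.meta import HierarchicalRegressor


def test_sample_weight_matches_row_repetition():
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "g": [0] * 10 + [1] * 10,
        "x": rng.normal(size=20),
    })
    y = rng.normal(size=20)
    weights = rng.integers(1, 4, size=20)

    # Fit once with integer weights, once on the explicitly repeated rows.
    weighted = HierarchicalRegressor(estimator=LinearRegression(), groups="g").fit(
        df, y, sample_weight=weights
    )
    df_rep = df.loc[df.index.repeat(weights)].reset_index(drop=True)
    y_rep = np.repeat(y, weights)
    repeated = HierarchicalRegressor(estimator=LinearRegression(), groups="g").fit(df_rep, y_rep)

    # Both models should make (numerically) the same predictions.
    np.testing.assert_allclose(weighted.predict(df), repeated.predict(df), rtol=1e-6)
```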
|
Any action required from my side? |
Hey @AhmedThahir, not really! If you are interested you can think about how we can test most of the hierarchical functionalities and edge cases eventually |
Yeah, agree. Optimise for ease of maintenance here :) |
I'm not quite sure how to proceed here. The inbuilt sample_weight checks seem to take care of all cases that I can think of. |
Hey @AhmedThahir, what I meant is actually bigger in scope. For now you might:
|
Confirmed with @FBruzzesi on the direction of the PR. This discussion took place in #620
Description
Supporting sample_weight for HierarchicalPredictor, HierarchicalRegressor, and HierarchicalClassifier.
Fixes #620
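For context, a hypothetical usage sketch of what this enables (column names and data are made up, and the exact HierarchicalRegressor arguments are assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklego.meta import HierarchicalRegressor

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "region": [0] * 15 + [1] * 15,
    "x": rng.normal(size=30),
})
y = rng.normal(size=30)
weights = rng.uniform(0.5, 2.0, size=30)

model = HierarchicalRegressor(estimator=LinearRegression(), groups="region")
# sample_weight is forwarded to each underlying group estimator when supported.
model.fit(df, y, sample_weight=weights)
preds = model.predict(df)
```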
Type of change
Checklist: