feat: add sklearn-style input validation to LLMFeatureEngineer#35
feat: add sklearn-style input validation to LLMFeatureEngineer#35P-r-e-m-i-u-m wants to merge 5 commits intoRobertoCorti:mainfrom
Conversation
Signed-off-by: 🄂ʏᴇᴅ 🄰ʙᴅᴜʟ 🄰ᴍᴀ🄝 ✧ <amanbaba9404522@gmail.com>
| self : LLMFeatureEngineer | ||
| The fitted transformer | ||
| """ | ||
| if not isinstance(X, pd.DataFrame): |
There was a problem hiding this comment.
Can you abstract this valudation logic into a standalone validate_data() function in utils.validation? Similar to how scikit-learn handles it (see here), this would keep fit() cleaner and make the validation reusable across other methods/classes down the line.
| check_is_fitted(self) | ||
|
|
||
| # Convert LLM output to executor config and apply prefix to feature names | ||
| if not isinstance(X, pd.DataFrame): |
There was a problem hiding this comment.
same comment as here. A general validate_data can be used also here and avoid possible duplications
| The fitted transformer. Call ``transform()`` to apply the selected | ||
| features and ``to_transformer()`` to export them for production. | ||
| """ | ||
| if not isinstance(X, pd.DataFrame): |
There was a problem hiding this comment.
same comment as here. A general validate_data can be used also here and avoid possible duplications
|
It looks like pre-commit is failing in CI. Could you install it locally and run it before pushing? Use the command poetry run pre-commit installAfter that, it'll run automatically on each commit and catch any issues before they hit CI. |
…methods Signed-off-by: 🄂ʏᴇᴅ 🄰ʙᴅᴜʟ 🄰ᴍᴀ🄝 ✧ <amanbaba9404522@gmail.com>
|
"Refactored validation logic into a standalone validate_data() function in utils/validation.py and updated fit(), transform(), and fit_selective() to use it. Ready for re-review @RobertoCorti |
|
Again, it looks like that CI/CD is failing. Install pre-commit checks through: poetry run pre-commit installthem to run the pre-commit checks execute poetry run pre-commit run --all-files |
RobertoCorti
left a comment
There was a problem hiding this comment.
changes looks good, please fix my only comment and ci/cd
| TypeError | ||
| If X is not a DataFrame or y is not a Series. | ||
| """ | ||
| import pandas as pd |
There was a problem hiding this comment.
Please move import pandas as pd to the top of the file, outside the function.
Signed-off-by: 🄂ʏᴇᴅ 🄰ʙᴅᴜʟ 🄰ᴍᴀ🄝 ✧ <amanbaba9404522@gmail.com>
|
"Moved import pandas as pd to the top of validation.py. Pre-commit should pass now. Ready for re-review @RobertoCorti |
Signed-off-by: 🄂ʏᴇᴅ 🄰ʙᴅᴜʟ 🄰ᴍᴀ🄝 ✧ <amanbaba9404522@gmail.com>
Signed-off-by: 🄂ʏᴇᴅ 🄰ʙᴅᴜʟ 🄰ᴍᴀ🄝 ✧ <amanbaba9404522@gmail.com>
|
"Fixed feature_names_in_ missing in transform() — added fallback to set it from X.columns if not present. Ready for re-review @RobertoCorti 👍" |
Closes #34
Summary
Added sklearn-style input validation to all public method boundaries in
LLMFeatureEngineer.Changes Made
skfeaturellm/feature_engineer.py__init__(): Validatesmax_features(positive int or None) andverbose(non-negative int)fit(): ValidatesXis non-empty DataFrame,yis Series of same length. Storesn_features_in_andfeature_names_in_transform(): ValidatesXis DataFrame and raisesValueErrorif columns present during fit are missingfit_selective(): ValidatesX,y,n_rounds >= 1, andeval_settuple format. Storesn_features_in_andfeature_names_in_evaluate_features(is_transformed=True): RaisesValueErrorif expected generated feature columns are missingtests/test_feature_engineer.py