feat: add sklearn-style input validation to LLMFeatureEngineer by P-r-e-m-i-u-m · Pull Request #35 · RobertoCorti/skfeaturellm

P-r-e-m-i-u-m · 2026-03-09T17:38:12Z

Closes #34

Summary

Added sklearn-style input validation to all public method boundaries in LLMFeatureEngineer.

Changes Made

`skfeaturellm/feature_engineer.py`

__init__(): Validates max_features (positive int or None) and verbose (non-negative int)
fit(): Validates X is non-empty DataFrame, y is Series of same length. Stores n_features_in_ and feature_names_in_
transform(): Validates X is DataFrame and raises ValueError if columns present during fit are missing
fit_selective(): Validates X, y, n_rounds >= 1, and eval_set tuple format. Stores n_features_in_ and feature_names_in_
evaluate_features(is_transformed=True): Raises ValueError if expected generated feature columns are missing
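As a rough illustration of the `__init__()` checks listed above, the sketch below validates `max_features` (positive int or None) and `verbose` (non-negative int). The helper name `check_init_params` and the choice of `TypeError` versus `ValueError` are assumptions for illustration; the PR description does not state which exceptions the constructor raises.

```python
from numbers import Integral
from typing import Optional


def check_init_params(max_features: Optional[int], verbose: int) -> None:
    """Hypothetical helper: validate constructor arguments.

    Not the PR's actual code; the exception types are an assumption.
    """
    if max_features is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(max_features, bool) or not isinstance(max_features, Integral):
            raise TypeError(
                f"max_features must be an int or None, got {type(max_features).__name__}."
            )
        if max_features <= 0:
            raise ValueError(f"max_features must be positive, got {max_features}.")

    if isinstance(verbose, bool) or not isinstance(verbose, Integral):
        raise TypeError(f"verbose must be an int, got {type(verbose).__name__}.")
    if verbose < 0:
        raise ValueError(f"verbose must be non-negative, got {verbose}.")
```

Failing at construction time surfaces a bad argument immediately instead of deep inside `fit()`.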

`tests/test_feature_engineer.py`

Added 11 new tests covering all new validation paths
All existing tests pass unchanged

Signed-off-by: 🄂ʏᴇᴅ 🄰ʙᴅᴜʟ 🄰ᴍᴀ🄝 ✧ <amanbaba9404522@gmail.com>

RobertoCorti · 2026-03-10T07:21:41Z

        self : LLMFeatureEngineer
            The fitted transformer
        """
+        if not isinstance(X, pd.DataFrame):


Can you abstract this validation logic into a standalone validate_data() function in utils.validation? Similar to how scikit-learn handles it (see here), this would keep fit() cleaner and make the validation reusable across other methods/classes down the line.
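A minimal sketch of what such a standalone helper might look like, assuming it takes `X` and an optional `y` and returns them unchanged. The signature, the error messages, and the use of `ValueError` for empty or mismatched inputs are assumptions; only the `TypeError` for a non-DataFrame `X` or non-Series `y` is taken from the docstring quoted later in this review.

```python
from typing import Optional, Tuple

import pandas as pd


def validate_data(
    X: pd.DataFrame, y: Optional[pd.Series] = None
) -> Tuple[pd.DataFrame, Optional[pd.Series]]:
    """Hypothetical sketch of a reusable input check for fit()/transform().

    Raises TypeError for wrong container types and ValueError for
    an empty X or a length mismatch between X and y (assumed behavior).
    """
    if not isinstance(X, pd.DataFrame):
        raise TypeError(f"X must be a pandas DataFrame, got {type(X).__name__}.")
    if X.empty:
        raise ValueError("X must be a non-empty DataFrame.")

    if y is not None:
        if not isinstance(y, pd.Series):
            raise TypeError(f"y must be a pandas Series, got {type(y).__name__}.")
        if len(y) != len(X):
            raise ValueError(
                f"X and y must have the same length, got {len(X)} and {len(y)}."
            )
    return X, y
```

With a helper like this, each public method would start with a single call such as `X, y = validate_data(X, y)` instead of repeating the `isinstance` checks.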

RobertoCorti · 2026-03-10T07:25:06Z

        check_is_fitted(self)
-
-        # Convert LLM output to executor config and apply prefix to feature names
+        if not isinstance(X, pd.DataFrame):


Same comment as on fit(): a general validate_data() could also be used here to avoid duplication.

RobertoCorti · 2026-03-10T07:51:20Z

            The fitted transformer. Call ``transform()`` to apply the selected
            features and ``to_transformer()`` to export them for production.
        """
+        if not isinstance(X, pd.DataFrame):


Same comment as on fit(): a general validate_data() could also be used here to avoid duplication.

RobertoCorti · 2026-03-10T07:54:37Z

It looks like pre-commit is failing in CI. Could you install it locally and run it before pushing?

Use the command

poetry run pre-commit install

After that, it'll run automatically on each commit and catch any issues before they hit CI.

P-r-e-m-i-u-m · 2026-03-16T10:38:36Z

Refactored validation logic into a standalone validate_data() function in utils/validation.py and updated fit(), transform(), and fit_selective() to use it. Ready for re-review @RobertoCorti

RobertoCorti · 2026-03-17T08:14:42Z

Again, it looks like CI/CD is failing. Install the pre-commit hooks with:

poetry run pre-commit install

Then, to run the pre-commit checks, execute:

poetry run pre-commit run --all-files

RobertoCorti

Changes look good. Please address my one remaining comment and fix CI/CD.

RobertoCorti · 2026-03-17T08:10:57Z

+    TypeError
+        If X is not a DataFrame or y is not a Series.
+    """
+    import pandas as pd


Please move import pandas as pd to the top of the file, outside the function.

P-r-e-m-i-u-m · 2026-03-17T10:08:56Z

Moved import pandas as pd to the top of validation.py. Pre-commit should pass now. Ready for re-review @RobertoCorti

P-r-e-m-i-u-m · 2026-03-26T09:42:23Z

Fixed feature_names_in_ missing in transform(): added a fallback to set it from X.columns if not present. Ready for re-review @RobertoCorti 👍
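For readers following along, here is a hedged sketch of the pattern described in this comment combined with the missing-column check from the PR summary. The class `ColumnCheckingTransformer` is a hypothetical stand-in, not the real `LLMFeatureEngineer`, and it stores `feature_names_in_` as a plain list for simplicity (scikit-learn itself uses a NumPy array).

```python
import pandas as pd


class ColumnCheckingTransformer:
    """Hypothetical stand-in showing the column check and the fallback."""

    def fit(self, X: pd.DataFrame, y=None) -> "ColumnCheckingTransformer":
        # Record the input schema seen during fit.
        self.feature_names_in_ = list(X.columns)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Fallback (as described in the comment above): if the attribute
        # was never set, take the schema from the frame being transformed.
        if not hasattr(self, "feature_names_in_"):
            self.feature_names_in_ = list(X.columns)

        # Raise if any column present during fit is missing now.
        missing = [c for c in self.feature_names_in_ if c not in X.columns]
        if missing:
            raise ValueError(f"Columns seen during fit are missing: {missing}")
        return X.copy()
```

Note that in this sketch the fallback means an unfitted object never fails the column check on its first `transform()` call, which is a trade-off of the approach rather than something the PR discussion settles.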

feat: add sklearn-style input validation to LLMFeatureEngineer

90262d5

Signed-off-by: 🄂ʏᴇᴅ 🄰ʙᴅᴜʟ 🄰ᴍᴀ🄝 ✧ <amanbaba9404522@gmail.com>

RobertoCorti reviewed Mar 10, 2026

RobertoCorti requested changes Mar 10, 2026

refactor: extract validate_data() to utils/validation and use across …

b684905

…methods Signed-off-by: 🄂ʏᴇᴅ 🄰ʙᴅᴜʟ 🄰ᴍᴀ🄝 ✧ <amanbaba9404522@gmail.com>

P-r-e-m-i-u-m requested a review from RobertoCorti March 16, 2026 10:38

RobertoCorti requested changes Mar 17, 2026

fix: move pandas import to top of validation.py for pre-commit

5246885

Signed-off-by: 🄂ʏᴇᴅ 🄰ʙᴅᴜʟ 🄰ᴍᴀ🄝 ✧ <amanbaba9404522@gmail.com>

P-r-e-m-i-u-m requested a review from RobertoCorti March 17, 2026 10:09

P-r-e-m-i-u-m added 2 commits March 26, 2026 15:04

style: apply black formatting

c5c7760

Signed-off-by: 🄂ʏᴇᴅ 🄰ʙᴅᴜʟ 🄰ᴍᴀ🄝 ✧ <amanbaba9404522@gmail.com>

fix: set feature_names_in_ in transform() if missing

aa2adac

Signed-off-by: 🄂ʏᴇᴅ 🄰ʙᴅᴜʟ 🄰ᴍᴀ🄝 ✧ <amanbaba9404522@gmail.com>

P-r-e-m-i-u-m wants to merge 5 commits into RobertoCorti:main from P-r-e-m-i-u-m:feat/input-validation
