Merged
- Add fit(), transform(), fit_transform() to BaseTransformation; make transform() abstract and execute() a concrete backwards-compat alias that delegates to fit_transform()
- Rename execute() → transform() in UnaryTransformation and BinaryArithmeticTransformation so they satisfy the new abstract method
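The contract above can be sketched with the standard library only, assuming a plain dict of lists stands in for a pandas DataFrame. SquareTransformation is a hypothetical subclass, not one of the project's classes. It shows that a stateless transformation only has to implement transform(), while fit() returns self and execute() simply forwards to fit_transform().

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Frame = Dict[str, List[float]]


class BaseTransformation(ABC):
    """Sketch of the fit/transform contract described in the commit."""

    def fit(self, df: Frame) -> "BaseTransformation":
        # Stateless transformations learn nothing; returning self allows chaining.
        return self

    @abstractmethod
    def transform(self, df: Frame) -> List[float]:
        """Compute the new feature column from df."""

    def fit_transform(self, df: Frame) -> List[float]:
        return self.fit(df).transform(df)

    def execute(self, df: Frame) -> List[float]:
        # Backwards-compatible alias: delegates to fit_transform().
        return self.fit_transform(df)


class SquareTransformation(BaseTransformation):
    """Hypothetical unary transformation used only for illustration."""

    def __init__(self, column: str) -> None:
        self.column = column

    def transform(self, df: Frame) -> List[float]:
        return [x * x for x in df[self.column]]


df = {"a": [1.0, 2.0, 3.0]}
t = SquareTransformation("a")
print(t.execute(df))  # [1.0, 4.0, 9.0]
```

Because transform() is abstract, instantiating BaseTransformation directly raises TypeError, which is what forces subclasses such as the renamed unary and binary transformations to provide it.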
Override fit() to learn bin edges from training data (n_bins mode) or store custom edges (bin_edges mode). _apply_operation() uses bin_edges_ when set, falling back to _bins for the one-shot execute() path.
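A hedged sketch of the two modes and the fallback path. It assumes equal-width edges over the training min/max in n_bins mode (roughly what pandas.cut does for an integer bins argument), left-closed bins, and out-of-range values clipped into the end bins; the real class may differ on all three. The names mirror the commit message (fit, _apply_operation, bin_edges_, _bins), but the method bodies are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence, Union

Frame = Dict[str, List[float]]  # hypothetical stand-in for a DataFrame


class BinTransformation:
    """Sketch of stateful binning (equal-width edges are an assumption)."""

    def __init__(self, column: str, n_bins: Optional[int] = None,
                 bin_edges: Optional[Sequence[float]] = None) -> None:
        if (n_bins is None) == (bin_edges is None):
            raise ValueError("pass exactly one of n_bins or bin_edges")
        self.column = column
        # _bins holds either the bin count or the custom edges.
        self._bins: Union[int, List[float]] = (
            n_bins if n_bins is not None else list(bin_edges)
        )
        self.bin_edges_: Optional[List[float]] = None  # learned by fit()

    def _compute_edges(self, values: List[float]) -> List[float]:
        if isinstance(self._bins, int):
            # n_bins mode: equal-width edges spanning the observed range.
            lo, hi = min(values), max(values)
            width = (hi - lo) / self._bins
            return [lo + i * width for i in range(self._bins + 1)]
        # bin_edges mode: custom edges are used verbatim.
        return list(self._bins)

    def fit(self, df: Frame) -> "BinTransformation":
        self.bin_edges_ = self._compute_edges(df[self.column])
        return self

    def _apply_operation(self, values: List[float]) -> List[int]:
        # Prefer fitted edges; otherwise fall back to computing them from
        # `values` (the one-shot path, which only sees the data being binned).
        edges = self.bin_edges_ if self.bin_edges_ is not None else self._compute_edges(values)
        inner = edges[1:-1]
        # Bin index = number of inner edges <= v; out-of-range values land in the end bins.
        return [bisect_right(inner, v) for v in values]

    def transform(self, df: Frame) -> List[int]:
        return self._apply_operation(df[self.column])


train = {"x": [0.0, 10.0]}
test = {"x": [1.0, 6.0, 50.0]}

fitted = BinTransformation("x", n_bins=2).fit(train)
print(fitted.bin_edges_)         # [0.0, 5.0, 10.0]
print(fitted.transform(test))    # [0, 1, 1]

one_shot = BinTransformation("x", n_bins=2)
print(one_shot.transform(test))  # [0, 0, 1]  (edges recomputed from the test data)
```

The last two lines show why the fitted path matters: without fit(), the edges are derived from the test values themselves, so the same inputs fall into different bins.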
- fit(): iterates transformations calling t.fit(df), same raise_on_error handling as execute()
- transform(): iterates calling t.transform(df), builds result DataFrame the same way execute() does
- execute() unchanged for backwards compatibility
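A minimal sketch of the executor's fit()/transform() loops sharing the same raise_on_error handling. It assumes missing columns surface as KeyError and that the result starts from a copy of the input columns; CopyColumn is a hypothetical transformation, and the warning text is illustrative.

```python
import warnings
from typing import Dict, List

Frame = Dict[str, List[float]]  # hypothetical stand-in for a DataFrame


class CopyColumn:
    """Hypothetical transformation that copies an input column under a new name."""

    def __init__(self, name: str, column: str) -> None:
        self.name = name
        self.column = column

    def fit(self, df: Frame) -> "CopyColumn":
        if self.column not in df:
            raise KeyError(self.column)
        return self

    def transform(self, df: Frame) -> List[float]:
        if self.column not in df:
            raise KeyError(self.column)
        return list(df[self.column])


class TransformationExecutor:
    """Sketch of fit()/transform() with shared raise_on_error handling."""

    def __init__(self, transformations: List[CopyColumn], raise_on_error: bool = True) -> None:
        self.transformations = transformations
        self.raise_on_error = raise_on_error

    def fit(self, df: Frame) -> "TransformationExecutor":
        for t in self.transformations:
            try:
                t.fit(df)
            except KeyError as exc:
                if self.raise_on_error:
                    raise
                warnings.warn(f"fit skipped {t.name!r}: missing column {exc}")
        return self

    def transform(self, df: Frame) -> Dict[str, List[float]]:
        # Assumption: start from the input columns and append each generated feature.
        result = {col: list(values) for col, values in df.items()}
        for t in self.transformations:
            try:
                result[t.name] = t.transform(df)
            except KeyError as exc:
                if self.raise_on_error:
                    raise
                warnings.warn(f"transform skipped {t.name!r}: missing column {exc}")
        return result


df = {"a": [1.0, 2.0]}
steps = [CopyColumn("a_copy", "a"), CopyColumn("z_copy", "z")]
lenient = TransformationExecutor(steps, raise_on_error=False)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = lenient.fit(df).transform(df)
print(out)          # {'a': [1.0, 2.0], 'a_copy': [1.0, 2.0]}
print(len(caught))  # 2 (one warning from fit, one from transform)
```

With raise_on_error=True the same steps stop at the first missing column and re-raise the KeyError from fit().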
- FeatureTransformer(BaseEstimator, TransformerMixin) with explicit __init__ params (transformations, feature_prefix, raise_on_error)
- fit() builds and fits a TransformationExecutor, stores feature_names_in_
- transform() delegates to executor_.transform() after check_is_fitted()
- get_feature_names_out() returns original + generated column names
- save()/load() as straightforward JSON I/O of constructor params
- Export FeatureTransformer from skfeaturellm.__init__
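The explicit __init__ params matter because scikit-learn's BaseEstimator derives get_params() (and therefore clone()) from the __init__ signature, reading attributes with the same names. The standard-library sketch below spells that out by hand and shows a JSON save()/load() roundtrip of constructor params. The config schema, the "feat_" default prefix, and the local NotFittedError stand-in are assumptions, not the project's actual values.

```python
import json
import os
import tempfile
from typing import Any, Dict, List, Optional


class NotFittedError(ValueError, AttributeError):
    """Local stand-in for sklearn.exceptions.NotFittedError."""


class FeatureTransformer:
    """Stdlib-only sketch; the real class subclasses BaseEstimator and TransformerMixin."""

    def __init__(self, transformations: Optional[List[Dict[str, Any]]] = None,
                 feature_prefix: str = "feat_", raise_on_error: bool = True) -> None:
        # sklearn convention: store constructor args unchanged and do no work here.
        self.transformations = transformations
        self.feature_prefix = feature_prefix
        self.raise_on_error = raise_on_error

    def get_params(self) -> Dict[str, Any]:
        # BaseEstimator derives this from the __init__ signature; spelled out here.
        return {
            "transformations": self.transformations,
            "feature_prefix": self.feature_prefix,
            "raise_on_error": self.raise_on_error,
        }

    def fit(self, X: Dict[str, List[float]]) -> "FeatureTransformer":
        self.feature_names_in_ = list(X)
        return self

    def get_feature_names_out(self) -> List[str]:
        if not hasattr(self, "feature_names_in_"):
            raise NotFittedError("call fit() first")
        generated = [self.feature_prefix + cfg["name"] for cfg in self.transformations or []]
        return self.feature_names_in_ + generated

    def save(self, path: str) -> None:
        # Only constructor params are written, never fitted state.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.get_params(), f)

    @classmethod
    def load(cls, path: str) -> "FeatureTransformer":
        with open(path, encoding="utf-8") as f:
            return cls(**json.load(f))


configs = [{"name": "ratio", "type": "div", "columns": ["a", "b"]}]  # hypothetical schema
ft = FeatureTransformer(transformations=configs)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "transformer.json")
    ft.save(path)
    restored = FeatureTransformer.load(path)
print(restored.get_params() == ft.get_params())                  # True
print(ft.fit({"a": [1.0], "b": [2.0]}).get_feature_names_out())  # ['a', 'b', 'feat_ratio']
```

In this sketch the loaded object is unfitted, since the JSON holds only constructor params; it has to be fit again before get_feature_names_out() succeeds.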
…it_transform
- Add to_transformer(features=None): builds a FeatureTransformer from successfully generated features, with optional name filtering (accepts names with or without the feature_prefix)
- Remove save() and load(): serialization now belongs to FeatureTransformer
- Remove fit_transform() override: the inherited TransformerMixin.fit_transform correctly passes y and **fit_params through to fit()
- Remove corresponding save/load tests from test_feature_engineer.py
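A hypothetical helper sketching the name filtering described for to_transformer(features=None): each requested name is normalised by stripping feature_prefix when present, so "feat_ratio" and "ratio" select the same config. Storing config names unprefixed, the "feat_" prefix, and silently ignoring unknown names are all assumptions of this sketch; the real method may handle them differently.

```python
from typing import Any, Dict, List, Optional, Sequence

Config = Dict[str, Any]


def select_feature_configs(configs: List[Config], features: Optional[Sequence[str]] = None,
                           prefix: str = "feat_") -> List[Config]:
    """Hypothetical helper: keep configs whose name matches, prefixed or not."""
    if features is None:
        return list(configs)  # no filter: export every generated feature
    # Normalise requested names by stripping the prefix when present.
    wanted = {name[len(prefix):] if name.startswith(prefix) else name for name in features}
    return [cfg for cfg in configs if cfg["name"] in wanted]


configs = [{"name": "ratio"}, {"name": "log_a"}, {"name": "diff"}]
print(select_feature_configs(configs, ["feat_ratio", "diff"]))
# [{'name': 'ratio'}, {'name': 'diff'}]
```

The output keeps the original config order rather than the order of the requested names, which keeps the exported transformer deterministic.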
…reTransformer, to_transformer
- test_executor.py: add executor.fit()/transform() tests, including error handling for missing columns (raise and warn paths)
- test_unary.py, test_binary.py: add fit(df).transform(df) == execute(df) parity tests and fit()-returns-self tests
- test_binning.py: add stateful fit/transform tests: bin_edges_ is None before fit, n_bins and custom edges are stored at fit time, transform reuses fitted edges without recomputing, and the legacy execute() path still works
- test_feature_transformer.py: new file covering fit/transform, NotFittedError, get_params/set_params, clone, get_feature_names_out, Pipeline use, and save/load JSON roundtrip
- test_feature_engineer.py: add to_transformer() tests: raises before fit, returns a FeatureTransformer with correct configs, filters by prefixed and unprefixed feature names
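One way to write the "transform reuses fitted edges without recomputing" check, sketched with unittest.mock.patch.object(..., wraps=...) as a spy: the real method still runs, but every call is recorded. TinyBinner and its _compute_edges method are hypothetical stand-ins, not the project's BinTransformation. The functions follow pytest's test_* naming so pytest would collect them; the __main__ guard also runs them directly.

```python
from bisect import bisect_right
from typing import List, Optional
from unittest.mock import patch


class TinyBinner:
    """Hypothetical minimal stateful binner used only to show the test pattern."""

    def __init__(self, n_bins: int) -> None:
        self.n_bins = n_bins
        self.bin_edges_: Optional[List[float]] = None

    def _compute_edges(self, values: List[float]) -> List[float]:
        lo, hi = min(values), max(values)
        width = (hi - lo) / self.n_bins
        return [lo + i * width for i in range(self.n_bins + 1)]

    def fit(self, values: List[float]) -> "TinyBinner":
        self.bin_edges_ = self._compute_edges(values)
        return self

    def transform(self, values: List[float]) -> List[int]:
        edges = self.bin_edges_ if self.bin_edges_ is not None else self._compute_edges(values)
        return [bisect_right(edges[1:-1], v) for v in values]


def test_bin_edges_none_before_fit():
    assert TinyBinner(2).bin_edges_ is None


def test_fit_returns_self():
    binner = TinyBinner(2)
    assert binner.fit([0.0, 10.0]) is binner


def test_transform_reuses_fitted_edges():
    binner = TinyBinner(2).fit([0.0, 10.0])
    # Spy on edge computation: wraps= keeps the real behaviour while recording calls.
    with patch.object(binner, "_compute_edges", wraps=binner._compute_edges) as spy:
        assert binner.transform([100.0, 200.0]) == [1, 1]
    spy.assert_not_called()
    assert binner.bin_edges_ == [0.0, 5.0, 10.0]


def test_unfitted_transform_computes_edges():
    binner = TinyBinner(2)
    with patch.object(binner, "_compute_edges", wraps=binner._compute_edges) as spy:
        binner.transform([0.0, 10.0])
    spy.assert_called_once()


if __name__ == "__main__":
    test_bin_edges_none_before_fit()
    test_fit_returns_self()
    test_transform_reuses_fitted_edges()
    test_unfitted_transform_computes_edges()
    print("all tests passed")
```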
- Remove BaseTransformation.execute(): fit_transform() is the canonical single-call interface
- Remove TransformationExecutor.execute(): callers use fit(df).transform(df)
- Update feature_engineer.py to use executor.fit(X).transform(X)
- Replace all t.execute(df) → t.fit_transform(df) in tests
- Replace all executor.execute(df) → executor.fit(df).transform(df) in tests
- Convert "matches_execute" parity tests into standalone fit/transform tests
…executor.py → pipeline.py
Renamed class, module (feature_transformer.py → feature_engineering_transformer.py), and test file (test_feature_transformer.py → test_feature_engineering_transformer.py). Updated all import paths and references across the codebase.
…gTransformer
- user_guide.rst: replace Save/Load section with Production Pipeline section showing to_transformer(), Pipeline, and FeatureEngineeringTransformer.save/load
- api_reference.rst: add automodule entry for feature_engineering_transformer
- examples.rst: replace old engineer.save/load with FeatureEngineeringTransformer workflow; add Production Pipeline example
- notebook: update imports, overview, Saving section, and add Pipeline + save/load cells using FeatureEngineeringTransformer
All cells re-executed with the new FeatureEngineeringTransformer Pipeline and save/load workflow.
Add FeatureEngineeringTransformer and fit/transform pipeline

Summary
This PR introduces a proper fit/transform pattern across the transformation layer and adds FeatureEngineeringTransformer, a deterministic, sklearn-compatible transformer for production use. It cleanly separates the LLM exploration phase (LLMFeatureEngineer) from the production inference phase (FeatureEngineeringTransformer).

Core changes:
- BaseTransformation: adds fit(), transform(), and fit_transform() to the base ABC; removes the legacy execute() method
- BinTransformation: made stateful; fit() learns bin edges from training data and transform() reuses them on test/inference data, so test-set statistics never leak into the bin edges
- TransformationPipeline (renamed from TransformationExecutor): gains fit() and transform() methods; module renamed to pipeline.py
- FeatureEngineeringTransformer (new): a BaseEstimator/TransformerMixin that holds transformation configs and is fully compatible with sklearn Pipeline, GridSearchCV, clone(), and joblib; supports save()/load() via JSON
- LLMFeatureEngineer: adds to_transformer() to export selected features to a FeatureEngineeringTransformer; removes save()/load() (serialization now belongs to the production transformer)

Docs and examples:
- FeatureEngineeringTransformer
- to_transformer() → Pipeline → save()/load()

Test plan
- poetry run pytest)
- BaseTransformation fit/transform interface, stateful BinTransformation (fit stores edges, transform reuses them), FeatureEngineeringTransformer sklearn compatibility (get_params, set_params, clone, NotFittedError, Pipeline), save/load roundtrip, and to_transformer() on LLMFeatureEngineer