Feat/create feature transformer by RobertoCorti · Pull Request #31 · RobertoCorti/skfeaturellm

RobertoCorti · 2026-03-04T10:46:41Z

Add `FeatureEngineeringTransformer` and fit/transform pipeline

Summary

This PR introduces a proper fit/transform pattern across the transformation layer and adds FeatureEngineeringTransformer — a deterministic, sklearn-compatible transformer for production use. It cleanly separates the LLM exploration phase (LLMFeatureEngineer) from the production inference phase (FeatureEngineeringTransformer).

Core changes:

- BaseTransformation: adds fit(), transform(), and fit_transform() to the base ABC; removes the legacy execute() method
- BinTransformation: made stateful; fit() learns bin edges from training data and transform() reuses them on test/inference data, preventing data leakage
- TransformationPipeline (renamed from TransformationExecutor): gains fit() and transform() methods; module renamed from executor.py to pipeline.py
- FeatureEngineeringTransformer (new): a BaseEstimator/TransformerMixin that holds transformation configs and is fully compatible with sklearn Pipeline, GridSearchCV, clone(), and joblib; supports save()/load() via JSON
- LLMFeatureEngineer: adds to_transformer() to export selected features to a FeatureEngineeringTransformer; removes save()/load() (serialization now belongs to the production transformer)
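
To make the base-class contract in the list above concrete, here is a minimal sketch of a fit/transform ABC with a trivial stateless subclass. It illustrates the pattern only and is not the library's source: the method bodies are assumptions, and `LogTransformation` (a log1p of one column) is a hypothetical example class.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class BaseTransformation(ABC):
    """Sketch of the fit/transform contract (not the library's actual code)."""

    def fit(self, df: pd.DataFrame) -> "BaseTransformation":
        # Stateless transformations learn nothing; stateful ones override fit().
        return self

    @abstractmethod
    def transform(self, df: pd.DataFrame) -> pd.Series:
        """Compute the feature using only state learned in fit()."""

    def fit_transform(self, df: pd.DataFrame) -> pd.Series:
        # Single-call convenience: fit on df, then transform the same df.
        return self.fit(df).transform(df)


class LogTransformation(BaseTransformation):
    """Hypothetical stateless unary transformation: log1p of one column."""

    def __init__(self, column: str) -> None:
        self.column = column

    def transform(self, df: pd.DataFrame) -> pd.Series:
        return np.log1p(df[self.column])


train = pd.DataFrame({"income": [0.0, 9.0, 99.0]})
log_income = LogTransformation("income")
assert log_income.fit(train) is log_income  # fit() returns self, enabling chaining
print(log_income.fit_transform(train).round(3).tolist())  # [0.0, 2.303, 4.605]
```

Because fit() returns self, fit(df).transform(df) and fit_transform(df) are interchangeable; a stateful subclass only needs to override fit() and store what it learns.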

Docs and examples:

- User guide, API reference, and examples updated to document the two-phase workflow and FeatureEngineeringTransformer
- Tutorial notebook updated with a new "Production Pipeline" section showing to_transformer() → Pipeline → save()/load()

Test plan

- All 151 existing tests pass (poetry run pytest)
- New tests cover: BaseTransformation fit/transform interface, stateful BinTransformation (fit stores edges, transform reuses them), FeatureEngineeringTransformer sklearn compatibility (get_params, set_params, clone, NotFittedError, Pipeline), save/load roundtrip, and to_transformer() on LLMFeatureEngineer
- Notebook re-executed end-to-end with fresh outputs

- Add fit(), transform(), fit_transform() to BaseTransformation; make transform() abstract and execute() a concrete backwards-compat alias that delegates to fit_transform()
- Rename execute() → transform() in UnaryTransformation and BinaryArithmeticTransformation so they satisfy the new abstract method

Override fit() to learn bin edges from training data (n_bins mode) or store custom edges (bin_edges mode). _apply_operation() uses bin_edges_ when set, falling back to _bins for the one-shot execute() path.
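
The two binning modes described above can be sketched with pandas as follows. This is an illustrative stand-in, not the actual BinTransformation: the constructor signature is hypothetical, n_bins mode is assumed to use equal-width edges obtained via `pd.cut(..., retbins=True)`, and raising RuntimeError before fit is an assumption. What it shows is that transform() reads only `bin_edges_` learned at fit time, so test data never moves the edges.

```python
import numpy as np
import pandas as pd


class BinTransformation:
    """Hypothetical sketch of a stateful binning transformation."""

    def __init__(self, column, n_bins=None, bin_edges=None):
        if (n_bins is None) == (bin_edges is None):
            raise ValueError("pass exactly one of n_bins or bin_edges")
        self.column = column
        self.n_bins = n_bins
        self.bin_edges = bin_edges
        self.bin_edges_ = None  # learned state; stays None until fit()

    def fit(self, df):
        if self.n_bins is not None:
            # n_bins mode: learn equal-width edges from the training column.
            _, edges = pd.cut(df[self.column], bins=self.n_bins, retbins=True)
            self.bin_edges_ = np.asarray(edges, dtype=float)
        else:
            # bin_edges mode: just store the user-supplied edges.
            self.bin_edges_ = np.asarray(self.bin_edges, dtype=float)
        return self

    def transform(self, df):
        if self.bin_edges_ is None:
            raise RuntimeError("call fit() before transform()")
        # Reuse the fitted edges; values outside them become NaN.
        return pd.cut(df[self.column], bins=self.bin_edges_, labels=False)


train = pd.DataFrame({"age": [20, 30, 40, 50]})
test = pd.DataFrame({"age": [25, 45, 90]})

learned = BinTransformation("age", n_bins=3).fit(train)
print(learned.transform(test).tolist())  # [0.0, 2.0, nan]: 90 is outside the fitted edges

custom = BinTransformation("age", bin_edges=[0, 18, 65, 120]).fit(train)
print(custom.transform(test).tolist())  # [1, 1, 2]
```

Recomputing edges inside transform() would let the test distribution shape the bins, which is the leakage the stateful design avoids.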

- fit(): iterates transformations calling t.fit(df), with the same raise_on_error handling as execute()
- transform(): iterates calling t.transform(df), building the result DataFrame the same way execute() does
- execute() unchanged for backwards compatibility
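
The fit()/transform() split in this commit can be sketched as a loop that shares one error-handling path. The class is named TransformationPipeline after the later rename; everything else is an assumption: `Ratio` is a hypothetical binary transformation, and a missing column is assumed to surface as a KeyError.

```python
import warnings

import pandas as pd


class Ratio:
    """Hypothetical binary transformation: df[left] / df[right]."""

    def __init__(self, left: str, right: str) -> None:
        self.left, self.right = left, right

    def fit(self, df: pd.DataFrame) -> "Ratio":
        missing = {self.left, self.right} - set(df.columns)
        if missing:
            raise KeyError(f"missing columns: {sorted(missing)}")
        return self

    def transform(self, df: pd.DataFrame) -> pd.Series:
        return df[self.left] / df[self.right]


class TransformationPipeline:
    """Sketch: fit/transform a list of (name, transformation) pairs."""

    def __init__(self, transformations, raise_on_error: bool = True) -> None:
        self.transformations = transformations
        self.raise_on_error = raise_on_error

    def _handle(self, name: str, exc: Exception) -> None:
        # Fail fast, or warn and drop the offending feature.
        if self.raise_on_error:
            raise exc
        warnings.warn(f"Skipping feature {name!r}: {exc!r}")

    def fit(self, df: pd.DataFrame) -> "TransformationPipeline":
        self.fitted_ = []  # only successfully fitted steps reach transform()
        for name, t in self.transformations:
            try:
                self.fitted_.append((name, t.fit(df)))
            except KeyError as exc:
                self._handle(name, exc)
        return self

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        result = df.copy()
        for name, t in self.fitted_:
            try:
                result[name] = t.transform(df)
            except KeyError as exc:
                self._handle(name, exc)
        return result


df = pd.DataFrame({"a": [2.0, 9.0], "b": [1.0, 3.0]})
steps = [("a_over_b", Ratio("a", "b")), ("a_over_c", Ratio("a", "c"))]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = TransformationPipeline(steps, raise_on_error=False).fit(df).transform(df)
print(list(out.columns))  # ['a', 'b', 'a_over_b']
print(len(caught))  # 1: the warning emitted while fitting a_over_c
```

With raise_on_error=True the same call stops at the first KeyError instead of warning.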

- FeatureTransformer(BaseEstimator, TransformerMixin) with explicit __init__ params (transformations, feature_prefix, raise_on_error)
- fit() builds and fits a TransformationExecutor, stores feature_names_in_
- transform() delegates to executor_.transform() after check_is_fitted()
- get_feature_names_out() returns original + generated column names
- save()/load() as straightforward JSON I/O of constructor params
- Export FeatureTransformer from skfeaturellm.__init__
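
The estimator described in this commit can be approximated with scikit-learn's BaseEstimator and TransformerMixin. This is a simplified sketch under stated assumptions, not the PR's code: configs are plain dicts in a hypothetical `{"name", "type", "left", "right"}` format supporting only a ratio, the `"llm_"` default prefix is made up, and features are computed inline rather than through a fitted executor. What carries over is the sklearn contract: constructor arguments stored unchanged (so get_params(), set_params(), and clone() work), fitted attributes ending in an underscore (so check_is_fitted() raises NotFittedError before fit), and save()/load() as JSON of the constructor params.

```python
import json
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.utils.validation import check_is_fitted


def _apply(config: dict, X: pd.DataFrame) -> pd.Series:
    # Hypothetical config format; only a "ratio" operation is sketched here.
    if config["type"] == "ratio":
        return X[config["left"]] / X[config["right"]]
    raise ValueError(f"unknown transformation type {config['type']!r}")


# Mixin listed before BaseEstimator, the order recent scikit-learn releases recommend.
class FeatureTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, transformations=None, feature_prefix="llm_", raise_on_error=True):
        # Store constructor args unchanged: get_params() and clone() rely on it.
        self.transformations = transformations
        self.feature_prefix = feature_prefix
        # Accepted to mirror the PR's signature; error handling is omitted here.
        self.raise_on_error = raise_on_error

    def fit(self, X, y=None):
        # A trailing-underscore attribute marks the estimator as fitted.
        self.feature_names_in_ = np.asarray(X.columns, dtype=object)
        return self

    def transform(self, X):
        check_is_fitted(self)  # raises NotFittedError if fit() has not run
        out = X.copy()
        for config in self.transformations or []:
            out[self.feature_prefix + config["name"]] = _apply(config, X)
        return out

    def get_feature_names_out(self, input_features=None):
        check_is_fitted(self)
        generated = [self.feature_prefix + c["name"] for c in self.transformations or []]
        return np.asarray([*self.feature_names_in_, *generated], dtype=object)

    def save(self, path):
        with open(path, "w") as f:
            json.dump(self.get_params(), f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            return cls(**json.load(f))


X = pd.DataFrame({"a": [2.0, 9.0], "b": [1.0, 3.0]})
configs = [{"name": "a_over_b", "type": "ratio", "left": "a", "right": "b"}]

ft = FeatureTransformer(transformations=configs)
out = ft.fit_transform(X)  # inherited from TransformerMixin
print(list(out.columns))  # ['a', 'b', 'llm_a_over_b']

unfitted_copy = clone(ft)  # same params, no fitted state

path = os.path.join(tempfile.mkdtemp(), "transformer.json")
ft.save(path)
restored = FeatureTransformer.load(path).fit(X)
print(restored.transform(X).equals(out))  # True
```

Since only constructor params are persisted, a loaded transformer must be fitted again before transform(); in this sketch fit() only records input column names.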

…it_transform
- Add to_transformer(features=None): builds a FeatureTransformer from successfully generated features, with optional name filtering (accepts names with or without the feature_prefix)
- Remove save() and load(): serialization now belongs to FeatureTransformer
- Remove fit_transform() override: the inherited TransformerMixin.fit_transform correctly passes y and **fit_params through to fit()
- Remove corresponding save/load tests from test_feature_engineer.py
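
The optional name filtering in to_transformer() can be sketched as a small helper. `select_configs`, the dict-based config format, and the `"llm_"` prefix are hypothetical; the sketch only shows that a requested name matches whether or not it carries the prefix. It relies on str.removeprefix, so it needs Python 3.9 or later.

```python
from typing import Iterable, Optional


def select_configs(
    configs: list[dict], features: Optional[Iterable[str]] = None, prefix: str = "llm_"
) -> list[dict]:
    """Hypothetical helper: keep configs whose name is in `features`.

    Each requested name may include or omit `prefix`; None selects everything.
    """
    if features is None:
        return list(configs)
    # Normalize requested names to their unprefixed form before matching.
    wanted = {name.removeprefix(prefix) for name in features}
    return [config for config in configs if config["name"] in wanted]


configs = [{"name": "a_over_b"}, {"name": "log_a"}, {"name": "a_times_b"}]
selected = select_configs(configs, ["llm_a_over_b", "log_a"])
print([c["name"] for c in selected])  # ['a_over_b', 'log_a']
```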

…reTransformer, to_transformer
- test_executor.py: add executor.fit()/transform() tests including error handling for missing columns (raise and warn paths)
- test_unary.py, test_binary.py: add fit(df).transform(df) == execute(df) parity tests and fit() returns-self tests
- test_binning.py: add stateful fit/transform tests: bin_edges_ None before fit, n_bins and custom edges stored at fit time, transform reuses fitted edges without recomputing, execute() legacy path still works
- test_feature_transformer.py: new file covering fit/transform, NotFittedError, get_params/set_params, clone, get_feature_names_out, Pipeline use, save/load JSON roundtrip
- test_feature_engineer.py: add to_transformer() tests: raises before fit, returns FeatureTransformer with correct configs, filters by prefixed and unprefixed feature names

- Remove BaseTransformation.execute(): fit_transform() is the canonical single-call interface
- Remove TransformationExecutor.execute(): callers use fit(df).transform(df)
- Update feature_engineer.py to use executor.fit(X).transform(X)
- Replace all t.execute(df) → t.fit_transform(df) in tests
- Replace all executor.execute(df) → executor.fit(df).transform(df) in tests
- Convert "matches_execute" parity tests into standalone fit/transform tests

…executor.py → pipeline.py

Renamed class, module (feature_transformer.py → feature_engineering_transformer.py), and test file (test_feature_transformer.py → test_feature_engineering_transformer.py). Updated all import paths and references across the codebase.

…gTransformer
- user_guide.rst: replace Save/Load section with Production Pipeline section showing to_transformer(), Pipeline, and FeatureEngineeringTransformer.save/load
- api_reference.rst: add automodule entry for feature_engineering_transformer
- examples.rst: replace old engineer.save/load with FeatureEngineeringTransformer workflow; add Production Pipeline example
- notebook: update imports, overview, Saving section, and add Pipeline + save/load cells using FeatureEngineeringTransformer

All cells re-executed with the new FeatureEngineeringTransformer Pipeline and save/load workflow.

RobertoCorti added 12 commits March 4, 2026 10:57

feat(transformations): make BinTransformation stateful

deebfa3

refactor: rename TransformationExecutor → TransformationPipeline and …

4254f8a

chore(notebook): update cell outputs after re-running tutorial notebook

ba744f0

chore: ignore generated JSON artifacts from example notebooks

106ddfe

RobertoCorti merged commit 274d670 into main Mar 4, 2026
8 checks passed

RobertoCorti merged 12 commits into main from feat/create-feature-transformer
