Feat/create feature transformer #31

Merged
RobertoCorti merged 12 commits into main from feat/create-feature-transformer on Mar 4, 2026

@RobertoCorti
Owner

Add FeatureEngineeringTransformer and fit/transform pipeline

Summary

This PR introduces a proper fit/transform pattern across the transformation layer and adds FeatureEngineeringTransformer — a deterministic, sklearn-compatible transformer for production use. It cleanly separates the LLM exploration phase (LLMFeatureEngineer) from the production inference phase (FeatureEngineeringTransformer).

Core changes:

  • BaseTransformation: adds fit(), transform(), and fit_transform() to the base ABC; removes the legacy execute() method
  • BinTransformation: made stateful — fit() learns bin edges from training data and transform() reuses them on test/inference data, preventing data leakage
  • TransformationPipeline (renamed from TransformationExecutor): gains fit() and transform() methods; module renamed to pipeline.py
  • FeatureEngineeringTransformer (new): a BaseEstimator/TransformerMixin that holds transformation configs and is fully compatible with sklearn Pipeline, GridSearchCV, clone(), and joblib. Supports save()/load() via JSON
  • LLMFeatureEngineer: adds to_transformer() to export selected features to a FeatureEngineeringTransformer; removes save()/load() (serialization now belongs to the production transformer)
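
To illustrate why a stateful BinTransformation prevents leakage, here is a minimal sketch of the fit/transform split. It is not the code in this PR: `EqualWidthBinner` is a hypothetical stand-in, equal-width binning is an assumption, and plain Python lists are used instead of DataFrames to keep the example dependency-free. Only the `bin_edges_` attribute name is taken from the commit messages.

```python
from bisect import bisect_right


class EqualWidthBinner:
    """Hypothetical stand-in for a stateful binning transformation."""

    def __init__(self, n_bins):
        self.n_bins = n_bins
        self.bin_edges_ = None  # stays None until fit() is called

    def fit(self, values):
        # Learn equal-width edges from the training values only.
        lo, hi = min(values), max(values)
        width = (hi - lo) / self.n_bins
        self.bin_edges_ = [lo + i * width for i in range(self.n_bins + 1)]
        return self

    def transform(self, values):
        if self.bin_edges_ is None:
            raise RuntimeError("call fit() before transform()")
        # Reuse the fitted edges; nothing is recomputed from `values`.
        # Out-of-range values fall into the first or last bin.
        inner_edges = self.bin_edges_[1:-1]
        return [bisect_right(inner_edges, v) for v in values]


train = [0.0, 2.0, 4.0, 6.0, 8.0]
test = [1.0, 5.0, 100.0]

binner = EqualWidthBinner(n_bins=4).fit(train)
train_bins = binner.transform(train)
test_bins = binner.transform(test)  # edges still come from `train`
```

Because the outlier `100.0` in `test` is assigned with the training edges, it cannot stretch the bins, which is the leakage the fit/transform separation is meant to avoid.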

Docs and examples:

  • User guide, API reference, and examples updated to document the two-phase workflow and FeatureEngineeringTransformer
  • Tutorial notebook updated with a new "Production Pipeline" section showing to_transformer() → Pipeline → save()/load()

Test plan

  • All 151 existing tests pass (poetry run pytest)
  • New tests cover: BaseTransformation fit/transform interface, stateful BinTransformation (fit stores edges, transform reuses them), FeatureEngineeringTransformer sklearn compatibility (get_params, set_params, clone, NotFittedError, Pipeline), save/load roundtrip, and to_transformer() on LLMFeatureEngineer
  • Notebook re-executed end-to-end with fresh outputs

- Add fit(), transform(), fit_transform() to BaseTransformation; make
  transform() abstract and execute() a concrete backwards-compat alias
  that delegates to fit_transform()
- Rename execute() → transform() in UnaryTransformation and
  BinaryArithmeticTransformation so they satisfy the new abstract method
Override fit() to learn bin edges from training data (n_bins mode) or
store custom edges (bin_edges mode). _apply_operation() uses bin_edges_
when set, falling back to _bins for the one-shot execute() path.
- fit(): iterates transformations calling t.fit(df), same raise_on_error
  handling as execute()
- transform(): iterates calling t.transform(df), builds result DataFrame
  the same way execute() does
- execute() unchanged for backwards compatibility
- FeatureTransformer(BaseEstimator, TransformerMixin) with explicit
  __init__ params (transformations, feature_prefix, raise_on_error)
- fit() builds and fits a TransformationExecutor, stores feature_names_in_
- transform() delegates to executor_.transform() after check_is_fitted()
- get_feature_names_out() returns original + generated column names
- save()/load() as straightforward JSON I/O of constructor params
- Export FeatureTransformer from skfeaturellm.__init__
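
As a rough illustration of "JSON I/O of constructor params", the sketch below round-trips a config holder through a JSON file. `ConfigHolder` is a hypothetical class, not the transformer from this PR, and the shape of each transformation config dict is an assumption. Only the three constructor parameter names come from the commit message above.

```python
import json
import os
import tempfile


class ConfigHolder:
    """Hypothetical stand-in that stores only its constructor parameters."""

    def __init__(self, transformations, feature_prefix="", raise_on_error=True):
        self.transformations = transformations
        self.feature_prefix = feature_prefix
        self.raise_on_error = raise_on_error

    def get_params(self):
        return {
            "transformations": self.transformations,
            "feature_prefix": self.feature_prefix,
            "raise_on_error": self.raise_on_error,
        }

    def save(self, path):
        # Persist constructor params only; fitted state is not written.
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self.get_params(), fh, indent=2)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as fh:
            return cls(**json.load(fh))


# The config dict layout here is illustrative, not the project's schema.
original = ConfigHolder(
    transformations=[{"type": "bin", "column": "age", "n_bins": 4}],
    feature_prefix="llm_",
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "transformer.json")
    original.save(path)
    restored = ConfigHolder.load(path)
```

Under this design a loaded object has to be fitted again before use, since learned state such as bin edges is not part of the constructor parameters.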
…it_transform

- Add to_transformer(features=None): builds a FeatureTransformer from
  successfully generated features, with optional name filtering (accepts
  names with or without the feature_prefix)
- Remove save() and load(): serialization now belongs to FeatureTransformer
- Remove fit_transform() override: the inherited TransformerMixin.fit_transform
  correctly passes y and **fit_params through to fit()
- Remove corresponding save/load tests from test_feature_engineer.py
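
The optional name filtering could look roughly like the sketch below. `select_features` is a hypothetical helper, not a function in this codebase. It assumes that generated feature names already carry the prefix, and that unknown names are silently ignored, which the commit message does not specify.

```python
def select_features(available, requested=None, feature_prefix=""):
    """Return the names in `available` matched by `requested`.

    Hypothetical helper: names in `requested` may be given with or
    without `feature_prefix`; `requested=None` keeps every feature.
    """
    if requested is None:
        return list(available)
    wanted = set()
    for name in requested:
        # Normalise every requested name to its prefixed form.
        if name.startswith(feature_prefix):
            wanted.add(name)
        else:
            wanted.add(feature_prefix + name)
    # Preserve the original ordering of the available features.
    return [name for name in available if name in wanted]


generated = ["llm_income_ratio", "llm_age_bin", "llm_log_balance"]
picked = select_features(
    generated, ["income_ratio", "llm_age_bin"], feature_prefix="llm_"
)
everything = select_features(generated, feature_prefix="llm_")
```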
…reTransformer, to_transformer

- test_executor.py: add executor.fit()/transform() tests including error
  handling for missing columns (raise and warn paths)
- test_unary.py, test_binary.py: add fit(df).transform(df) == execute(df)
  parity tests and fit() returns-self tests
- test_binning.py: add stateful fit/transform tests — bin_edges_ None before
  fit, n_bins and custom edges stored at fit time, transform reuses fitted
  edges without recomputing, execute() legacy path still works
- test_feature_transformer.py: new file covering fit/transform, NotFittedError,
  get_params/set_params, clone, get_feature_names_out, Pipeline use,
  save/load JSON roundtrip
- test_feature_engineer.py: add to_transformer() tests — raises before fit,
  returns FeatureTransformer with correct configs, filters by prefixed and
  unprefixed feature names
- Remove BaseTransformation.execute() — fit_transform() is the canonical
  single-call interface
- Remove TransformationExecutor.execute() — callers use fit(df).transform(df)
- Update feature_engineer.py to use executor.fit(X).transform(X)
- Replace all t.execute(df) → t.fit_transform(df) in tests
- Replace all executor.execute(df) → executor.fit(df).transform(df) in tests
- Convert "matches_execute" parity tests into standalone fit/transform tests
Renamed class, module (feature_transformer.py → feature_engineering_transformer.py),
and test file (test_feature_transformer.py → test_feature_engineering_transformer.py).
Updated all import paths and references across the codebase.
…gTransformer

- user_guide.rst: replace Save/Load section with Production Pipeline section
  showing to_transformer(), Pipeline, and FeatureEngineeringTransformer.save/load
- api_reference.rst: add automodule entry for feature_engineering_transformer
- examples.rst: replace old engineer.save/load with FeatureEngineeringTransformer
  workflow; add Production Pipeline example
- notebook: update imports, overview, Saving section, and add Pipeline + save/load
  cells using FeatureEngineeringTransformer
All cells re-executed with the new FeatureEngineeringTransformer Pipeline
and save/load workflow.
@RobertoCorti RobertoCorti merged commit 274d670 into main Mar 4, 2026
8 checks passed
