
Fix/improve docs#30

Merged
RobertoCorti merged 3 commits into main from fix/improve-docs
Mar 3, 2026


@RobertoCorti
Owner

Summary

  • Rewrote the tutorial notebook (examples/01_SKFeatureLLM_Tutorial.ipynb): replaced the California Housing example with a credit risk classification task using the Bank Loan dataset loaded via kagglehub. Switched from RandomForestClassifier to XGBClassifier, added a per-feature evaluation loop that measures ROC AUC delta for each engineered feature, a feature selection step (keep only features with positive delta), and a save() / load() section demonstrating engineer persistence.
  • Updated user_guide.rst: added a Dataset Statistics section documenting the y parameter in fit() and the stats injected into the LLM prompt; added a Saving and Reusing section with save() / load(); added the bin transformation to the supported-transformations list; added a data-leakage warning (fit on training data only).
  • Updated examples.rst: both classification and regression examples now include a train/test split and pass y to fit(); expanded the Feature Evaluation section to show per-feature filtering using mutual information scores; added Saving and Loading and Notebook Tutorial sections.
  • Updated get_started.rst: quickstart examples now include a train/test split, pass y=y_train to fit(), and transform train and test independently; added a data-leakage callout.
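The per-feature evaluation and selection pattern described for the notebook can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the notebook's code: the tiny dataset, the column names (`base`, `good`, `noisy`), and the class-mean scorer standing in for `XGBClassifier` are all assumptions made for the example. The model is fitted on training rows only and ROC AUC is measured on held-out rows.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_scorer(rows, cols, target):
    """Toy stand-in for a real classifier: weight each column by the gap between class means."""
    pos = [r for r in rows if r[target] == 1]
    neg = [r for r in rows if r[target] == 0]
    weights = {
        c: sum(r[c] for r in pos) / len(pos) - sum(r[c] for r in neg) / len(neg)
        for c in cols
    }
    return lambda row: sum(weights[c] * row[c] for c in cols)


def holdout_auc(train, test, cols, target="y"):
    """Fit on the training rows only, then score the held-out rows."""
    score = fit_scorer(train, cols, target)
    return roc_auc([r[target] for r in test], [score(r) for r in test])


def auc_deltas(train, test, base_cols, candidates, target="y"):
    """ROC AUC gain on held-out rows from adding each candidate feature to the baseline columns."""
    baseline = holdout_auc(train, test, base_cols, target)
    return {
        c: holdout_auc(train, test, base_cols + [c], target) - baseline
        for c in candidates
    }


# Hypothetical data: "good" tracks the label, "noisy" only looks useful on the training rows.
train = [
    {"y": 0, "base": 0.0, "good": 0.0, "noisy": 1.0},
    {"y": 0, "base": 1.0, "good": 0.0, "noisy": 0.0},
    {"y": 1, "base": 0.0, "good": 1.0, "noisy": 5.0},
    {"y": 1, "base": 2.0, "good": 1.0, "noisy": 0.0},
]
test = [
    {"y": 0, "base": 0.0, "good": 0.0, "noisy": 3.0},
    {"y": 0, "base": 2.0, "good": 0.0, "noisy": 2.0},
    {"y": 1, "base": 1.0, "good": 1.0, "noisy": 0.0},
    {"y": 1, "base": 3.0, "good": 1.0, "noisy": 0.0},
]

deltas = auc_deltas(train, test, base_cols=["base"], candidates=["good", "noisy"])
selected = [name for name, delta in deltas.items() if delta > 0]  # keep positive deltas only
print(deltas, selected)
```

Because each delta is computed on rows the scorer never saw, a feature that merely fits noise in the training data (like `noisy` here) shows up with a negative delta and is dropped.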

… and XGBoost

- Replace California Housing example with Bank Loan credit risk dataset
  loaded via kagglehub
- Switch from RandomForestClassifier to XGBClassifier with
  enable_categorical support
- Add per-feature evaluation loop (ROC AUC delta per engineered feature)
- Add feature selection step (keep only features with positive delta)
- Add save/load section demonstrating engineer persistence
- Add Dataset Statistics section to user_guide.rst documenting
  the y parameter in fit() and what stats are injected into the prompt
- Add Saving and Reusing section to user_guide.rst with save()/load()
- Add bin transformation to the supported transformations list
- Add data leakage warning (fit on training data only)
- Update all quickstart examples to include train/test split and pass y
- Expand Feature Evaluation section in examples.rst to show per-feature
  selection pattern using mutual information scores
- Add Saving and Loading section and Notebook Tutorial link to examples.rst
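As a rough illustration of the mutual-information selection pattern mentioned above, the sketch below scores each engineered feature against the training labels and keeps those above a threshold. It is an assumption-laden toy rather than the code in examples.rst: it uses a plug-in estimate for discrete values written with the standard library only, and the feature names and the `0.01` threshold are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
    return sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Hypothetical training labels and two already-discretised engineered features.
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
engineered = {
    "flag_perfect": [0, 0, 0, 0, 1, 1, 1, 1],  # mirrors the label exactly
    "flag_random": [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the label
}

# Score on training data only, so the test split plays no part in selection.
scores = {name: mutual_information(col, y_train) for name, col in engineered.items()}
kept = [name for name, mi in scores.items() if mi > 0.01]  # threshold is an arbitrary choice
print(scores, kept)
```

For continuous features a nearest-neighbour estimator is usually preferable to this counting approach; the sketch only shows the score-then-filter shape of the step.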
@RobertoCorti RobertoCorti merged commit c7ee89e into main Mar 3, 2026
8 checks passed