
Cleanup: dependency groups, FT-Transformer as default combiner for 3+ features#4093

Open
w4nderlust wants to merge 6 commits into main from cleanup-phases-0-3

w4nderlust (Collaborator) commented Apr 5, 2026

Summary

Cleanup PR addressing all remaining skipped items from Phases 0-3.

Phase 0: Build System

  • dependency-groups: Added `[dependency-groups]` for dev and docs in pyproject.toml
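  • uv CI migration: CI workflows now use `uv pip install --system` instead of bare pip (10-100x faster installs). Exception: the test extras still install via pip, because GPy 1.9.9 (pulled in by HEBO) fails to build on Python 3.12.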

Phase 1: ECD

  • FT-Transformer default: The default combiner is now ft_transformer when a model has 3+ input features (it won both benchmarks: Adult Census AUC and California Housing RMSE).
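
As a rough illustration of the rule above, the selection could be sketched as follows. The function name, the `explicit` parameter, and the `concat` fallback for fewer than 3 features are assumptions made for this example, not taken from the PR's diff.

```python
from typing import Optional, Sequence

# Threshold stated in the PR summary: ft_transformer applies at 3+ input features.
FT_TRANSFORMER_MIN_FEATURES = 3


def default_combiner_type(
    input_features: Sequence[dict],
    explicit: Optional[str] = None,
    fallback: str = "concat",  # assumption: the default used below the threshold
) -> str:
    """Pick a combiner type; a user-specified combiner always wins."""
    if explicit is not None:
        return explicit
    if len(input_features) >= FT_TRANSFORMER_MIN_FEATURES:
        return "ft_transformer"
    return fallback
```

A config that names its combiner explicitly would be unaffected under this sketch; only configs that omit the combiner see the new default.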

Phase 2: Training

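  • Remove DDP/FSDP/DeepSpeed: Deleted DDPStrategy, FSDPStrategy, DeepSpeedStrategy, and DeepSpeedBackend. All distributed training now uses AccelerateStrategy. Legacy strategy names (ddp, fsdp, deepspeed) alias to accelerate.
  • DictWrapper simplified: Docstring simplified, since the DeepSpeed workaround is no longer needed.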
  • Ray SafeTensors: Model weights in Ray worker checkpoints now saved via SafeTensors instead of pickle. Metadata saved as JSON. Eval results saved as JSON where possible. Legacy pickle format still loadable for backward compat.
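
The legacy-name aliasing mentioned in the Phase 2 list might look like the sketch below. The mapping name and the `resolve_strategy_name` helper are hypothetical; only the three legacy names and their `accelerate` target come from the PR text.

```python
# Legacy strategy names listed in the PR, all routed to the single remaining strategy.
LEGACY_STRATEGY_ALIASES = {
    "ddp": "accelerate",
    "fsdp": "accelerate",
    "deepspeed": "accelerate",
}


def resolve_strategy_name(name: str) -> str:
    """Map a legacy strategy name to its replacement; pass other names through."""
    key = name.strip().lower()
    return LEGACY_STRATEGY_ALIASES.get(key, key)
```

Resolving names in one place like this keeps existing configs that still say `ddp` or `fsdp` loadable without changes.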

Phase 3: Serving

  • Request body logging: Optional, enabled via LUDWIG_LOG_REQUEST_BODY=1, with body truncation and request reconstruction so downstream handlers can still read the body.
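
A minimal sketch of the opt-in check and truncation step, assuming a byte limit of 1024 and the helper names shown here (none of which appear in the PR). Request reconstruction is not shown.

```python
import os
from typing import Mapping, Optional

MAX_LOGGED_BODY_BYTES = 1024  # assumption: the PR does not state the actual limit


def request_body_logging_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Logging is opt-in: only the exact value "1" turns it on."""
    env = os.environ if environ is None else environ
    return env.get("LUDWIG_LOG_REQUEST_BODY") == "1"


def truncate_body(body: bytes, limit: int = MAX_LOGGED_BODY_BYTES) -> str:
    """Decode at most `limit` bytes for logging and note how much was cut."""
    text = body[:limit].decode("utf-8", errors="replace")
    if len(body) > limit:
        text += f"... [truncated {len(body) - limit} bytes]"
    return text
```

Decoding with `errors="replace"` avoids a logging failure when the cut lands in the middle of a multi-byte character.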

Remaining torch.save/load (justified)

  • checkpoint_utils: optimizer/scheduler state (non-tensor data like momentum, step counters)
  • ecd.py: legacy fallback for old pickle checkpoints
  • text_encoders: HF checkpoint loading with weights_only=True
  • ray.py: training metrics (non-tensor), legacy format fallback

All model weight serialization now uses SafeTensors.

Test plan

  • 204 tests pass (0 regressions)
  • Pre-commit clean
  • CI

- Add [dependency-groups] for dev and docs in pyproject.toml
- Default combiner to ft_transformer when 3+ input features
  (wins on both Adult Census AUC and California Housing RMSE)

github-actions bot commented Apr 5, 2026

Test Results

   10 files ±0     10 suites ±0     1h 49m 50s ⏱️ +3m 2s
3 619 tests ±0  3 589 ✅ ±0  30 💤 ±0  0 ❌ ±0
3 708 runs  ±0  3 665 ✅ ±0  43 💤 ±0  0 ❌ ±0

Results for commit 93fb894. ± Comparison against base commit fb963ad.

♻️ This comment has been updated with latest results.

- CI: migrate all workflows from pip to uv (10-100x faster installs)
- Distributed: remove DDPStrategy, FSDPStrategy, DeepSpeedStrategy,
  DeepSpeedBackend. All distributed training now uses AccelerateStrategy.
  Legacy strategy names (ddp, fsdp, deepspeed) alias to accelerate.
- LLM: simplify DictWrapper docstring (DeepSpeed workaround no longer needed)
- Serving: add optional request body logging (LUDWIG_LOG_REQUEST_BODY=1)
  with body truncation and request reconstruction for downstream handlers
…ults

Split Ray worker checkpoint saves:
- train_fn: model weights saved via SafeTensors (secure, no pickle),
  metadata (validation_field, validation_metric) saved as JSON,
  remaining results via torch.save
- eval_fn: eval results saved as JSON where possible, falls back to
  torch.save for complex objects
- Both load paths support legacy torch.save format for backward compat

Eliminates pickle-based serialization for model weights in Ray workers.
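
The "JSON where possible, fall back otherwise" pattern described for eval_fn can be sketched as below. This is an illustration only: it uses the standard-library `pickle` module as a stand-in for torch.save so the example runs without torch, and the file names and function names are assumptions.

```python
import json
import pickle
from pathlib import Path
from typing import Any


def save_eval_results(results: Any, directory) -> Path:
    """Write results as JSON if they serialize cleanly, else fall back to pickle."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    try:
        payload = json.dumps(results)
    except (TypeError, ValueError):
        # Complex objects (sets, arrays, custom classes) are not JSON-serializable.
        path = directory / "eval_results.pkl"  # hypothetical file name
        with path.open("wb") as f:
            pickle.dump(results, f)
        return path
    path = directory / "eval_results.json"  # hypothetical file name
    path.write_text(payload)
    return path


def load_eval_results(directory) -> Any:
    """Prefer the JSON file; fall back to the legacy pickle format."""
    directory = Path(directory)
    json_path = directory / "eval_results.json"
    if json_path.exists():
        return json.loads(json_path.read_text())
    with (directory / "eval_results.pkl").open("rb") as f:
        return pickle.load(f)
```

Serializing to a string before opening the file means a failed JSON attempt never leaves a partial `.json` file behind.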
…nfig helper

Rename LudwigSchemaField -> SchemaField (cleaner name, no marshmallow reference).
Rename all MarshmallowField subclasses to ConfigField equivalents:
  DefaultMarshmallowField -> DefaultConfigField
  SchedulerMarshmallowField -> SchedulerConfigField
  SearchAlgorithmMarshmallowField -> SearchAlgorithmConfigField
  ExecutorMarshmallowField -> ExecutorConfigField
  GradientClippingMarshmallowField -> GradientClippingConfigField
  ProfilingMarshmallowField -> ProfilingConfigField
  LRSchedulerMarshmallowField -> LRSchedulerConfigField
  AugmentationContainerMarshmallowField -> AugmentationContainerConfigField
  PreprocessingMarshmallowField -> PreprocessingConfigField

Add SchemaField.deserialize_config() helper that centralizes the common
isinstance(value, dict) dispatch pattern used by all 13 subclasses.
Backward compat alias LudwigSchemaField = SchemaField kept.
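
One way such a helper could centralize the isinstance(value, dict) dispatch is sketched here. The signature, the `from_dict` callback, and `ExampleConfig` are hypothetical; only the names SchemaField, deserialize_config, and the LudwigSchemaField alias come from the commit message.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class SchemaField:
    @staticmethod
    def deserialize_config(
        value: Any,
        from_dict: Callable[[dict], Any],  # assumption: subclasses supply the builder
        field_name: str = "config",
        allow_none: bool = True,
    ) -> Optional[Any]:
        """Shared dispatch: None passes through, dicts are built, the rest is rejected."""
        if value is None:
            if allow_none:
                return None
            raise ValueError(f"Field '{field_name}' may not be None")
        if isinstance(value, dict):
            return from_dict(value)
        raise ValueError(
            f"Field '{field_name}' must be a dict, got {type(value).__name__}"
        )


# Backward-compatible alias, as kept in the commit.
LudwigSchemaField = SchemaField


@dataclass
class ExampleConfig:  # hypothetical config class used only for this illustration
    threshold: float = 1.0
```

Each subclass would then call the helper with its own builder instead of repeating the type check.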
… 3.12)

uv is stricter about building from source, and GPy 1.9.9 (a transitive
dependency of HEBO) fails to compile on Python 3.12 (removed longintrepr.h).

Keep uv for torch install (binary-only, 10x faster) but use pip for
the test extras which include GPy via HEBO.

comet_ml internally imports the imp module, which was removed in Python 3.12.
Skip the test until comet_ml releases a compatible version.