Phase 5: API code quality - ModelInspector, trainer mixins, registry modernization #4091

Open
w4nderlust wants to merge 6 commits into data-pipeline-hyperopt-modernization from api-code-quality

@w4nderlust
Collaborator

Summary

Phase 5 of the Ludwig modernization: break up god objects and improve code quality.

1. ModelInspector

Extracts model introspection from the 2400-line LudwigModel into a focused class:

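```python
from ludwig.model_inspector import ModelInspector

# model, config, metadata: the trained model, its config, and its metadata
inspector = ModelInspector(model, config, metadata)
weights = inspector.collect_weights(['linear1.weight'])  # tensors for the named parameters
summary = inspector.model_summary()  # parameter counts, layer types, model size
importance = inspector.feature_importance_proxy()  # approximate feature importance
```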

Provides: weight collection, model summary (parameter counts, layer types, model size), and approximate feature importance estimation.
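
A minimal sketch of the bookkeeping behind weight collection and the parameter-count part of a model summary, written against a plain name-to-shape mapping so it runs without PyTorch. The function names mirror the PR's methods, but these signatures are hypothetical and this is not the ModelInspector code. In a PyTorch model the shapes would come from `model.named_parameters()`; layer types (not covered here) would come from the module objects. The 4-bytes-per-parameter default assumes float32 weights.

```python
from math import prod

# Hypothetical stand-in for a model's parameters: name -> tensor shape.
Params = dict[str, tuple[int, ...]]


def collect_weights(params: Params, names: list[str]) -> Params:
    """Return only the requested parameters, raising on unknown names."""
    missing = [n for n in names if n not in params]
    if missing:
        raise KeyError(f"unknown parameter names: {missing}")
    return {n: params[n] for n in names}


def model_summary(params: Params, bytes_per_param: int = 4) -> dict:
    """Per-module and total parameter counts plus approximate size in MB."""
    per_module: dict[str, int] = {}
    for name, shape in params.items():
        module = name.rsplit(".", 1)[0]  # "linear1.weight" -> "linear1"
        per_module[module] = per_module.get(module, 0) + prod(shape)
    total = sum(per_module.values())
    return {
        "total_params": total,
        "per_module": per_module,
        "size_mb": total * bytes_per_param / 2**20,
    }
```

Failing loudly on unknown names in `collect_weights` keeps a typo such as `linear1.weights` from silently returning an empty result.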

2. Trainer Mixins

Composable mixins for cross-cutting training concerns:

  • CheckpointMixin: checkpoint save/restore decision logic
  • EarlyStoppingMixin: early stopping based on validation metrics
  • MetricsMixin: metric formatting and logging
  • BatchSizeTuningMixin: automatic batch size search
  • ProfilingMixin: wall-clock timing for training operations
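
The save and early-stopping decision logic can be sketched as small stateless-by-default mixins. This is an illustration under assumptions, not the code in this PR: `should_checkpoint` and `should_early_stop` are named in the commit messages, but their signatures, the `checkpoint_every_n_steps` and `patience` attributes, the lower-is-better metric, and `ToyTrainer` are hypothetical.

```python
import math


class CheckpointMixin:
    """Hypothetical sketch: decide whether to save a checkpoint."""

    checkpoint_every_n_steps: int = 100

    def should_checkpoint(self, step: int) -> bool:
        # Save on every N-th step; step 0 is skipped.
        return step > 0 and step % self.checkpoint_every_n_steps == 0


class EarlyStoppingMixin:
    """Hypothetical sketch: stop after `patience` evaluations without improvement.

    Assumes lower is better (e.g. validation loss).
    """

    patience: int = 3

    def should_early_stop(self, validation_metric: float) -> bool:
        # State is created lazily, so no cooperative __init__ is required.
        best = getattr(self, "_best_metric", math.inf)
        if validation_metric < best:
            self._best_metric = validation_metric
            self._evals_without_improvement = 0
        else:
            self._evals_without_improvement = getattr(self, "_evals_without_improvement", 0) + 1
        return self._evals_without_improvement >= self.patience


class ToyTrainer(CheckpointMixin, EarlyStoppingMixin):
    patience = 2  # class attributes let each trainer override the defaults
```

With `patience = 2`, the validation losses `[1.0, 0.8, 0.9, 0.85]` produce `False, False, False, True`: the last two evaluations fail to beat 0.8. Creating state lazily instead of in `__init__` lets the mixins combine in any order without `super().__init__()` chains.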

3. Registry Modernization

Added to the existing Registry class:

  • unregister(name): remove registered items (useful for testing)
  • get_default(): get the default-registered item
  • list_registered(): list all names excluding default key aliases
  • Improved docstrings and type annotations
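
The three new methods can be illustrated with a small standalone registry. This is a sketch, not Ludwig's Registry class. It assumes the default item is also stored under a fixed set of alias keys (the `DEFAULT_KEYS` tuple here is hypothetical), which is what lets `list_registered()` filter the aliases out and `get_default()` look one of them up.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")

# Hypothetical alias keys under which the default item is also stored.
DEFAULT_KEYS = ("None", "none", "null", None)


class Registry(Generic[T]):
    def __init__(self) -> None:
        self._items: dict = {}

    def register(self, name: str, default: bool = False) -> Callable[[T], T]:
        """Decorator that stores an item under `name` (and the aliases if default)."""
        def wrap(item: T) -> T:
            self._items[name] = item
            if default:
                for key in DEFAULT_KEYS:
                    self._items[key] = item
            return item
        return wrap

    def __getitem__(self, name) -> T:
        return self._items[name]

    def unregister(self, name: str) -> None:
        item = self._items.pop(name)  # KeyError if `name` was never registered
        # Drop default aliases that pointed at the removed item.
        for key in DEFAULT_KEYS:
            if self._items.get(key) is item:
                del self._items[key]

    def get_default(self) -> Optional[T]:
        return self._items.get(DEFAULT_KEYS[0])

    def list_registered(self) -> list[str]:
        return [k for k in self._items if k not in DEFAULT_KEYS]
```

In a test, an entry added with `@registry.register("dummy")` can be removed again with `registry.unregister("dummy")`, so later tests see an unmodified registry.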

Test plan

  • 14 new tests for ModelInspector and Registry
  • 1155 existing tests pass (0 regressions)
  • Pre-commit hooks all pass
  • CI

w4nderlust force-pushed the api-code-quality branch 3 times, most recently from f82345c to 106a30b on April 6, 2026 07:17
w4nderlust and others added 6 commits April 6, 2026 19:45
ModelInspector: extracts weight collection, model summary, and feature
importance estimation from LudwigModel god object.

Trainer mixins: CheckpointMixin, EarlyStoppingMixin, MetricsMixin,
BatchSizeTuningMixin, ProfilingMixin for composable training behavior.

Registry: add unregister(), get_default(), list_registered() methods
and improved docstrings for type-safe, testable registries.

- ludwig inspect: CLI command to view model summary, weights, and
  approximate feature importance from a saved model
- Tests for all 5 trainer mixins (checkpoint, early stopping, metrics,
  batch size tuning, profiling)
- Tests for training report generation and model card generation

Remove __iter__ from TrainingStats (it was never used for unpacking).
Add deprecation warnings to __iter__/__getitem__ on TrainingResults
and PreprocessedDataset: existing code keeps working but emits a
DeprecationWarning. Internal code updated to use attribute access.
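
The deprecation path can be sketched with the standard `warnings` module. The field names below are hypothetical, modeled on the (training statistics, preprocessed data, output directory) tuple that `LudwigModel.train()` returns; the real TrainingResults class may differ.

```python
import warnings
from dataclasses import dataclass
from typing import Any


@dataclass
class TrainingResults:
    # Hypothetical field names; the real class may differ.
    train_stats: Any
    preprocessed_data: Any
    output_directory: str

    def _as_tuple(self) -> tuple:
        return (self.train_stats, self.preprocessed_data, self.output_directory)

    def __iter__(self):
        # Old style: `stats, data, out = results` still works but warns.
        warnings.warn(
            "Unpacking TrainingResults is deprecated; use attribute access instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return iter(self._as_tuple())

    def __getitem__(self, index: int) -> Any:
        # Old style: `results[0]` still works but warns.
        warnings.warn(
            "Indexing TrainingResults is deprecated; use attribute access instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._as_tuple()[index]
```

`stacklevel=2` attributes the warning to the caller's line instead of to `__iter__` itself, which points users at the code they need to change. Plain attribute access (`results.train_stats`) emits no warning.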

Trainer now inherits from CheckpointMixin, EarlyStoppingMixin,
MetricsMixin, and ProfilingMixin. This makes the mixin utility
methods (should_checkpoint, should_early_stop, format_metrics,
start_timer/stop_timer) available on the Trainer instance for
gradual refactoring of inline logic.
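
The effect of inheriting from the mixins can be sketched as follows: methods defined on the mixins become ordinary methods of the trainer instance. The method names `format_metrics`, `start_timer`, and `stop_timer` come from the commit message; their signatures, the `timings` dict, and `ToyTrainer` are hypothetical.

```python
import time


class MetricsMixin:
    def format_metrics(self, metrics: dict[str, float], precision: int = 4) -> str:
        # Sort keys so log lines are stable across runs.
        return ", ".join(f"{name}={value:.{precision}f}" for name, value in sorted(metrics.items()))


class ProfilingMixin:
    def start_timer(self, name: str) -> None:
        # Lazily create timer state so no cooperative __init__ is needed.
        if not hasattr(self, "_timer_starts"):
            self._timer_starts = {}
            self.timings = {}
        self._timer_starts[name] = time.perf_counter()

    def stop_timer(self, name: str) -> float:
        elapsed = time.perf_counter() - self._timer_starts.pop(name)
        # Accumulate so repeated operations (e.g. every batch) sum up.
        self.timings[name] = self.timings.get(name, 0.0) + elapsed
        return elapsed


class ToyTrainer(MetricsMixin, ProfilingMixin):
    def train_step(self, batch: list[float]) -> dict[str, float]:
        self.start_timer("train_step")
        loss = sum(x * x for x in batch) / len(batch)  # placeholder computation
        self.stop_timer("train_step")
        return {"loss": loss}
```

For example, `ToyTrainer().format_metrics({"loss": 2.5, "acc": 0.5})` returns `"acc=0.5000, loss=2.5000"`. Inline timing and logging code in the real Trainer could then be moved onto these methods one call site at a time.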