Conversation

@KeitaW KeitaW commented Sep 27, 2025

This PR aims to address #482 by adding a DummyOptimizerWithStateDict class that inherits from torch.optim.Optimizer, resolving the AttributeError that occurs when DeepSpeed tries to save checkpoints during weight conversion.

The built-in DummyOptim created when passing optimizer=None to deepspeed.initialize() lacks a state_dict() method. Additionally, DeepSpeed validates that the optimizer is an instance of expected types (Optimizer, None, or Callable).

This fix provides a custom optimizer that:

  • Inherits from torch.optim.Optimizer to pass DeepSpeed's type check
  • Implements required methods: step(), state_dict(), and load_state_dict()
  • Returns empty state since optimizer state is not needed during conversion
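A minimal sketch of what such a class could look like, based only on the description above (the actual implementation in the PR may differ in details such as constructor arguments and the exact state_dict contents):

```python
import torch

class DummyOptimizerWithStateDict(torch.optim.Optimizer):
    """Placeholder optimizer that passes DeepSpeed's isinstance check
    and supports checkpoint saving during weight conversion.
    Sketch only; not the PR's actual code."""

    def __init__(self, params, lr=0.0):
        # torch.optim.Optimizer requires a defaults dict; lr is unused.
        super().__init__(params, defaults={"lr": lr})

    def step(self, closure=None):
        # No-op: no parameter updates happen during weight conversion.
        if closure is not None:
            return closure()
        return None

    def state_dict(self):
        # Empty state: optimizer state is not needed for conversion.
        return {"state": {}, "param_groups": []}

    def load_state_dict(self, state_dict):
        # Accept and discard any provided state.
        pass
```

Because the class is a real torch.optim.Optimizer subclass, `isinstance(opt, torch.optim.Optimizer)` holds, and DeepSpeed's checkpoint path can call `state_dict()` without raising AttributeError.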

Co-authored-by: aravneelaws <[email protected]>
