Release DDP and Checkpoint bug fixes · Lightning-AI/pytorch-lightning

Overview

As we continue to strengthen the codebase with more tests, we're finally getting rid of annoying bugs that have been around for a while, mostly around inconsistent checkpoint and early stopping behaviour (amazing work @awaelchli @jeremyjordan).

Noteworthy changes:

Fixed TPU flag parsing
Fixed average_precision metric
All the checkpoint issues should be gone now (including backward support for old checkpoints)
DDP + loggers should be fixed

Detailed changes

Added

Added TorchText support for moving data to GPU (#2379)

Changed

Changed epoch indexing to start from 0 instead of 1 (#2289)
Refactored Model backward (#2276)
Refactored training_batch + tests to verify correctness (#2327, #2328)
Refactored training loop (#2336)
Made optimization steps for hooks (#2363)
Changed default apex level to 'O2' (#2362)
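
For the epoch indexing change (#2289), the first epoch is now counted as 0. The following is a minimal sketch in plain Python, not Lightning code, of what zero-based counting means for anything that displays or compares epoch numbers. The helper name `display_epoch` is hypothetical and only serves the illustration.

```python
def display_epoch(epoch_index: int) -> int:
    """Convert a zero-based epoch index into a one-based number for display."""
    if epoch_index < 0:
        raise ValueError("epoch index must be non-negative")
    return epoch_index + 1


max_epochs = 3

# Zero-based indexing: the loop variable runs over 0, 1, ..., max_epochs - 1.
seen = [epoch for epoch in range(max_epochs)]

# Human-readable labels still count from 1.
labels = [f"epoch {display_epoch(e)}/{max_epochs}" for e in seen]
```

Code that compared the current epoch against a hard-coded one-based value may need the same off-by-one adjustment.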

Removed

Moved TrainsLogger to Bolts (#2384)

Fixed

Fixed parsing TPU arguments and TPU tests (#2094)
Fixed number of batches in case of multiple dataloaders and limit_{*}_batches (#1920, #2226)
Fixed an issue with forward hooks not being removed after model summary (#2298)
Fixed load_from_checkpoint() not working with absolute paths on Windows (#2294)
Fixed an issue with how _has_len handles NotImplementedError, e.g. raised by torchtext.data.Iterator (#2293, #2307)
Fixed average_precision metric (#2319)
Fixed ROC metric for CUDA tensors (#2304)
Fixed lost compatibility with custom datatypes implementing .to (#2335)
Fixed loading model with kwargs (#2387)
Fixed sum(0) for trainer.num_val_batches (#2268)
Fixed checking if the parameters are a DictConfig Object (#2216)
Fixed SLURM weights saving (#2341)
Fixed LR scheduler order (#2356)
Fixed adding tensorboard hparams logging test (#2342)
Fixed use model ref for tear down (#2360)
Fixed logger crash on DDP (#2388)
Fixed several issues with early stopping and checkpoint callbacks (#1504, #2391)
Fixed loading past checkpoints from v0.7.x (#2405)
Fixed loading model without arguments (#2403)
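
To illustrate the kind of object the custom datatype fix (#2335) concerns, here is a minimal sketch of duck-typed device transfer in plain Python. It is not Lightning's implementation: `move_to_device` and `CustomBatch` are hypothetical names, and device handling is reduced to a string so the example runs without PyTorch.

```python
from collections.abc import Mapping


def move_to_device(data, device):
    """Recursively move `data` to `device`, relying only on a `.to` method."""
    to = getattr(data, "to", None)
    if callable(to):
        # Any object implementing `.to(device)` is handled, not just tensors.
        return to(device)
    if isinstance(data, Mapping):
        return {key: move_to_device(value, device) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        # Note: this simple rebuild does not support namedtuples.
        return type(data)(move_to_device(value, device) for value in data)
    # Anything else (numbers, strings, ...) is returned unchanged.
    return data


class CustomBatch:
    """Hypothetical user-defined container that implements `.to`."""

    def __init__(self, values, device="cpu"):
        self.values = values
        self.device = device

    def to(self, device):
        # Return a new object that records the target device.
        return CustomBatch(self.values, device)


batch = {"x": CustomBatch([1, 2, 3]), "meta": ["a", 7]}
moved = move_to_device(batch, "cuda:0")
```

Here `moved["x"]` is a new `CustomBatch` tagged with the target device, while plain values such as the entries of `"meta"` pass through untouched.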

Contributors

@airium, @awaelchli, @Borda, @elias-ramzi, @jeremyjordan, @lezwon, @mateuszpieniak, @mmiakashs, @pwl, @rohitgr7, @ssakhavi, @thschaaf, @tridao, @williamFalcon

If we forgot someone due to not matching commit email with GitHub account, let us know :]
